Re: [PATCHES] Display Pg buffer cache (WIP)
Mark Kirkwood wrote: A couple of minor amendments here: - remove link to libpq (from cut+past of dblnk's Makefile) - add comment for pg_buffercache module in contrib/README - change my listed email to this one (I have resigned) Applied, thanks for the patch. BTW, I removed the REGRESS=... line from the Makefile, as that caused `make installcheck' of contrib/ to fail on my machine. -Neil ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PATCHES] Display Pg buffer cache (WIP)
A couple of minor amendments here: - remove link to libpq (from cut+past of dblnk's Makefile) - add comment for pg_buffercache module in contrib/README - change my listed email to this one (I have resigned) regards Mark diff -Nacr pgsql.orig/contrib/Makefile pgsql/contrib/Makefile *** pgsql.orig/contrib/Makefile Thu Mar 3 11:29:53 2005 --- pgsql/contrib/Makefile Wed Mar 9 10:10:41 2005 *** *** 26,31 --- 26,32 noupdate\ oid2name\ pg_autovacuum \ + pg_buffercache \ pg_dumplo \ pg_trgm \ pgbench \ diff -Nacr pgsql.orig/contrib/README pgsql/contrib/README *** pgsql.orig/contrib/README Thu Mar 3 11:29:53 2005 --- pgsql/contrib/READMEFri Mar 11 12:11:18 2005 *** *** 136,141 --- 136,145 Automatically performs vacuum by Matthew T. O'Connor matthew@zeut.net + pg_buffercace - + Real time queries on the shared buffer cache. + by Mark Kirkwood [EMAIL PROTECTED] + pg_dumplo - Dump large objects by Karel Zak [EMAIL PROTECTED] diff -Nacr pgsql.orig/contrib/pg_buffercache/Makefile pgsql/contrib/pg_buffercache/Makefile *** pgsql.orig/contrib/pg_buffercache/Makefile Thu Jan 1 12:00:00 1970 --- pgsql/contrib/pg_buffercache/Makefile Thu Mar 10 08:19:20 2005 *** *** 0 --- 1,19 + # $PostgreSQL$ + + MODULE_big = pg_buffercache + OBJS = pg_buffercache_pages.o + + DATA_built = pg_buffercache.sql + DOCS = README.pg_buffercache + REGRESS = pg_buffercache + + + ifdef USE_PGXS + PGXS = $(shell pg_config --pgxs) + include $(PGXS) + else + subdir = contrib/pg_buffercache + top_builddir = ../.. + include $(top_builddir)/src/Makefile.global + include $(top_srcdir)/contrib/contrib-global.mk + endif diff -Nacr pgsql.orig/contrib/pg_buffercache/README.pg_buffercache pgsql/contrib/pg_buffercache/README.pg_buffercache *** pgsql.orig/contrib/pg_buffercache/README.pg_buffercache Thu Jan 1 12:00:00 1970 --- pgsql/contrib/pg_buffercache/README.pg_buffercache Fri Mar 11 12:11:03 2005 *** *** 0 --- 1,112 + Pg_buffercache - Real time queries on the shared buffer cache. + -- + + This module consists of a C function 'pg_buffercache_pages()' that returns + a set of records, plus a view 'pg_buffercache' to wrapper the function. + + The intent is to do for the buffercache what pg_locks does for locks, i.e - + ability to examine what is happening at any given time without having to + restart or rebuild the server with debugging code added. + + By default public access is REVOKED from both of these, just in case there + are security issues lurking. + + + Installation + + + Build and install the main Postgresql source, then this contrib module: + + $ cd contrib/pg_buffercache + $ gmake + $ gmake install + + + To register the functions: + + $ psql -d database -f pg_buffercache.sql + + + Notes + - + + The definition of the columns exposed in the view is: + +Column | references | Description + +--+ +bufferid | | Id, 1-shared_buffers. +relfilenode| pg_class.relfilenode | Refilenode of the relation. +reltablespace | pg_tablespace.oid| Tablespace oid of the relation. +reldatabase| pg_database.oid | Database for the relation. +relblocknumber | | Offset of the page in the relation. +isdirty| | Is the page dirty? + + + There is one row for each buffer in the shared cache. Unused buffers are + shown with all fields null except bufferid. + + Because the cache is shared by all the databases, there are pages from + relations not belonging to the current database. + + When the pg_buffercache view is accessed, internal buffer manager locks are + taken, and a copy of the buffer cache data is made for the view to display. + This ensures that the view produces a consistent set of results, while not + blocking normal buffer activity longer than necessary. Nonetheless there + could be some impact on database performance if this view is read often. + + + Sample output + - + + regression=# \d pg_buffercache; +View public.pg_buffercache +Column | Type | Modifiers + +-+--- +bufferid | integer | +relfilenode| oid | +reltablespace | oid | +reldatabase| oid | +relblocknumber | numeric | +isdirty| boolean | + View definition: +SELECT p.bufferid, p.relfilenode, p.reltablespace, p.reldatabase, + p.relblocknumber, p.isdirty + FROM pg_buffercache_pages() p(bufferid integer, relfilenode oid, +
Re: [PATCHES] Display Pg buffer cache (WIP)
Tom Lane wrote: I would rather see this as a contrib module. There has been *zero* consensus that we need this in core, nor any discussion about whether it might be a security hole. Hmmm, I guess my motivation for thinking it might be useful in core was that it was similar to 'pg_locks' and 'pg_stats', i.e. exposing some internals in a convenient queryable manner (useful for problem solving). On the security front, I should have added a 'REVOKE ALL ...FROM PUBLIC' on the view to at least control access (fortunately easy to add). cheers Mark ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PATCHES] Display Pg buffer cache (WIP)
Neil Conway wrote: Mark Kirkwood wrote: + TupleDescInitEntry(tupledesc, (AttrNumber) 5, relblockbumber, + NUMERICOID, -1, 0); I think this should be an int4, not numeric. I was looking for an UINT4OID :-), but numeric seemed the best compromise (safer than overflowing int4). I guess I could add a type 'blocknumber' that is actually the same as 'oid', but it seems liks a lot of effort for one column. cheers Mark ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PATCHES] Display Pg buffer cache (WIP)
Tom Lane wrote: One reason for making it contrib is that I don't think you've got it entirely right yet, and there will be several iterations before it settles down. In a contrib module that is no problem, in core it's a forced initdb each time. Yeah - certainly less intrusive as a contrib if amendments are required! Barring a huge groundswell of support for it in core :-) I will resubmit a patch for it as a contrib module. cheers Mark ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PATCHES] Display Pg buffer cache (WIP)
Mark Kirkwood wrote: Tom Lane wrote: One reason for making it contrib is that I don't think you've got it entirely right yet, and there will be several iterations before it settles down. In a contrib module that is no problem, in core it's a forced initdb each time. Yeah - certainly less intrusive as a contrib if amendments are required! Barring a huge groundswell of support for it in core :-) I will resubmit a patch for it as a contrib module. And here it is... Mark diff -Nacr pgsql.orig/contrib/Makefile pgsql/contrib/Makefile *** pgsql.orig/contrib/Makefile Thu Mar 3 11:29:53 2005 --- pgsql/contrib/Makefile Wed Mar 9 10:10:41 2005 *** *** 26,31 --- 26,32 noupdate\ oid2name\ pg_autovacuum \ + pg_buffercache \ pg_dumplo \ pg_trgm \ pgbench \ diff -Nacr pgsql.orig/contrib/pg_buffercache/Makefile pgsql/contrib/pg_buffercache/Makefile *** pgsql.orig/contrib/pg_buffercache/Makefile Thu Jan 1 12:00:00 1970 --- pgsql/contrib/pg_buffercache/Makefile Wed Mar 9 11:42:05 2005 *** *** 0 --- 1,21 + # $PostgreSQL$ + + MODULE_big = pg_buffercache + PG_CPPFLAGS = -I$(libpq_srcdir) + OBJS = pg_buffercache_pages.o + SHLIB_LINK = $(libpq) + + DATA_built = pg_buffercache.sql + DOCS = README.pg_buffercache + REGRESS = pg_buffercache + + + ifdef USE_PGXS + PGXS = $(shell pg_config --pgxs) + include $(PGXS) + else + subdir = contrib/pg_buffercache + top_builddir = ../.. + include $(top_builddir)/src/Makefile.global + include $(top_srcdir)/contrib/contrib-global.mk + endif diff -Nacr pgsql.orig/contrib/pg_buffercache/README.pg_buffercache pgsql/contrib/pg_buffercache/README.pg_buffercache *** pgsql.orig/contrib/pg_buffercache/README.pg_buffercache Thu Jan 1 12:00:00 1970 --- pgsql/contrib/pg_buffercache/README.pg_buffercache Wed Mar 9 12:26:56 2005 *** *** 0 --- 1,112 + Pg_buffercache - Real-time queries on the shared buffer cache. + -- + + This module consists of a C function 'pg_buffercache_pages()' that returns + a set of records, plus a view 'pg_buffercache' to wrapper the function. + + The intent is to do for the buffercache what pg_locks does for locks, i.e - + ability to examine what is happening at any given time without having to + restart or rebuild the server with debugging code added. + + By default public access is REVOKED from both of these, just in case there + are security issues lurking. + + + Installation + + + Build and install the main Postgresql source, then this contrib module: + + $ cd contrib/pg_buffercache + $ gmake + $ gmake install + + + To register the functions: + + $ psql -d database -f pg_buffercache.sql + + + Notes + - + + The definition of the columns exposed in the view is: + +Column | references | Description + +--+ +bufferid | | Id, 1-shared_buffers. +relfilenode| pg_class.relfilenode | Refilenode of the relation. +reltablespace | pg_tablespace.oid| Tablespace oid of the relation. +reldatabase| pg_database.oid | Database for the relation. +relblocknumber | | Offset of the page in the relation. +isdirty| | Is the page dirty? + + + There is one row for each buffer in the shared cache. Unused buffers are + shown with all fields null except bufferid. + + Because the cache is shared by all the databases, there are pages from + relations not belonging to the current database. + + When the pg_buffercache view is accessed, internal buffer manager locks are + taken, and a copy of the buffer cache data is made for the view to display. + This ensures that the view produces a consistent set of results, while not + blocking normal buffer activity longer than necessary. Nonetheless there + could be some impact on database performance if this view is read often. + + + Sample output + - + + regression=# \d pg_buffercache; +View public.pg_buffercache +Column | Type | Modifiers + +-+--- +bufferid | integer | +relfilenode| oid | +reltablespace | oid | +reldatabase| oid | +relblocknumber | numeric | +isdirty| boolean | + View definition: +SELECT p.bufferid, p.relfilenode, p.reltablespace, p.reldatabase, + p.relblocknumber, p.isdirty + FROM pg_buffercache_pages() p(bufferid integer, relfilenode oid, + reltablespace oid, reldatabase oid, relblocknumber numeric(10,0), + isdirty boolean); + + regression=# SELECT c.relname, count(*) AS buffers +FROM pg_class c, pg_buffercache b
Re: [PATCHES] Display Pg buffer cache (WIP)
The latest iteration. I have added documentation and updated the expected output so that the regression tests pass. In addition, after looking at the various system view names, I decided that 'pg_cache_dump' does not fit in nicely - so chose an more Pg suitable name of 'pg_buffercache'. Some renaming of the backend functions happened too. Finally, since I was saving blocknum, it went into the view as well. Hopefully I am dealing with invalid buffer tags sensibly now. The per-buffer spin lock is still being held - altho it is obviously trivial to remove if not actually required. regards Mark P.s : remembered to use diff -c Mark Kirkwood wrote: Neil Conway wrote: Tom Lane wrote: It'd be possible to dispense with the per-buffer spinlocks so long as you look only at the tag (and perhaps the TAG_VALID flag bit). The tags can't be changing while you hold the BufMappingLock. That's what I had thought at first, but this comment in buf_internals.h dissuaded me: buf_hdr_lock must be held to examine or change the tag, flags, usage_count, refcount, or wait_backend_id fields. The comment already notes this isn't true if you've got the buffer pinned; it would be worth adding another exception for holding the BufMappingLock, IMHO. I'm dubious that there's any point in recording information as transient as the refcounts and dirtybits I think it's worth recording dirty bits -- it provides an indication of the effectiveness of the bgwriter, for example. Reference counts could be done away with, although I doubt it would have a significant effect on the time spent holding the lock. Let's suppose refcount is eliminated. I will then be examining the tag, flags and buf_id elements of the buffer. Holding the BufMappingLock prevents the tag changing, but what about the flags? In addition Tom pointed out that I am not examining the BM_TAG_VALID or BM_VALID flag bits (I am only checking if tag.blockNum equals InvalidBlockNumber). My initial thought is to handle !BM_TAG_VALID or !BM_VALID similarly to InvalidBlockNumber i.e all non buf_id fields set to NULL. Mark ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED]) diff -Nacr pgsql.orig/doc/src/sgml/catalogs.sgml pgsql/doc/src/sgml/catalogs.sgml *** pgsql.orig/doc/src/sgml/catalogs.sgml Mon Mar 7 12:20:17 2005 --- pgsql/doc/src/sgml/catalogs.sgmlTue Mar 8 12:03:50 2005 *** *** 3875,3880 --- 3875,3885 tbody row + entrylink linkend=view-pg-buffercachestructnamepg_buffercache/structname/link/entry + entryshared buffer cache/entry + /row + + row entrylink linkend=view-pg-indexesstructnamepg_indexes/structname/link/entry entryindexes/entry /row *** *** 3917,3922 --- 3922,4021 /tbody /tgroup /table + /sect1 + + sect1 id=view-pg-buffercache + titlestructnamepg_buffercache/structname/title + + indexterm zone=view-pg-buffercache +primarypg_buffercache/primary + /indexterm + para +The view structnamepg_buffercache/structname provides access to +some information from the shared buffer cache. + /para + + para +There is one row for each buffer in the shared cache. Unused buffers are +shown with all fields null except structfieldbufferid/structfield. +Because the cache is shared by all the databases, there are pages from +relations not belonging to the current database. + /para + + + table +titlestructnamepg_buffercache/structname Columns/title + +tgroup cols=4 + thead + row + entryName/entry + entryType/entry + entryReferences/entry + entryDescription/entry + /row + /thead + tbody + row + entrybufferid/entry + entrytypeinteger/type/entry + entry/entry + entry +The buffer number. This is numbered 1 to varnameshared_buffers/varname. + /entry + /row + row + entryrelfilenode/entry + entrytypeoid/type/entry + entryliterallink linkend=catalog-pg-classstructnamepg_class/structname/link.relfilenode/literal/entry + entry + The on-disk file for the relation that this page came from. + /entry + /row + row + entryreltablespace/entry + entrytypeoid/type/entry + entry + literallink linkend=catalog-pg-tablespacestructnamepg_tablespace/structname/link.oid/literal + /entry + entryTablespace the corresponding relation is in./entry + /row + row + entryreldatabase/entry + entrytypeoid/type/entry + entryliterallink linkend=catalog-pg-databasestructnamepg_database/structname/link.oid/literal/entry + entry +Database the corresponding relation belongs to, or zero if the +relation is a globally-shared table/entry + /row
Re: [PATCHES] Display Pg buffer cache (WIP)
Mark Kirkwood wrote: + tupledesc = CreateTemplateTupleDesc(NUM_BUFFERCACHE_PAGES_ELEM, false); + TupleDescInitEntry(tupledesc, (AttrNumber) 1, bufferid, + INT4OID, -1, 0); + TupleDescInitEntry(tupledesc, (AttrNumber) 2, relfilenode, + OIDOID, -1, 0); + TupleDescInitEntry(tupledesc, (AttrNumber) 3, reltablespace, + OIDOID, -1, 0); + TupleDescInitEntry(tupledesc, (AttrNumber) 4, reldatabase, + OIDOID, -1, 0); + TupleDescInitEntry(tupledesc, (AttrNumber) 5, relblockbumber, + NUMERICOID, -1, 0); I think this should be an int4, not numeric. Otherwise, looks good to me. Barring any objections, I'll apply this with a few stylistic tweaks and the numeric - int4 change tomorrow. -Neil ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PATCHES] Display Pg buffer cache (WIP)
Neil Conway [EMAIL PROTECTED] writes: Mark Kirkwood wrote: +TupleDescInitEntry(tupledesc, (AttrNumber) 5, relblockbumber, + NUMERICOID, -1, 0); I think this should be an int4, not numeric. needs spell check too ;-) More generally, BlockNumber is unsigned and so int4 is not at all an accurate conversion. Perhaps OID would be a good choice even though it's horribly wrong on one level. Otherwise, looks good to me. Barring any objections, I'll apply this with a few stylistic tweaks and the numeric - int4 change tomorrow. I would rather see this as a contrib module. There has been *zero* consensus that we need this in core, nor any discussion about whether it might be a security hole. regards, tom lane ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PATCHES] Display Pg buffer cache (WIP)
Tom Lane wrote: Perhaps OID would be a good choice even though it's horribly wrong on one level. Either that or add unsigned numeric types to PG :-) I would rather see this as a contrib module. I don't really have an opinion either way. Does anyone else have thoughts on this? There has been *zero* consensus that we need this in core, nor any discussion about whether it might be a security hole. ISTM if it can only be invoked by the superuser, it should be safe enough. -Neil ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PATCHES] Display Pg buffer cache (WIP)
Tom Lane wrote: It'd be possible to dispense with the per-buffer spinlocks so long as you look only at the tag (and perhaps the TAG_VALID flag bit). The tags can't be changing while you hold the BufMappingLock. That's what I had thought at first, but this comment in buf_internals.h dissuaded me: buf_hdr_lock must be held to examine or change the tag, flags, usage_count, refcount, or wait_backend_id fields. The comment already notes this isn't true if you've got the buffer pinned; it would be worth adding another exception for holding the BufMappingLock, IMHO. I'm dubious that there's any point in recording information as transient as the refcounts and dirtybits I think it's worth recording dirty bits -- it provides an indication of the effectiveness of the bgwriter, for example. Reference counts could be done away with, although I doubt it would have a significant effect on the time spent holding the lock. -Neil ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PATCHES] Display Pg buffer cache (WIP)
Only two things to add: you forgot to add `cachedump.o' to the list of OBJS in the utils/adt Makefile. Mark Kirkwood wrote: +typedef struct +{ + uint32 bufferid; + Oid relfilenode; + Oid reltablespace; + Oid reldatabase; + boolisdirty; + uint32 refcount; + BlockNumber blocknum; + +} CacheDumpRec; You should probably make `isdirty' the last member of this struct, so as to reduce alignment/padding requirements (this won't actually save any space right now, but it might save some space if more members are added to the struct in the future). -Neil ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] Display Pg buffer cache (WIP)
Neil Conway wrote: Tom Lane wrote: It'd be possible to dispense with the per-buffer spinlocks so long as you look only at the tag (and perhaps the TAG_VALID flag bit). The tags can't be changing while you hold the BufMappingLock. That's what I had thought at first, but this comment in buf_internals.h dissuaded me: buf_hdr_lock must be held to examine or change the tag, flags, usage_count, refcount, or wait_backend_id fields. The comment already notes this isn't true if you've got the buffer pinned; it would be worth adding another exception for holding the BufMappingLock, IMHO. I'm dubious that there's any point in recording information as transient as the refcounts and dirtybits I think it's worth recording dirty bits -- it provides an indication of the effectiveness of the bgwriter, for example. Reference counts could be done away with, although I doubt it would have a significant effect on the time spent holding the lock. Let's suppose refcount is eliminated. I will then be examining the tag, flags and buf_id elements of the buffer. Holding the BufMappingLock prevents the tag changing, but what about the flags? In addition Tom pointed out that I am not examining the BM_TAG_VALID or BM_VALID flag bits (I am only checking if tag.blockNum equals InvalidBlockNumber). My initial thought is to handle !BM_TAG_VALID or !BM_VALID similarly to InvalidBlockNumber i.e all non buf_id fields set to NULL. Mark ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PATCHES] Display Pg buffer cache (WIP)
Neil Conway wrote: If you do decide to hold the BufMappingLock, it might make sense to: 1. allocate an array of NBuffers elements 2. acquire BufferMappingLock in share mode 3. sequentially scan through the buffer pool, copying data into the array 4. release the lock 5. on each subsequent call to the SRF, format and return an element of the array Which should reduce the time to lock is held. This will require allocating NBuffers * size_of_stats memory (where size_of_stats will be something like 16 bytes). That is a better approach, so I've used it in this new iteration. In addition to holding the BufMappingLock, each buffer header is (spin) locked before examining it, hopefully this is correct - BTW, I like the new buffer lock design. I'm still using BuildTupleFromCStrings, so there is considerable use of sprintf conversion and temporary char * stuff. I would like this to be a bit cleaner, so any suggestions welcome. regards Mark diff -Naur pgsql.orig/src/backend/catalog/system_views.sql pgsql/src/backend/catalog/system_views.sql --- pgsql.orig/src/backend/catalog/system_views.sql Fri Mar 4 14:23:09 2005 +++ pgsql/src/backend/catalog/system_views.sql Fri Mar 4 14:21:33 2005 @@ -277,3 +277,9 @@ DO INSTEAD NOTHING; GRANT SELECT, UPDATE ON pg_settings TO PUBLIC; + +CREATE VIEW pg_cache_dump AS + SELECT D.* FROM pg_cache_dump() AS D + (bufferid integer, relfilenode oid, reltablespace oid, reldatabase oid, +isdirty bool, refcount int4); + diff -Naur pgsql.orig/src/backend/utils/adt/cachedump.c pgsql/src/backend/utils/adt/cachedump.c --- pgsql.orig/src/backend/utils/adt/cachedump.cThu Jan 1 12:00:00 1970 +++ pgsql/src/backend/utils/adt/cachedump.c Sat Mar 5 20:21:45 2005 @@ -0,0 +1,221 @@ +/*- + * + * cachedump.c + *display some contents of the buffer cache + * + * Portions Copyright (c) 1996-2005, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * + * IDENTIFICATION + * $PostgreSQL$ + *- + */ +#include postgres.h +#include funcapi.h +#include catalog/pg_type.h +#include storage/buf_internals.h +#include storage/bufmgr.h +#include utils/relcache.h +#include utils/builtins.h + + +#define NUM_CACHE_DUMP_ELEM6 + +/* + * Record structure holding the to be exposed cache data. + */ +typedef struct +{ + uint32 bufferid; + Oid relfilenode; + Oid reltablespace; + Oid reldatabase; + boolisdirty; + uint32 refcount; + BlockNumber blocknum; + +} CacheDumpRec; + + +/* + * Function context for data persisting over repeated calls. + */ +typedef struct +{ + AttInMetadata *attinmeta; + CacheDumpRec*record; + char*values[NUM_CACHE_DUMP_ELEM]; +} CacheDumpContext; + + +/* + * Function returning data from the shared buffer cache - buffer number, + * relation node/tablespace/database, dirty indicator and refcount. + */ +Datum +cache_dump(PG_FUNCTION_ARGS) +{ + FuncCallContext *funcctx; + Datum result; + MemoryContext oldcontext; + CacheDumpContext*fctx; /* User function context. */ + TupleDesc tupledesc; + HeapTuple tuple; + + if (SRF_IS_FIRSTCALL()) + { + RelFileNode rnode; + uint32 i; + BufferDesc *bufHdr; + + + funcctx = SRF_FIRSTCALL_INIT(); + + /* Switch context when allocating stuff to be used in later calls */ + oldcontext = MemoryContextSwitchTo(funcctx-multi_call_memory_ctx); + + /* construct a tuple to return */ + tupledesc = CreateTemplateTupleDesc(NUM_CACHE_DUMP_ELEM, false); + TupleDescInitEntry(tupledesc, (AttrNumber) 1, bufferid, + INT4OID, -1, 0); + TupleDescInitEntry(tupledesc, (AttrNumber) 2, relfilenode, + OIDOID, -1, 0); + TupleDescInitEntry(tupledesc, (AttrNumber) 3, reltablespace, + OIDOID, -1, 0); + TupleDescInitEntry(tupledesc, (AttrNumber) 4, reldatabase, + OIDOID, -1, 0); + TupleDescInitEntry(tupledesc, (AttrNumber) 5, isdirty, + BOOLOID, -1, 0); + TupleDescInitEntry(tupledesc, (AttrNumber) 6, refcount, +
Re: [PATCHES] Display Pg buffer cache (WIP)
Mark Kirkwood [EMAIL PROTECTED] writes: In addition to holding the BufMappingLock, each buffer header is (spin) locked before examining it, hopefully this is correct - BTW, I like the new buffer lock design. It'd be possible to dispense with the per-buffer spinlocks so long as you look only at the tag (and perhaps the TAG_VALID flag bit). The tags can't be changing while you hold the BufMappingLock. I'm dubious that there's any point in recording information as transient as the refcounts and dirtybits, and reducing the amount of time you hold the lock would be a Good Thing. Also, you're still not dealing with the case of a not-valid buffer in a reasonable way. regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PATCHES] Display Pg buffer cache (WIP)
Tom Lane wrote: Mark Kirkwood [EMAIL PROTECTED] writes: I am using anoncvs from yesterday, so if Tom's new scheme is *very* new I may be missing it. It's not committed yet ;-) Yep - that's pretty new, apologies for slow grey matter... I have been following the discussion about the new buffer design, but didn't think to connect it to this context! cheers Mark ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PATCHES] Display Pg buffer cache (WIP)
Neil Conway wrote: I don't like accessing shared data structures without acquiring the appropriate locks; even if it doesn't break anything, it seems like just asking for trouble. In order to be able to inspect a buffer's tag in Tom's new locking scheme (not yet in HEAD, but will be in 8.1), you need only hold a per-buffer lock. That will mean acquiring and releasing a spinlock for each buffer, which isn't _too_ bad... I am looking at this bit now. I take the first part to mean that don't need to wrap LWLockAcquire(BufMgrLock, LW_EXCLUSIVE) ... LWLockRelease(BufMgrLock) around the inspection of the buffers (?) This per buffer lock, are we talking about LockBuffer(buf, BUFFER_LOCK_SHARE) ... releaseBuffer(buf) ? If so, it looks like I need to do some stuff with ResourceOwners, otherwise ReleaseBuffer will fail (or am I wandering up the wrong track here?). I am using anoncvs from yesterday, so if Tom's new scheme is *very* new I may be missing it. Thanks Mark ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PATCHES] Display Pg buffer cache (WIP)
Mark Kirkwood wrote: If so, it looks like I need to do some stuff with ResourceOwners, otherwise ReleaseBuffer will fail (or am I wandering up the wrong track here?). I am using anoncvs from yesterday, so if Tom's new scheme is *very* new I may be missing it. It's so new, in fact, it's not in CVS yet :) I believe the latest revision of the patch is here: http://archives.postgresql.org/pgsql-patches/2005-02/msg00115.php The locking scheme is described here: http://archives.postgresql.org/pgsql-hackers/2005-02/msg00391.php Holding the per-buffer header spinlock should be enough to ensure the tag doesn't change. To get a globally consistent snapshot of the state of the bufmgr, I believe it should be sufficient to also share-lock the BufMappingLock for the duration of the scan. The latter will prevent new pages from being loaded in the buffer pool, so I'm not sure if it's worth doing. If you do decide to hold the BufMappingLock, it might make sense to: 1. allocate an array of NBuffers elements 2. acquire BufferMappingLock in share mode 3. sequentially scan through the buffer pool, copying data into the array 4. release the lock 5. on each subsequent call to the SRF, format and return an element of the array Which should reduce the time to lock is held. This will require allocating NBuffers * size_of_stats memory (where size_of_stats will be something like 16 bytes). -Neil ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] Display Pg buffer cache (WIP)
Mark Kirkwood [EMAIL PROTECTED] writes: I am using anoncvs from yesterday, so if Tom's new scheme is *very* new I may be missing it. It's not committed yet ;-) regards, tom lane ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[PATCHES] Display Pg buffer cache (WIP)
I found it useful to be able to look at what relations are occupying the cache (mainly for debugging and comparing with page estimates). I am wondering if it might be a useful addition generally? How it works a SRF called dump_cache() is created, which is then exposed as system view pg_dump_cache. So stuff like the following can be performed (cache of 2000 immediately after 'make installcheck'): regression=# SELECT c.relname, count(*) AS buffers regression-# FROM pg_class c, pg_dump_cache d regression-# WHERE d.relfilenode = c.relfilenode regression-# GROUP BY c.relname regression-# ORDER BY 2 DESC LIMIT 10; relname | buffers -+- tenk1 | 345 tenk2 | 345 onek| 138 pg_attribute| 102 road| 81 pg_attribute_relid_attnam_index | 74 onek2 | 69 pg_proc | 59 pg_proc_proname_args_nsp_index | 56 hash_f8_heap| 49 (10 rows) regression=# As of now the patch breaks 1 regression test (rules - that new view...) and does not peek into the contents of the buffers themselves (I was mainly interested in which relations were there), and does not worry too much about a consistent view of the buffer contents (as I didn't want it to block all other activity!). If it seems like a worthwhile addition I will amend the regression expected and resubmit. Any comments? Mark diff -Naur pgsql.orig/src/backend/catalog/system_views.sql pgsql/src/backend/catalog/system_views.sql --- pgsql.orig/src/backend/catalog/system_views.sql Thu Mar 3 11:29:55 2005 +++ pgsql/src/backend/catalog/system_views.sql Thu Mar 3 11:41:24 2005 @@ -277,3 +277,8 @@ DO INSTEAD NOTHING; GRANT SELECT, UPDATE ON pg_settings TO PUBLIC; + +CREATE VIEW pg_dump_cache AS + SELECT D.* FROM pg_dump_cache() AS D + (bufferid integer, relfilenode oid, reltablespace oid, reldatabase oid); + diff -Naur pgsql.orig/src/backend/utils/adt/Makefile pgsql/src/backend/utils/adt/Makefile --- pgsql.orig/src/backend/utils/adt/Makefile Thu Mar 3 11:29:53 2005 +++ pgsql/src/backend/utils/adt/MakefileThu Mar 3 11:32:10 2005 @@ -24,7 +24,7 @@ tid.o timestamp.o varbit.o varchar.o varlena.o version.o xid.o \ network.o mac.o inet_net_ntop.o inet_net_pton.o \ ri_triggers.o pg_lzcompress.o pg_locale.o formatting.o \ - ascii.o quote.o pgstatfuncs.o encode.o + ascii.o quote.o pgstatfuncs.o encode.o dumpcache.o like.o: like.c like_match.c diff -Naur pgsql.orig/src/backend/utils/adt/dumpcache.c pgsql/src/backend/utils/adt/dumpcache.c --- pgsql.orig/src/backend/utils/adt/dumpcache.cThu Jan 1 12:00:00 1970 +++ pgsql/src/backend/utils/adt/dumpcache.c Thu Mar 3 11:51:53 2005 @@ -0,0 +1,119 @@ +/*- + * + * dumpcache.c + *display some contents for the buffer cache + * + *- + */ +#include postgres.h +#include funcapi.h +#include catalog/pg_type.h +#include storage/buf_internals.h +#include storage/bufmgr.h +#include utils/relcache.h + + +extern Datum dump_cache(PG_FUNCTION_ARGS); + + +/* + * Function context for data persisting over repeated calls. + */ +typedef struct { + int buffer; + AttInMetadata *attinmeta; + BufferDesc *bufhdr; + RelFileNode rnode; + char*values[3]; +} dumpcache_fctx; + + +/* + * Return a tuple that has bufferid, relfilenoide, reltablespace and + * reldatabase OIDs. + */ +Datum +dump_cache(PG_FUNCTION_ARGS) +{ + FuncCallContext *funcctx; + Datum result; + MemoryContext oldcontext; + dumpcache_fctx *fctx; /* User function context. */ + TupleDesc tupledesc; + HeapTuple tuple; + + if (SRF_IS_FIRSTCALL()) + { + funcctx = SRF_FIRSTCALL_INIT(); + + /* Switch context when allocating stuff to be used in later calls */ + oldcontext = MemoryContextSwitchTo(funcctx-multi_call_memory_ctx); + + /* construct a tuple to return */ + tupledesc = CreateTemplateTupleDesc(4, false); + TupleDescInitEntry(tupledesc, (AttrNumber) 1, bufferid, + INT4OID, -1, 0); + TupleDescInitEntry(tupledesc, (AttrNumber) 2, relfilenode, + OIDOID, -1, 0); + TupleDescInitEntry(tupledesc, (AttrNumber) 3, reltablespace, +
Re: [PATCHES] Display Pg buffer cache (WIP)
Mark Kirkwood wrote: does not worry too much about a consistent view of the buffer contents (as I didn't want it to block all other activity!). I don't like accessing shared data structures without acquiring the appropriate locks; even if it doesn't break anything, it seems like just asking for trouble. In order to be able to inspect a buffer's tag in Tom's new locking scheme (not yet in HEAD, but will be in 8.1), you need only hold a per-buffer lock. That will mean acquiring and releasing a spinlock for each buffer, which isn't _too_ bad... That means the data reported by the function might still be inconsistent; not sure how big a problem that is. It might be nice for the patch to indicate whether the buffers are dirty, and what their shared reference count is. +extern Datum dump_cache(PG_FUNCTION_ARGS); This declaration belongs in a header file (such as include/utils/builtins.h). +typedef struct { + int buffer; + AttInMetadata *attinmeta; + BufferDesc *bufhdr; + RelFileNode rnode; + char*values[3]; +} dumpcache_fctx; This should be values[4], no? This is trivial, but I think most type names use camel case (NamesLikeThis). Why does `rnode' need to be in the struct? You can also fetch the buffer number from the buffer desc, so you needn't store another copy of it. + /* allocate the strings for tuple formation */ + fctx-values[0] = (char *) palloc(NAMEDATALEN + 1); + fctx-values[1] = (char *) palloc(NAMEDATALEN + 1); + fctx-values[2] = (char *) palloc(NAMEDATALEN + 1); + fctx-values[3] = (char *) palloc(NAMEDATALEN + 1); Is there a reason for choosing NAMEDATALEN + 1 as the size of these buffers? (There is no relation between NAMEDATALEN and the number of characters an OID will consume when converted via sprintf(%u) ) The patch doesn't treat unassigned buffers specially (i.e. those buffers whose tag contains of InvalidOids). Perhaps it should return NULL for their OID fields, rather than InvalidOid (which will be 0)? (An alternative would be to not include those rows in the result set, although perhaps administrators might want this information.) -Neil ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster