Re: [PATCHES] Display Pg buffer cache (WIP)

2005-03-05 Thread Mark Kirkwood
Neil Conway wrote:
If you do decide to hold the BufMappingLock, it might make
sense to:
1. allocate an array of NBuffers elements
2. acquire BufferMappingLock in share mode
3. sequentially scan through the buffer pool, copying data into the array
4. release the lock
5. on each subsequent call to the SRF, format and return an element of 
the array

Which should reduce the time to lock is held. This will require 
allocating NBuffers * size_of_stats memory (where size_of_stats will be 
something like 16 bytes).

That is a better approach, so I've used it in this new iteration.
In addition to holding the BufMappingLock, each buffer header is (spin)
locked before examining it, hopefully this is correct - BTW, I like the
new buffer lock design.
I'm still using BuildTupleFromCStrings, so there is considerable use of
sprintf conversion and temporary char * stuff. I would like this to be
a bit cleaner, so any suggestions welcome.
regards
Mark

diff -Naur pgsql.orig/src/backend/catalog/system_views.sql 
pgsql/src/backend/catalog/system_views.sql
--- pgsql.orig/src/backend/catalog/system_views.sql Fri Mar  4 14:23:09 2005
+++ pgsql/src/backend/catalog/system_views.sql  Fri Mar  4 14:21:33 2005
@@ -277,3 +277,9 @@
 DO INSTEAD NOTHING;
 
 GRANT SELECT, UPDATE ON pg_settings TO PUBLIC;
+
+CREATE VIEW pg_cache_dump AS
+   SELECT D.* FROM pg_cache_dump() AS D
+   (bufferid integer, relfilenode oid, reltablespace oid, reldatabase oid, 
+isdirty bool, refcount int4);
+
diff -Naur pgsql.orig/src/backend/utils/adt/cachedump.c 
pgsql/src/backend/utils/adt/cachedump.c
--- pgsql.orig/src/backend/utils/adt/cachedump.cThu Jan  1 12:00:00 1970
+++ pgsql/src/backend/utils/adt/cachedump.c Sat Mar  5 20:21:45 2005
@@ -0,0 +1,221 @@
+/*-
+ *
+ * cachedump.c
+ *display some contents of the buffer cache
+ *
+ * Portions Copyright (c) 1996-2005, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *   $PostgreSQL$
+ *-
+ */
+#include postgres.h
+#include funcapi.h
+#include catalog/pg_type.h
+#include storage/buf_internals.h
+#include storage/bufmgr.h
+#include utils/relcache.h
+#include utils/builtins.h
+
+
+#define NUM_CACHE_DUMP_ELEM6
+
+/*
+ * Record structure holding the to be exposed cache data.
+ */
+typedef struct
+{
+   uint32  bufferid;
+   Oid relfilenode;
+   Oid reltablespace;
+   Oid reldatabase;
+   boolisdirty;
+   uint32  refcount;
+   BlockNumber blocknum;
+
+} CacheDumpRec;
+
+
+/*
+ * Function context for data persisting over repeated calls.
+ */
+typedef struct 
+{
+   AttInMetadata   *attinmeta;
+   CacheDumpRec*record;
+   char*values[NUM_CACHE_DUMP_ELEM];
+} CacheDumpContext;
+
+
+/*
+ * Function returning data from the shared buffer cache - buffer number,
+ * relation node/tablespace/database, dirty indicator and refcount.
+ */
+Datum
+cache_dump(PG_FUNCTION_ARGS)
+{
+   FuncCallContext *funcctx;
+   Datum   result;
+   MemoryContext   oldcontext;
+   CacheDumpContext*fctx;  /* User function context. */
+   TupleDesc   tupledesc;
+   HeapTuple   tuple;
+
+   if (SRF_IS_FIRSTCALL())
+   {
+   RelFileNode rnode;
+   uint32  i;
+   BufferDesc  *bufHdr;
+
+
+   funcctx = SRF_FIRSTCALL_INIT();
+
+   /* Switch context when allocating stuff to be used in later 
calls */
+   oldcontext = 
MemoryContextSwitchTo(funcctx-multi_call_memory_ctx);
+   
+   /* construct a tuple to return */
+   tupledesc = CreateTemplateTupleDesc(NUM_CACHE_DUMP_ELEM, false);
+   TupleDescInitEntry(tupledesc, (AttrNumber) 1, bufferid,
+   
INT4OID, -1, 0);
+   TupleDescInitEntry(tupledesc, (AttrNumber) 2, relfilenode,
+   OIDOID, 
-1, 0);
+   TupleDescInitEntry(tupledesc, (AttrNumber) 3, reltablespace,
+   OIDOID, 
-1, 0);
+   TupleDescInitEntry(tupledesc, (AttrNumber) 4, reldatabase,
+   OIDOID, 
-1, 0);
+   TupleDescInitEntry(tupledesc, (AttrNumber) 5, isdirty,
+   
BOOLOID, -1, 0);
+   TupleDescInitEntry(tupledesc, (AttrNumber) 6, refcount,
+  

[PATCHES] fork() refactoring

2005-03-05 Thread Neil Conway
This patch moves all the common code that is usually invoked before 
doing a fork() into a single function, fork_process(). It is not aware 
of the EXEC_BACKEND machinery, so it should be used as fork() currently 
is -- inside an #ifndef EXEC_BACKEND block, if appropriate.

I wasn't sure whether to put this under backend/port, postmaster.c, or 
in a separate .c file. This patch places the code in a new file 
postmaster/fork_process.c -- suggestions on whether this is the right 
location would be welcome.

Barring any objections, I'll apply this to HEAD on Monday.
-Neil
Index: src/backend/port/beos/support.c
===
RCS file: /Users/neilc/local/cvs/pgsql/src/backend/port/beos/support.c,v
retrieving revision 1.11
diff -c -r1.11 support.c
*** src/backend/port/beos/support.c	25 Oct 2004 03:23:02 -	1.11
--- src/backend/port/beos/support.c	5 Mar 2005 06:23:42 -
***
*** 265,271 
  
  
  
! /* The behavior of fork is borken on beos regarding shared memory. In fact
  all shared memory areas are clones in copy on write mode in the new process.
  
  We need to do a remapping of these areas. Just afer the fork we performe the
--- 265,271 
  
  
  
! /* The behavior of fork is broken on beos regarding shared memory. In fact
  all shared memory areas are clones in copy on write mode in the new process.
  
  We need to do a remapping of these areas. Just afer the fork we performe the
Index: src/backend/postmaster/Makefile
===
RCS file: /Users/neilc/local/cvs/pgsql/src/backend/postmaster/Makefile,v
retrieving revision 1.19
diff -c -r1.19 Makefile
*** src/backend/postmaster/Makefile	5 Aug 2004 23:32:10 -	1.19
--- src/backend/postmaster/Makefile	5 Mar 2005 06:37:28 -
***
*** 12,18 
  top_builddir = ../../..
  include $(top_builddir)/src/Makefile.global
  
! OBJS = postmaster.o bgwriter.o pgstat.o pgarch.o syslogger.o
  
  all: SUBSYS.o
  
--- 12,18 
  top_builddir = ../../..
  include $(top_builddir)/src/Makefile.global
  
! OBJS = bgwriter.o fork_process.o pgarch.o pgstat.o postmaster.o syslogger.o
  
  all: SUBSYS.o
  
Index: src/backend/postmaster/fork_process.c
===
RCS file: src/backend/postmaster/fork_process.c
diff -N src/backend/postmaster/fork_process.c
*** /dev/null	1 Jan 1970 00:00:00 -
--- src/backend/postmaster/fork_process.c	5 Mar 2005 09:02:35 -
***
*** 0 
--- 1,80 
+ /*
+  * fork_process.c
+  *   A simple wrapper on top of fork(). This does not handle the
+  *   EXEC_BACKEND case; it might be extended to do so, but it would be
+  *   considerably more complex.
+  *
+  * Copyright (c) 1996-2005, PostgreSQL Global Development Group
+  *
+  * IDENTIFICATION
+  *	  $PostgreSQL$
+  */
+ #include postgres.h
+ #include postmaster/fork_process.h
+ 
+ #include unistd.h
+ 
+ /*
+  * Wrapper for fork(). Return values are the same as those for fork():
+  * -1 if the fork failed, 0 in the child process, and the PID of the
+  * child in the parent process.
+  */
+ pid_t
+ fork_process(void)
+ {
+ 	pid_t result;
+ #ifdef LINUX_PROFILE
+ 	struct itimerval prof_itimer;
+ #endif
+ 
+ 	/*
+ 	 * Flush stdio channels just before fork, to avoid double-output
+ 	 * problems. Ideally we'd use fflush(NULL) here, but there are still a
+ 	 * few non-ANSI stdio libraries out there (like SunOS 4.1.x) that
+ 	 * coredump if we do. Presently stdout and stderr are the only stdio
+ 	 * output channels used by the postmaster, so fflush'ing them should
+ 	 * be sufficient.
+ 	 */
+ 	fflush(stdout);
+ 	fflush(stderr);
+ 
+ #ifdef LINUX_PROFILE
+ 	/*
+ 	 * Linux's fork() resets the profiling timer in the child process. If
+ 	 * we want to profile child processes then we need to save and restore
+ 	 * the timer setting.  This is a waste of time if not profiling,
+ 	 * however, so only do it if commanded by specific -DLINUX_PROFILE
+ 	 * switch.
+ 	 */
+ 	getitimer(ITIMER_PROF, prof_itimer);
+ #endif
+ 
+ #ifdef __BEOS__
+ 	/* Specific beos actions before backend startup */
+ 	beos_before_backend_startup();
+ #endif
+ 
+ 	result = fork();
+ 	if (result == (pid_t) -1)
+ 	{
+ 		/* fork failed */
+ #ifdef __BEOS__
+ 		/* Specific beos backend startup actions */
+ 		beos_backend_startup_failed();
+ #endif
+ 	}
+ 	else if (result == 0)
+ 	{
+ 		/* fork succeeded, in child */
+ #ifdef LINUX_PROFILE
+ 		setitimer(ITIMER_PROF, prof_itimer, NULL);
+ #endif
+ 
+ #ifdef __BEOS__
+ 		/* Specific beos backend startup actions */
+ 		beos_backend_startup();
+ #endif
+ 	}
+ 
+ 	return result;
+ }
Index: src/backend/postmaster/pgarch.c
===
RCS file: /Users/neilc/local/cvs/pgsql/src/backend/postmaster/pgarch.c,v
retrieving revision 1.14
diff -c -r1.14 pgarch.c
*** src/backend/postmaster/pgarch.c	31 Dec 2004 22:00:40 -	1.14
--- 

Re: [PATCHES] fork() refactoring

2005-03-05 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes:
 This patch moves all the common code that is usually invoked before 
 doing a fork() into a single function, fork_process(). It is not aware 
 of the EXEC_BACKEND machinery, so it should be used as fork() currently 
 is -- inside an #ifndef EXEC_BACKEND block, if appropriate.

I'm worried about whether this doesn't break the EXEC_BACKEND case.
Most of the code you've moved out isn't applicable to Windows, but
the fflushes probably are --- and they are certainly applicable when
testing EXEC_BACKEND mode on a Unix machine, which is a case you may
*not* break because it will render Windows completely unsupportable.

 Barring any objections, I'll apply this to HEAD on Monday.

Please do not apply without some further portability testing.

I think it would be better to continue with your original thought of
passing a token into this code so that the EXEC_BACKEND case could be
handled too (the token would tell it which forkexec function to call).
That would probably mean that the function has to stay within
postmaster.c, but as long as it consolidates the N cases of fork
decoration into one, we're still ahead of the game.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PATCHES] Display Pg buffer cache (WIP)

2005-03-05 Thread Tom Lane
Mark Kirkwood [EMAIL PROTECTED] writes:
 In addition to holding the BufMappingLock, each buffer header is (spin)
 locked before examining it, hopefully this is correct - BTW, I like the
 new buffer lock design.

It'd be possible to dispense with the per-buffer spinlocks so long as
you look only at the tag (and perhaps the TAG_VALID flag bit).  The
tags can't be changing while you hold the BufMappingLock.  I'm dubious
that there's any point in recording information as transient as the
refcounts and dirtybits, and reducing the amount of time you hold the
lock would be a Good Thing.

Also, you're still not dealing with the case of a not-valid buffer in a
reasonable way.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[PATCHES] Harmless space allocation typo

2005-03-05 Thread Heikki Linnakangas
Here's a tiny fix for a harmless typo in catalog.c:
Too much space is allocated for tablespace file path, I guess the 
directory name used to be pg_tablespaces instead of pg_tblspc at some
point.

- HeikkiIndex: catalog.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/catalog/catalog.c,v
retrieving revision 1.57
diff -c -r1.57 catalog.c
*** catalog.c   31 Dec 2004 21:59:38 -  1.57
--- catalog.c   5 Mar 2005 22:12:21 -
***
*** 58,64 
else
{
/* All other tablespaces are accessed via symlinks */
!   pathlen = strlen(DataDir) + 16 + OIDCHARS + 1 + OIDCHARS + 1 + 
OIDCHARS + 1;
path = (char *) palloc(pathlen);
snprintf(path, pathlen, %s/pg_tblspc/%u/%u/%u,
 DataDir, rnode.spcNode, rnode.dbNode, 
rnode.relNode);
--- 58,64 
else
{
/* All other tablespaces are accessed via symlinks */
!   pathlen = strlen(DataDir) + 11 + OIDCHARS + 1 + OIDCHARS + 1 + 
OIDCHARS + 1;
path = (char *) palloc(pathlen);
snprintf(path, pathlen, %s/pg_tblspc/%u/%u/%u,
 DataDir, rnode.spcNode, rnode.dbNode, 
rnode.relNode);
***
*** 99,105 
else
{
/* All other tablespaces are accessed via symlinks */
!   pathlen = strlen(DataDir) + 16 + OIDCHARS + 1 + OIDCHARS + 1;
path = (char *) palloc(pathlen);
snprintf(path, pathlen, %s/pg_tblspc/%u/%u,
 DataDir, spcNode, dbNode);
--- 99,105 
else
{
/* All other tablespaces are accessed via symlinks */
!   pathlen = strlen(DataDir) + 11 + OIDCHARS + 1 + OIDCHARS + 1;
path = (char *) palloc(pathlen);
snprintf(path, pathlen, %s/pg_tblspc/%u/%u,
 DataDir, spcNode, dbNode);

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[PATCHES] Cleaning up unreferenced table files

2005-03-05 Thread Heikki Linnakangas
Here's a patch for the TODO item Remove unreferenced table files created by transactions 
that were in-progress when the server terminated abruptly.

It adds a new function, CleanupStaleRelFiles, that scans through the data 
directory and removes all table files that are not mentioned in pg_class 
of the corresponding database. CleanupStaleRelFiles is called after WAL 
recovery.

Actually, the patch doesn't currently delete the files, just issues a 
warning. Testing is easier if the files don't keep getting deleted :).

The patch also adds a GetTablespacePath function similar to 
GetDatabasePath that constructs the path to a tablespace symbolic link. 
commands/tablespace.c is modified to use it, in addition to the new 
CleanupStaleRelFiles function.

- HeikkiIndex: src/backend/access/transam/xlog.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.181
diff -c -r1.181 xlog.c
*** src/backend/access/transam/xlog.c   12 Feb 2005 23:53:37 -  1.181
--- src/backend/access/transam/xlog.c   5 Mar 2005 22:33:56 -
***
*** 43,48 
--- 43,55 
  #include utils/guc.h
  #include utils/relcache.h
  
+ #include catalog/pg_tablespace.h
+ #include catalog/catalog.h
+ #include access/skey.h
+ #include utils/fmgroids.h
+ #include access/relscan.h
+ #include access/heapam.h
+ #include utils/resowner.h
  
  /*
   * This chunk of hackery attempts to determine which file sync methods
***
*** 465,470 
--- 472,480 
  static bool read_backup_label(XLogRecPtr *checkPointLoc);
  static void remove_backup_label(void);
  
+ static void CleanupStaleRelFilesFrom(Oid tablespaceoid, Oid dboid);
+ static void CleanupStaleRelFilesFromTablespace(Oid tablespaceoid);
+ static void CleanupStaleRelFiles(void);
  
  /*
   * Insert an XLOG record having the specified RMID and info bytes,
***
*** 4483,4488 
--- 4493,4500 
  
CreateCheckPoint(true, true);
  
+   CleanupStaleRelFiles();
+ 
/*
 * Close down recovery environment
 */
***
*** 5656,5658 
--- 5668,5852 
 errmsg(could not remove file \%s\: 
%m,
labelfilepath)));
  }
+ 
+ /* Like AllocateDir, but ereports on failure */
+ static DIR *
+ AllocateDirChecked(char *path)
+ {
+   DIR *dirdesc = AllocateDir(path);
+   if (dirdesc == NULL)
+   ereport(ERROR,
+   (errcode_for_file_access(),
+errmsg(could not open directory \%s\: %m, 
+   path)));
+ 
+   return dirdesc;
+ }
+ 
+ /*
+  * Scan through all tablespaces for relations left over
+  * by aborted transactions. 
+  *
+  * For example, if a transaction issues
+  * BEGIN; CREATE TABLE foobar ();
+  * and then the backend crashes, the file is left in the
+  * tablespace until CleanupStaleRelFiles deletes it.
+  */
+ static void
+ CleanupStaleRelFiles(void)
+ {
+   DIR*dirdesc;
+   struct dirent *de;
+   char   *path;
+   int pathlen;
+ 
+   pathlen = strlen(DataDir) + 11 + 1;
+   path = (char *) palloc(pathlen);
+   snprintf(path, pathlen, %s/pg_tblspc/, DataDir);
+ 
+   dirdesc = AllocateDirChecked(path);
+   
+   while ((de = readdir(dirdesc)) != NULL)
+   {
+   char *invalid;
+   Oid tablespaceoid;
+ 
+   tablespaceoid = (Oid) strtol(de-d_name, invalid, 10);
+   if(invalid[0] == '\0')
+   CleanupStaleRelFilesFromTablespace(tablespaceoid);
+   }
+   pfree(path);
+ 
+   CleanupStaleRelFilesFromTablespace(DEFAULTTABLESPACE_OID);
+ }
+ 
+ /* Scan a specific tablespace for stale relations */
+ static void
+ CleanupStaleRelFilesFromTablespace(Oid tablespaceoid)
+ {
+   DIR*dirdesc;
+   struct dirent *de;
+   char   *path;
+ 
+   path = GetTablespacePath(tablespaceoid);
+ 
+   dirdesc = AllocateDirChecked(path);
+   while ((de = readdir(dirdesc)) != NULL)
+   {
+   char *invalid;
+   Oid dboid;
+ 
+   dboid = (Oid) strtol(de-d_name, invalid, 10);
+   if(invalid[0] == '\0')
+   CleanupStaleRelFilesFrom(tablespaceoid, dboid);
+   }
+   pfree(path);
+ }
+ 
+ /* Scan a specific database in a specific tablespace for stale relations.
+  *
+  * First, pg_class for the database is opened, and the relfilenodes of all
+  * relations mentioned there are stored in a hash table.
+  *
+  * Then the directory is scanned. Every file in the directory that's not
+  * found in pg_class (the hash table) is deleted.
+  */
+ static void
+ CleanupStaleRelFilesFrom(Oid tablespaceoid, Oid dboid)
+ {
+   DIR *dirdesc;
+   

Re: [PATCHES] Cleaning up unreferenced table files

2005-03-05 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 Here's a patch for the TODO item Remove unreferenced table files created by 
 transactions 
 that were in-progress when the server terminated abruptly.

xlog.c is a fairly random place to put that functionality.  Didn't it
strike any warning bells for you when you had to add so many new
#includes?  I'm not entirely sure where this should go, but not there.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PATCHES] Cleaning up unreferenced table files

2005-03-05 Thread Heikki Linnakangas
On Sat, 5 Mar 2005, Tom Lane wrote:
Heikki Linnakangas [EMAIL PROTECTED] writes:
Here's a patch for the TODO item Remove unreferenced table files created by 
transactions
that were in-progress when the server terminated abruptly.
xlog.c is a fairly random place to put that functionality.  Didn't it
strike any warning bells for you when you had to add so many new
#includes?  I'm not entirely sure where this should go, but not there.
Yeah actually it did, but I forgot about it along the way. How about 
putting it in a file of its own in backend/catalog? There's some code that 
also deals with the data directories. Or straight into backend/storage.

- Heikki
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster