Re: [PATCHES] [HACKERS] Hint Bits and Write I/O

2008-06-18 Thread Alvaro Herrera
Simon Riggs wrote:

 When running a VACUUM command we always dirty the block when setting
 hint bits, for a number of reasons:
 * VACUUM FULL expects all hint bits to be set prior to moving tuples
 * Setting all hint bits allows us to truncate the clog
 * it forces the VACUUM to write out its own dirty buffers, which is OK,
 since it is a background process.
 
 Other commands call HeapTupleSatisfiesVacuum(), yet these tasks can be
 more flexible with hint bit setting. These include ANALYZE, CREATE
 INDEX, CLUSTER, HOT pruning and index scan marking deleted tuples (with
 changes in all index AMs). This means we have to differentiate between
 VACUUM and other callers of HeapTupleSatisfiesVacuum().
 
 So the patch changes the APIs of HeapTupleSatisfiesVacuum(),
 SetBufferCommitInfoNeedsSave() and SetHintBits() with changes to 13 AM
 and command files. There are many changes in tqual.c, which seems the
 right way because SetHintBits() is inlined. These make the patch fairly
 large, though most of it is simple changes.

If only VACUUM is going to set flexible to off, maybe it's better to
leave the APIs as they are and have a global that's set by VACUUM only
(and reset in a PG_CATCH block).

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] small typo in DTrace docs

2008-06-18 Thread Neil Conway
On Fri, 2008-06-13 at 01:49 -0300, Euler Taveira de Oliveira wrote:
 Attached is a small patch to fix some typos in probes.d path.

Applied, thanks.

-Neil



-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


[PATCHES] Rewrite sinval messaging to reduce contention

2008-06-18 Thread Tom Lane
In this thread back in January
http://archives.postgresql.org/pgsql-performance/2008-01/msg1.php
we saw evidence that contention for SInvalLock was a problem,
and we (or at least I) concluded that the sinval messaging system
needed a nontrivial rewrite to get rid of its habit of sending everyone
to read the queue at the same time.  Here is the proposed rewrite.

The major changes are:

* When the queue overflows, we only issue a cache reset to the specific
backend or backends that still haven't read the oldest message, rather
than resetting everyone as in the original coding.  The reset-everyone
approach forced unnecessary cache-rebuilding work in backends that
weren't actually behind.

* When we observe backend(s) falling well behind, we signal SIGUSR1
to only one backend, the one that is furthest behind and doesn't
already have a signal outstanding for it.  When it finishes catching
up, it will pass on the SIGUSR1 to the next-furthest-back guy,
if there is one that is far enough behind to justify a signal.
This changes the former behavior where every backend in the system
might try to hit the queue at the same time into a daisy chain
where only one backend is catching up at a time; furthermore,
backends that aren't well behind aren't forced to catch up too.

* We don't attempt to clean out dead messages after every
message-receipt operation; rather, we do it on the insertion side,
and only when the queue fullness passes certain thresholds.
(As presented, the thresholds are 8/16, 9/16, 10/16, etc, but this
could be adjusted by changing a couple of #defines.)  This greatly
reduces the amount of contention for exclusive hold on SInvalLock.

* Readers attempt to pull multiple messages (as presented, up to 32)
out of the queue for each acquisition of SInvalLock.  Although they
take the lock in shared mode and thus aren't contending at the
LWLock level, I thought this was important to do to reduce contention
for the lock's spinlock --- the prevalence of s_lock() in the before
oprofile convinces me that there was just too much traffic through
the lock itself.

Note: PMSIGNAL_WAKEN_CHILDREN is now unused and should be removed,
but I didn't include that change in the patch.

The test case I'm using looks like this:

$ cat createdrop.sql
create temp table register_reqs(full_key integer, register_index integer) 
WITHOUT OIDS ON COMMIT DROP;
$ pgbench -n -f createdrop.sql -c 40 -t 1000 bench

On my machine CVS HEAD gets about 485 transactions/sec on this; with the
patch that improves to 700 tps.

oprofile says that the routines above 1% of CPU in CVS HEAD are

CPU: P4 / Xeon with 2 hyper-threads, speed 2792.99 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) 
with a unit mask of 0x01 (mandatory) count 10
samples  %image name   symbol name
363116   11.2433  postgres CatalogCacheIdInvalidate
3166089.8032  postgres s_lock
3058819.4711  postgres LWLockAcquire
2727178.4442  postgres LWLockRelease
2348127.2706  postgres CatalogCacheFlushRelation
2154456.6709  postgres hash_search_with_hash_value
78544 2.4320  postgres _bt_compare
62007 1.9199  postgres SIGetDataEntry
48614 1.5052  postgres AllocSetAlloc
48033 1.4873  postgres XLogInsert
41280 1.2782  postgres PinBuffer
40659 1.2589  postgres LocalExecuteInvalidationMessage
36906 1.1427  postgres ResetCatalogCache
35708 1.1056  postgres hash_any
34151 1.0574  postgres PrepareToInvalidateCacheTuple

and with the patch

CPU: P4 / Xeon with 2 hyper-threads, speed 2792.99 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) 
with a unit mask of 0x01 (mandatory) count 10
samples  %image name   symbol name
550342   13.5424  postgres CatalogCacheIdInvalidate
3626848.9246  postgres hash_search_with_hash_value
1992914.9040  postgres LWLockAcquire
1982774.8790  postgres CatalogCacheFlushRelation
1497163.6841  postgres _bt_compare
1140062.8054  postgres LWLockRelease
1031382.5379  postgres XLogInsert
84471 2.0786  postgres PrepareToInvalidateCacheTuple
78464 1.9308  postgres LocalExecuteInvalidationMessage
75336 1.8538  postgres PinBuffer
74837 1.8415  postgres AllocSetAlloc
63764 1.5691  postgres SIGetDataEntries
60468 1.4879  postgres hash_any
51662 1.2713  libc-2.7.so  memcpy
43999 1.0827  libc-2.7.so  memcmp
43407 1.0681  postgres _bt_checkpage


Re: [PATCHES] Simplify formatting.c

2008-06-18 Thread Bruce Momjian
Bruce Momjian wrote:
 Alvaro Herrera wrote:
  Bruce Momjian wrote:
  
   I moved str_initcap() over into oracle_compat.c and then had initcap()
   convert to/from TEXT to call it.  The code is a little weird because
   str_initcap() needs to convert to text to use texttowcs(), so in
   multibyte encodings initcap converts the string to text, then to char,
   then to text to call texttowcs().  I didn't see a cleaner way to do
   this.
  
  Why not use wchar2char?  It seems there's room for extra cleanup here.
  
  Also, the prototype of str_initcap in builtins.h looks out of place.
 
 I talked to Alvaro on IM, and there is certainly much more cleanup to do
 in this area. I will work from the bottom up.  First, is moving the
 USE_WIDE_UPPER_LOWER define to c.h, and removing TS_USE_WIDE and using
 USE_WIDE_UPPER_LOWER instead.  Patch attached and applied.

The second step is to move wchar2char() and char2wchar() from tsearch
into /mb to be easier to use for other modules;  also move pnstrdup(). 

Patch attached and applied.

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: src/backend/tsearch/ts_locale.c
===
RCS file: /cvsroot/pgsql/src/backend/tsearch/ts_locale.c,v
retrieving revision 1.8
diff -c -c -r1.8 ts_locale.c
*** src/backend/tsearch/ts_locale.c	17 Jun 2008 16:09:06 -	1.8
--- src/backend/tsearch/ts_locale.c	18 Jun 2008 18:37:02 -
***
*** 16,140 
  #include tsearch/ts_locale.h
  #include tsearch/ts_public.h
  
- 
  #ifdef USE_WIDE_UPPER_LOWER
  
- /*
-  * wchar2char --- convert wide characters to multibyte format
-  *
-  * This has the same API as the standard wcstombs() function; in particular,
-  * tolen is the maximum number of bytes to store at *to, and *from must be
-  * zero-terminated.  The output will be zero-terminated iff there is room.
-  */
- size_t
- wchar2char(char *to, const wchar_t *from, size_t tolen)
- {
- 	if (tolen == 0)
- 		return 0;
- 
- #ifdef WIN32
- 	if (GetDatabaseEncoding() == PG_UTF8)
- 	{
- 		int			r;
- 
- 		r = WideCharToMultiByte(CP_UTF8, 0, from, -1, to, tolen,
- NULL, NULL);
- 
- 		if (r = 0)
- 			return (size_t) -1;
- 
- 		Assert(r = tolen);
- 
- 		/* Microsoft counts the zero terminator in the result */
- 		return r - 1;
- 	}
- #endif   /* WIN32 */
- 
- 	return wcstombs(to, from, tolen);
- }
- 
- /*
-  * char2wchar --- convert multibyte characters to wide characters
-  *
-  * This has almost the API of mbstowcs(), except that *from need not be
-  * null-terminated; instead, the number of input bytes is specified as
-  * fromlen.  Also, we ereport() rather than returning -1 for invalid
-  * input encoding.	tolen is the maximum number of wchar_t's to store at *to.
-  * The output will be zero-terminated iff there is room.
-  */
- size_t
- char2wchar(wchar_t *to, size_t tolen, const char *from, size_t fromlen)
- {
- 	if (tolen == 0)
- 		return 0;
- 
- #ifdef WIN32
- 	if (GetDatabaseEncoding() == PG_UTF8)
- 	{
- 		int			r;
- 
- 		/* stupid Microsloth API does not work for zero-length input */
- 		if (fromlen == 0)
- 			r = 0;
- 		else
- 		{
- 			r = MultiByteToWideChar(CP_UTF8, 0, from, fromlen, to, tolen - 1);
- 
- 			if (r = 0)
- 			{
- /* see notes in oracle_compat.c about error reporting */
- pg_verifymbstr(from, fromlen, false);
- ereport(ERROR,
- 		(errcode(ERRCODE_CHARACTER_NOT_IN_REPERTOIRE),
- 		 errmsg(invalid multibyte character for locale),
- 		 errhint(The server's LC_CTYPE locale is probably incompatible with the database encoding.)));
- 			}
- 		}
- 
- 		Assert(r  tolen);
- 		to[r] = 0;
- 
- 		return r;
- 	}
- #endif   /* WIN32 */
- 
- 	if (lc_ctype_is_c())
- 	{
- 		/*
- 		 * pg_mb2wchar_with_len always adds trailing '\0', so 'to' should be
- 		 * allocated with sufficient space
- 		 */
- 		return pg_mb2wchar_with_len(from, (pg_wchar *) to, fromlen);
- 	}
- 	else
- 	{
- 		/*
- 		 * mbstowcs requires ending '\0'
- 		 */
- 		char	   *str = pnstrdup(from, fromlen);
- 		size_t		result;
- 
- 		result = mbstowcs(to, str, tolen);
- 
- 		pfree(str);
- 
- 		if (result == (size_t) -1)
- 		{
- 			pg_verifymbstr(from, fromlen, false);
- 			ereport(ERROR,
- 	(errcode(ERRCODE_CHARACTER_NOT_IN_REPERTOIRE),
- 	 errmsg(invalid multibyte character for locale),
- 	 errhint(The server's LC_CTYPE locale is probably incompatible with the database encoding.)));
- 		}
- 
- 		if (result  tolen)
- 			to[result] = 0;
- 
- 		return result;
- 	}
- }
- 
- 
  int
  t_isdigit(const char *ptr)
  {
--- 16,23 
Index: src/backend/tsearch/ts_utils.c
===
RCS file: /cvsroot/pgsql/src/backend/tsearch/ts_utils.c,v
retrieving revision 1.9
diff -c -c -r1.9 ts_utils.c
*** src/backend/tsearch/ts_utils.c	1 Jan 2008 19:45:52 -	1.9
--- 

Re: [PATCHES] Rewrite sinval messaging to reduce contention

2008-06-18 Thread Tom Lane
I wrote:
 In this thread back in January
 http://archives.postgresql.org/pgsql-performance/2008-01/msg1.php
 we saw evidence that contention for SInvalLock was a problem,
 and we (or at least I) concluded that the sinval messaging system
 needed a nontrivial rewrite to get rid of its habit of sending everyone
 to read the queue at the same time.  Here is the proposed rewrite.

After further thought, I came up with two more improvements:

* It turns out not to be too hard to apply the idea of moving multiple
messages per LWLock acquisition on the writing side too, so I did that.

* There is not actually any good reason why writers have to block
readers or vice versa, except in the infrequent case that SICleanupQueue
is needed.  Writers don't look at the ProcStates that the readers are
changing.  The only point of overlap is that a writer will change
maxMsgNum which readers want to look at --- but we are already assuming
in several places that reading or writing an int is atomic, and if we
assume that here too, it seems to work fine.  To implement this, I split
SInvalLock into SInvalReadLock and SInvalWriteLock.

This seems to be reaching a point of diminishing returns for the
particular test case I'm using --- the TPS rate only went up from 700
to 730 or so.  However, enabling LWLOCK_STATS shows that the contention
rate on the sinval locks is now completely negligible --- one block
per thousand acquisitions on SInvalWriteLock, and less than one in
1 on SInvalReadLock.  The vast majority of the LWLock contention
now comes from WALInsertLock and the LockMgr locks:

Lock# acquisitions  # times blocked

SInvalReadLock  6469840 380
SInvalWriteLock 240567  163
WALInsertLock   2388805 89142
LockMgr partition locks 8253142 177715

So I think this patch accomplishes the goal of making sinval not be a
cause of contention.

Patch version 2 attached.

regards, tom lane


binW9bQG9JmZj.bin
Description: sinval-rewrite-2.patch.gz

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches