Re: [HACKERS] Patch: Show process IDs of processes holding a lock; show relation and tuple infos of a lock to acquire

2014-01-23 Thread Rajeev rastogi
On 23/01/14 14:45, Christian Kruse wrote:
  Well, is it context or detail?  Those fields have reasonably well
  defined meanings IMO.
 
 I find the distinction somewhat blurry and think both would be
 appropriate. But since I wasn't sure I changed to detail.
 
  If we need errcontext_plural, let's add it, not adopt inferior
  solutions just because that isn't there for lack of previous need.
 
 I would've added it had I been sure.
 
  But having said that, I think this is indeed detail not context.
  (I kinda wonder whether some of the stuff that's now in the primary
  message shouldn't be pushed to errdetail as well.  It looks like some
  previous patches in this area have been lazy.)
 
 I agree, the primary message is not very well worded. On the other hand,
 finding an appropriate alternative seems hard to me.
 
  While I'm griping, this message isn't even trying to follow the
  project's message style guidelines.  Detail or context messages are
  supposed to be complete sentence(s), with capitalization and
 punctuation to match.
 
 Hm, I hope I fixed it in this version of the patch.
 
  Lastly, is this information that we want to be shipping to clients?
  Perhaps from a security standpoint that's not such a wise idea, and
  errdetail_log() is what should be used.
 
 Fixed. I added an errdetail_log_plural() for this, too.

I think you have attached the wrong patch.

Thanks and Regards,
Kumar Rajeev Rastogi
 





Re: [HACKERS] Proposal: variant of regclass

2014-01-23 Thread Amit Khandekar
On 22 January 2014 13:09, Tatsuo Ishii is...@postgresql.org wrote:

  I, as a user would be happier if we also have to_regprocedure() and
  to_regoperator(). The following query looks like a valid use-case where one
  needs to find if a particular function exists. Using to_regproc('sum')
  does not make sense here because it will return InvalidOid, which will
  not tell us whether that is because there is no such function or whether
  there are duplicate function names.
  select * from pg_proc where oid = to_regprocedure('sum(int)');

 I doubt the value of the use case above. Hasn't psql already done an
 excellent job?

 test=# \df sum
  List of functions
Schema   | Name | Result data type | Argument data types | Type
 ------------+------+------------------+---------------------+------
  pg_catalog | sum  | numeric  | bigint  | agg
  pg_catalog | sum  | double precision | double precision| agg
  pg_catalog | sum  | bigint   | integer | agg
  pg_catalog | sum  | interval | interval| agg
  pg_catalog | sum  | money| money   | agg
  pg_catalog | sum  | numeric  | numeric | agg
  pg_catalog | sum  | real | real| agg
  pg_catalog | sum  | bigint   | smallint| agg
 (8 rows)

 If you need simliar functionality in the backend, you could always
 define a view using the query generated by psql.

 ********* QUERY **********
 SELECT n.nspname as "Schema",
   p.proname as "Name",
   pg_catalog.pg_get_function_result(p.oid) as "Result data type",
   pg_catalog.pg_get_function_arguments(p.oid) as "Argument data types",
  CASE
   WHEN p.proisagg THEN 'agg'
   WHEN p.proiswindow THEN 'window'
   WHEN p.prorettype = 'pg_catalog.trigger'::pg_catalog.regtype THEN 'trigger'
   ELSE 'normal'
  END as "Type"
 FROM pg_catalog.pg_proc p
      LEFT JOIN pg_catalog.pg_namespace n ON n.oid = p.pronamespace
 WHERE p.proname ~ '^(sum)$'
   AND pg_catalog.pg_function_is_visible(p.oid)
 ORDER BY 1, 2, 4;
 **************************


I thought the general use case is to be able to use such functionality
in SQL queries (as opposed to \df), so that the DBA can automate things
without having to worry about the query returning an error. And hence, I
thought to_regprocedure() could be used in a query just like
::regprocedure is used.



 Best regards,
 --
 Tatsuo Ishii
 SRA OSS, Inc. Japan
 English: http://www.sraoss.co.jp/index_en.php
 Japanese: http://www.sraoss.co.jp



Re: [HACKERS] Patch: Show process IDs of processes holding a lock; show relation and tuple infos of a lock to acquire

2014-01-23 Thread Christian Kruse
Hi,

 I think you have attached the wrong patch.

Hurm. You are right, attached v3, not v4. Sorry.

Best regards,

-- 
 Christian Kruse   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index ee6c24c..6c648cf 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1195,13 +1195,23 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 		 */
 		if (log_lock_waits && deadlock_state != DS_NOT_YET_CHECKED)
 		{
-			StringInfoData buf;
+			StringInfoData buf,
+		lock_waiters_sbuf,
+		lock_holders_sbuf;
 			const char *modename;
 			long		secs;
 			int			usecs;
 			long		msecs;
+			SHM_QUEUE  *procLocks;
+			PROCLOCK   *proclock;
+			bool		first_holder = true,
+		first_waiter = true;
+			int			lockHoldersNum = 0;
 
 			initStringInfo(&buf);
+			initStringInfo(&lock_waiters_sbuf);
+			initStringInfo(&lock_holders_sbuf);
+
 			DescribeLockTag(&buf, &locallock->tag.lock);
 			modename = GetLockmodeName(locallock->tag.lock.locktag_lockmethodid,
 	   lockmode);
@@ -1211,10 +1221,67 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 			msecs = secs * 1000 + usecs / 1000;
 			usecs = usecs % 1000;
 
+			/*
+			 * we loop over the lock's procLocks to gather a list of all
+			 * holders and waiters. Thus we will be able to provide more
+			 * detailed information for lock debugging purposes.
+			 *
+			 * lock->procLocks contains all processes which hold or wait for
+			 * this lock.
+			 */
+
+			LWLockAcquire(partitionLock, LW_SHARED);
+
+			procLocks = &(lock->procLocks);
+			proclock = (PROCLOCK *) SHMQueueNext(procLocks, procLocks,
+												 offsetof(PROCLOCK, lockLink));
+
+			while (proclock)
+			{
+				/*
+				 * we are a waiter if myProc->waitProcLock == proclock; we are
+				 * a holder if it is NULL or something different
+				 */
+				if (proclock->tag.myProc->waitProcLock == proclock)
+				{
+					if (first_waiter)
+					{
+						appendStringInfo(&lock_waiters_sbuf, "%d",
+										 proclock->tag.myProc->pid);
+						first_waiter = false;
+					}
+					else
+						appendStringInfo(&lock_waiters_sbuf, ", %d",
+										 proclock->tag.myProc->pid);
+				}
+				else
+				{
+					if (first_holder)
+					{
+						appendStringInfo(&lock_holders_sbuf, "%d",
+										 proclock->tag.myProc->pid);
+						first_holder = false;
+					}
+					else
+						appendStringInfo(&lock_holders_sbuf, ", %d",
+										 proclock->tag.myProc->pid);
+
+					lockHoldersNum++;
+				}
+
+				proclock = (PROCLOCK *) SHMQueueNext(procLocks, &proclock->lockLink,
+													 offsetof(PROCLOCK, lockLink));
+			}
+
+			LWLockRelease(partitionLock);
+
 			if (deadlock_state == DS_SOFT_DEADLOCK)
 				ereport(LOG,
 						(errmsg("process %d avoided deadlock for %s on %s by rearranging queue order after %ld.%03d ms",
-							  MyProcPid, modename, buf.data, msecs, usecs)));
+								MyProcPid, modename, buf.data, msecs, usecs),
+						 (errdetail_log_plural("Process holding the lock: %s. Request queue: %s.",
+											   "Processes holding the lock: %s. Request queue: %s.",
+											   lockHoldersNum, lock_holders_sbuf.data, lock_waiters_sbuf.data))));
 			else if (deadlock_state == DS_HARD_DEADLOCK)
 			{
 /*
@@ -1226,13 +1293,19 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
  */
 ereport(LOG,
 						(errmsg("process %d detected deadlock while waiting for %s on %s after %ld.%03d ms",
-							  MyProcPid, modename, buf.data, msecs, usecs)));
+								MyProcPid, modename, buf.data, msecs, usecs),
+						 (errdetail_log_plural("Process holding the lock: %s. Request queue: %s.",
+											   "Processes holding lock: %s. Request queue: %s.",
+											   lockHoldersNum, lock_holders_sbuf.data, lock_waiters_sbuf.data))));
 			}
 
 			if (myWaitStatus == STATUS_WAITING)
 ereport(LOG,
 						(errmsg("process %d still waiting for %s on %s after %ld.%03d ms",
-							  MyProcPid, modename, buf.data, msecs, usecs)));
+								MyProcPid, modename, buf.data, msecs, usecs),
+						 (errdetail_log_plural("Process holding the lock: %s. Request queue: %s.",
+											   "Processes holding the lock: %s. Request queue: %s.",
+											   lockHoldersNum, lock_holders_sbuf.data, lock_waiters_sbuf.data))));
 			else if (myWaitStatus == STATUS_OK)
 				ereport(LOG,
 					(errmsg("process %d acquired %s on %s after %ld.%03d ms",
@@ -1252,7 +1325,10 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 if (deadlock_state != DS_HARD_DEADLOCK)
 	ereport(LOG,
 							(errmsg("process %d failed to acquire %s on %s after %ld.%03d ms",
-							  MyProcPid, modename, buf.data, msecs, usecs)));
+									MyProcPid, modename, buf.data, msecs, usecs),
+							 (errdetail_log_plural("Process holding the lock: %s. Request queue: %s.",
+												   "Processes holding the lock: %s. Request queue: %s.",
+												   lockHoldersNum, lock_holders_sbuf.data, lock_waiters_sbuf.data))));
 			}
 
 			/*
@@ 

Re: [HACKERS] [PATCH] Relocation of tablespaces in pg_basebackup

2014-01-23 Thread Steeve Lennmark
Peter,
On Thu, Jan 23, 2014 at 2:06 AM, Peter Eisentraut pete...@gmx.net wrote:

 I'm tempted to think it should be mandatory to specify this option in
 plain mode when tablespaces are present.  Otherwise, creating a base
 backup is liable to create random files all over the place.  Obviously,
 there would be backward compatibility concerns.


That was my initial thought too, the one thing that speaks FOR a change
in behaviour is that there isn't a lot of people who have moved over to
pg_basebackup yet and even fewer who use multiple tablespaces. For me
at least pg_basebackup isn't an option at the moment since I use more
than one tablespace.

I'm not totally happy with the choice of : as the mapping separator,
 because that would always require escaping on Windows.  We could make it
 analogous to the path handling and make ; the separator on Windows.
 Then again, this is not a path, so maybe it shouldn't look like one.  We
 could pick something else altogether, like =.

 The option name --tablespace isn't very clear.  It ought to be
 something like --tablespace-mapping.


This design choice I made about -T (--tablespace) and colon as
separator was copied from pg_barman, which is the way I back up my
clusters at the moment. Renaming the option to --tablespace-mapping and
changing the syntax to something like = is totally fine by me.


 I don't think we should require the new directory to be an absolute
 path.  It should be relative to the current directory, just like the -D
 option does it.


Accepting a relative path should be fine. I made a failed attempt using
realpath(3) initially, but I guess checking for [0] != '/' and
prepending getcwd(3) would suffice.
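For illustration, a minimal sketch of that idea in plain C (this is not the
patch's code; make_absolute_path and its error handling are invented here):

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static char *
make_absolute_path(const char *path)
{
	char		cwd[PATH_MAX];
	char	   *result;

	if (path[0] == '/')
		return strdup(path);	/* already absolute, just copy it */

	if (getcwd(cwd, sizeof(cwd)) == NULL)
		return NULL;			/* caller reports the error */

	result = malloc(strlen(cwd) + strlen(path) + 2);
	if (result != NULL)
		sprintf(result, "%s/%s", cwd, path);	/* cwd + '/' + path */
	return result;
}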

I would try to write this patch without using MAXPGPATH.  I know
 existing code uses it, but we should try to use it less, because it
 overallocates memory and requires handling additional error conditions.
 For example, you catch overflow in tablespace_list_append() but later
 report that as invalid tablespace format.  We now have psprintf() to
 make coding with dynamic memory allocation easier.


Is overallocating memory in a cli application really an issue though? I
will obviously rewrite the code to fix that if necessary.

It looks like when you ignore an escaped : as a separator, you don't
 actually unescape it for use as a directory.


+ if (*arg == '\\' && *(arg + 1) == ':')
+ ;

This code handles that case, I could try to make that cleaner.


 Somehow, I had the subconscious assumption that this feature would do
 prefix matching on the directories, not only complete string matching.
 So if I had tablespaces in /mnt/data1, /mnt/data2, /mnt/data3, I could
 map them all with -T /mnt:mnt and be done.  Not sure if that is useful
 to many, but it's worth a thought.


I like that a lot, but I'm afraid the code would have to get a bit more
complex to add that functionality. It would be an easier rewrite if we
added a hint character, something like -T '/mnt/*:mnt'.
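As a rough idea of what the prefix handling could look like, here is a hedged
sketch (remap_tablespace_path is an invented helper, not code from the patch):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static char *
remap_tablespace_path(const char *path, const char *olddir, const char *newdir)
{
	size_t		oldlen = strlen(olddir);
	char	   *result;

	/* only remap when olddir matches on a full path component boundary */
	if (strncmp(path, olddir, oldlen) != 0 ||
		(path[oldlen] != '/' && path[oldlen] != '\0'))
		return NULL;			/* this mapping does not apply */

	result = malloc(strlen(newdir) + strlen(path + oldlen) + 1);
	if (result != NULL)
		sprintf(result, "%s%s", newdir, path + oldlen);
	return result;
}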


 Review style guide for error messages:
 http://www.postgresql.org/docs/devel/static/error-style-guide.html


I will do that.

We need to think about how to handle this on platforms without symlinks.
 I don't like just printing an error message and moving on.  It should be
 either pass or fail or an option to choose between them.


I hope someone with experience with those kind of systems can come up
with suggestions on how to solve that. I only run postgres on Linux.



 Please put the options in the getopt handling in alphabetical order.
 It's not very alphabetical now, but between D and F is probably not the
 best place. ;-)


Done.

//Steeve


Re: [HACKERS] WIP patch (v2) for updatable security barrier views

2014-01-23 Thread Dean Rasheed
On 23 January 2014 06:13, KaiGai Kohei kai...@ak.jp.nec.com wrote:
 A minor comment is below:

 !   /*
 !    * Make sure that the query is marked correctly if the added qual
 !    * has sublinks.
 !    */
 !   if (!parsetree->hasSubLinks)
 !       parsetree->hasSubLinks = checkExprHasSubLink(viewqual);

 Is this logic really needed? This flag says whether the top-level query
 contains SubLinks; however, the above condition checks whether
 the sub-query to be constructed will contain SubLinks.
 Also, securityQuals is not attached to the parse tree right now.


Thanks for looking at this.

Yes that logic is needed. Without it the top-level query wouldn't be
marked as having sublinks, which could cause all sorts of things to go
wrong --- for example, without it fireRIRrules() wouldn't walk the
query tree looking for SELECT rules in sublinks to expand, so a
security barrier qual with a sublink subquery that selected from
another view wouldn't work.

It is not true to say that securityQuals is not attached to the parse
tree. The securityQuals *are* part of the parse tree, they just
happen to be held in a different place to keep them isolated from
other quals. So the remaining rewriter code that walks or mutates the
parse tree will process them just like any other quals, recursively
expanding any rules they contain.

Regards,
Dean




Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-23 Thread Andres Freund
On 2014-01-22 18:19:25 -0600, Jim Nasby wrote:
 On 1/21/14, 6:46 PM, Andres Freund wrote:
 On 2014-01-21 16:34:45 -0800, Peter Geoghegan wrote:
On Tue, Jan 21, 2014 at 3:43 PM, Andres Freund and...@2ndquadrant.com
 wrote:
  I personally think this isn't worth complicating the code for.
 
 You're probably right. However, I don't see why the bar has to be very
 high when we're considering the trade-off between taking some
 emergency precaution against having a PANIC shutdown, and an assured
 PANIC shutdown
 Well, the problem is that the tradeoff would very likely include making
 already complex code even more complex. None of the proposals, even the
one just decreasing the likelihood of a PANIC, look like they'd end up
 being simple implementation-wise.
 And that additional complexity would hurt robustness and prevent things
 I find much more important than this.
 
 If we're not looking for perfection, what's wrong with Peter's idea of
 a ballast file? Presumably the check to see if that file still exists
 would be cheap so we can do that before entering the appropriate
 critical section.

That'd be noticeably expensive. Opening/stat'ing a file isn't cheap,
especially if you do it via filename and not via an fd, which we'd have to
do here. I am pretty sure it would be noticeable in single-client workloads,
and it damned sure will be noticeable on busy multi-socket workloads.

I still think doing the checks in the WAL writer is the best bet: it would
set a flag that can then cheaply be tested in shared memory. When set, it
will cause any further action that would write WAL to error out, unless
it's already in progress.
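Purely to illustrate the shape of that idea (all names invented here, and a
plain static struct stands in for the real shared-memory state):

#include <stdbool.h>

typedef struct XLogSpaceControl
{
	volatile bool	wal_space_exhausted;	/* set by the background checker */
} XLogSpaceControl;

static XLogSpaceControl xlogSpaceControl;	/* would live in shared memory */

/* background side: do the expensive filesystem check, publish the result */
static void
UpdateWalSpaceFlag(bool exhausted)
{
	xlogSpaceControl.wal_space_exhausted = exhausted;
}

/* writer side: cheap test before starting a new WAL-writing action */
static bool
NewWalWritesAllowed(void)
{
	return !xlogSpaceControl.wal_space_exhausted;
}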

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Changeset Extraction v7.0 (was logical changeset generation)

2014-01-23 Thread Andres Freund
Hi,

On 2014-01-22 13:00:44 -0500, Robert Haas wrote:
 Well, apparently, one is going to PANIC and reinitialize the system.
 I presume that upon reinitialization we'll decide that the slot is
 gone, and thus won't recreate it in shared memory.

Yea, and if it's half-gone we'll continue deletion. And since yesterday
evening we'll even fsync things during startup to handle scenarios
similar to 20140122162115.gl21...@alap3.anarazel.de .

 Of course, if the entire system suffers a hard power failure after that and 
 before the
 directory is successfully fsync'd, then the slot could reappear on the
 next startup. Which is also exactly what would happen if we removed
 the slot from shared memory after doing the unlink, and then the
 system suffered a hard power failure before the directory contents
 made it to disk.  Except that we also panicked.

Yes, but that could only happen as long as no relevant data has been
lost since we hold relevant locks during this.

 In the case of shared buffers, the way we handle fsync failures is by
 not allowing the system to checkpoint until all of the fsyncs succeed.

I don't think shared buffers fsyncs are the apt comparison. It's more
something like UpdateControlFile(). Which PANICs.

I really don't get why you fight PANICs in general that much. There are
some nasty PANICs in postgres which can happen in legitimate situations,
which should be made to fail more gracefully, but this surely isn't one
of them. We're doing rename(), unlink() and rmdir(). That's it.
We should concentrate on the ones that legitimately can happen, not the
ones created by an admin running a chmod -R 000 . ; rm -rf $PGDATA or
mount -o remount,ro /. We don't increase reliability one bit by adding
codepaths that will never get tested.

 If there's an OS-level reset before that happens, WAL replay will
 perform the same buffer modifications over again and the next
 checkpoint will again try to flush them to disk and will not complete
 unless it does.  That forms a closed system where we never advance the
 redo pointer over the covering WAL record until the changes it covers
 are on the disk.  But I don't think this code has any similar
 interlock; if it does, I missed it.

No, it doesn't (until the first rename() at least), but the number of
failure scenarios is far smaller.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-23 Thread KONDO Mitsumasa

(2014/01/23 10:28), Peter Geoghegan wrote:

On Wed, Jan 22, 2014 at 5:28 PM, KONDO Mitsumasa
kondo.mitsum...@lab.ntt.co.jp wrote:

Oh, thanks for informing me. I think the essential problem of my patch is a
bottleneck in the sqrt() function and other division calculations.


Well, that's a pretty easy theory to test. Just stop calling them (and
do something similar to what we do for current counter fields instead)
and see how much difference it makes.
What do you mean by calling them? I think the heavy part you have in mind is
the pg_stat_statements view that is called by a SELECT query, not the part
that takes the LWLock while gathering statistics in the hook. Right?


I tested my patch with pgbench, but I cannot find a bottleneck in my latest patch.
(Sorry, I haven't tested a SELECT query on the pg_stat_statements view...) Here is a
test result.


* Result (Test result is represented by tps.)
        method       |  try1  |  try2  |  try3
---------------------+--------+--------+--------
 without pgss        | 130938 | 131558 | 131796
 with pgss           | 125164 | 125146 | 125358
 with patched pgss   | 126091 | 126681 | 126433


* Test Setting
shared_buffers=1024MB
checkpoint_segments = 300
checkpoint_timeout = 15min
checkpoint_completion_target = 0.7


* pgbench script
pgbench -h xxx.xxx.xxx.xxx mitsu-ko -i -s 100
psql -h xxx.xxx.xxx.xxx mitsu-ko -c 'CHECKPOINT'
pgbench -h xxx.xxx.xxx.xxx mitsu-ko -c64 -j32 -S -T 180
pgbench -h xxx.xxx.xxx.xxx mitsu-ko -c64 -j32 -S -T 180
pgbench -h xxx.xxx.xxx.xxx mitsu-ko -c64 -j32 -S -T 180


* Server SPEC:
CPU: Xeon E5-2670 1P/8C 2.6GHz #We don't have 32 core cpu...
Memory: 24GB
RAID: i420 2GB cache
Disk: 15K * 6 (RAID 1+0)


Attached is the latest development patch. It hasn't been tested much yet, but the sqrt
calculation may be faster.


Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
*** a/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
--- b/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
***
*** 19,24  CREATE FUNCTION pg_stat_statements(
--- 19,27 
  OUT query text,
  OUT calls int8,
  OUT total_time float8,
+ OUT min_time float8,
+ OUT max_time float8,
+ OUT stdev_time float8,
  OUT rows int8,
  OUT shared_blks_hit int8,
  OUT shared_blks_read int8,
***
*** 41,43  CREATE VIEW pg_stat_statements AS
--- 44,52 
SELECT * FROM pg_stat_statements();
  
  GRANT SELECT ON pg_stat_statements TO PUBLIC;
+ 
+ /* New Function */
+ CREATE FUNCTION pg_stat_statements_reset_time()
+ RETURNS void
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C;
*** a/contrib/pg_stat_statements/pg_stat_statements--1.2.sql
--- b/contrib/pg_stat_statements/pg_stat_statements--1.2.sql
***
*** 9,14  RETURNS void
--- 9,19 
  AS 'MODULE_PATHNAME'
  LANGUAGE C;
  
+ CREATE FUNCTION pg_stat_statements_reset_time()
+ RETURNS void
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C;
+ 
  CREATE FUNCTION pg_stat_statements(
  OUT userid oid,
  OUT dbid oid,
***
*** 16,21  CREATE FUNCTION pg_stat_statements(
--- 21,29 
  OUT query text,
  OUT calls int8,
  OUT total_time float8,
+ OUT min_time float8,
+ OUT max_time float8,
+ OUT stdev_time float8,
  OUT rows int8,
  OUT shared_blks_hit int8,
  OUT shared_blks_read int8,
***
*** 42,44  GRANT SELECT ON pg_stat_statements TO PUBLIC;
--- 50,53 
  
  -- Don't want this to be available to non-superusers.
  REVOKE ALL ON FUNCTION pg_stat_statements_reset() FROM PUBLIC;
+ REVOKE ALL ON FUNCTION pg_stat_statements_reset_time() FROM PUBLIC;
*** a/contrib/pg_stat_statements/pg_stat_statements--unpackaged--1.0.sql
--- b/contrib/pg_stat_statements/pg_stat_statements--unpackaged--1.0.sql
***
*** 4,8 
--- 4,9 
  \echo Use "CREATE EXTENSION pg_stat_statements" to load this file. \quit
  
  ALTER EXTENSION pg_stat_statements ADD function pg_stat_statements_reset();
+ ALTER EXTENSION pg_stat_statements ADD function pg_stat_statements_reset_time();
  ALTER EXTENSION pg_stat_statements ADD function pg_stat_statements();
  ALTER EXTENSION pg_stat_statements ADD view pg_stat_statements;
*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
***
*** 78,84  static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
  #define USAGE_DECREASE_FACTOR	(0.99)	/* decreased every entry_dealloc */
  #define STICKY_DECREASE_FACTOR	(0.50)	/* factor for sticky entries */
  #define USAGE_DEALLOC_PERCENT	5		/* free this % of entries at once */
! 
  #define JUMBLE_SIZE				1024	/* query serialization buffer size */
  
  /*
--- 78,85 
  #define USAGE_DECREASE_FACTOR	(0.99)	/* decreased every entry_dealloc */
  #define STICKY_DECREASE_FACTOR	(0.50)	/* factor for sticky entries */
  #define USAGE_DEALLOC_PERCENT	5		/* free this % of entries at once */
! #define EXEC_TIME_INIT_MIN	DBL_MAX		/* initial execution min time */
! #define	EXEC_TIME_INIT_MAX	-DBL_MAX	/* initial execution max time */
  

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-23 Thread Simon Riggs
On 23 January 2014 01:19, Jim Nasby j...@nasby.net wrote:
 On 1/21/14, 6:46 PM, Andres Freund wrote:

 On 2014-01-21 16:34:45 -0800, Peter Geoghegan wrote:

 On Tue, Jan 21, 2014 at 3:43 PM, Andres Freund and...@2ndquadrant.com
  wrote:

  I personally think this isn't worth complicating the code for.

 
 You're probably right. However, I don't see why the bar has to be very
 high when we're considering the trade-off between taking some
 emergency precaution against having a PANIC shutdown, and an assured
 PANIC shutdown

 Well, the problem is that the tradeoff would very likely include making
 already complex code even more complex. None of the proposals, even the
 one just decreasing the likelihood of a PANIC, look like they'd end up
 being simple implementation-wise.
 And that additional complexity would hurt robustness and prevent things
 I find much more important than this.


 If we're not looking for perfection, what's wrong with Peter's idea of a
 ballast file? Presumably the check to see if that file still exists would be
 cheap so we can do that before entering the appropriate critical section.

 There's still a small chance that we'd end up panicing, but it's better than
 today. I'd argue that even if it doesn't work for CoW filesystems it'd still
 be a win.

I grant that it does sound simple enough for a partial stop gap.

My concern is that it provides only a short delay before the eventual
disk-full situation, which it doesn't actually prevent.

IMHO the main issue now is how we clear down old WAL files. We need to
perform a checkpoint to do that - and as has been pointed out in
relation to my proposal, we cannot complete that because of locks that
will be held for some time when we do eventually lock up.

That issue is not solved by having a ballast file(s).

IMHO we need to resolve the deadlock inherent in the
disk-full/WALlock-up/checkpoint situation. My view is that can be
solved in a similar way to the way the buffer pin deadlock was
resolved for Hot Standby.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Add CREATE support to event triggers

2014-01-23 Thread Andres Freund
On 2014-01-15 02:11:11 -0300, Alvaro Herrera wrote:
 Then execute commands to your liking.

So, I am giving a quick look, given that I very likely will have to
write a consumer for it...

* deparse_utility_command returns payload via parameter, not return
  value?
* functions beginning with an underscore formally are reserved, we
  shouldn't add new places using such a convention.
* I don't think dequote_jsonval is correct as is, IIRC that won't
  correctly handle unicode escapes and such. I think json.c needs to
  expose functionality for this.
* I wonder if expand_jsonval_identifier shouldn't truncate as well? It
  should get handled by the individual created commands, right?
* So, if I read things correctly, identifiers in json are never
  downcased, is that correct? I.e. they are implicitly quoted?
* Why must we not schema qualify system types
  (c.f. expand_jsonval_typename)? It seems to be perfectly sensible to
  me to just use pg_catalog there.
* It looks like pg_event_trigger_expand_command will misparse nested {,
  error out instead?
* I wonder if deparseColumnConstraint couldn't somehow share a bit more
  code with ruleutils.c's pg_get_constraintdef_worker().
* I guess you know, but deparseColumnConstraint() doesn't handle foreign
  keys yet.
* Is tcop/ the correct location for deparse_utility.c? I wonder if it
  shouldn't be alongside ruleutils.c instead.
* shouldn't pg_event_trigger_get_creation_commands return command as
  json instead of text?

* Not your department, but a builtin json pretty printer would be
  really, really handy. Something like
CREATE FUNCTION json_prettify(j json)
RETURNS TEXT AS $$
import json
return json.dumps(json.loads(j), sort_keys=True, indent=4)
$$ LANGUAGE PLPYTHONU;
 makes the json so much more readable.

Some minimal tests:
* CREATE SEQUENCE errors out with:
NOTICE:  JSON blob: 
{"sequence":{"relation":"frakbar2","schema":"public"},"persistence":"","fmt":"CREATE
 %{persistence}s SEQUENCE %{identity}D"}
ERROR:  non-existant element identity in JSON formatting object
* CREATE TABLE frakkbar2(id int); errors out with:
postgres=# CREATE TABLE frakkbar2(id int);
NOTICE:  JSON blob: 
{"on_commit":{"present":false,"on_commit_value":null,"fmt":"ON COMMIT
 %{on_commit_value}s"},"tablespace":{"present":false,"tablespace":null,"fmt":"TABLESPACE
 %{tablespace}I"},"inherits":{"present":false,"parents":null,"fmt":"INHERITS
 (%{parents:, }D)"},"table_elements":[{"collation":{"present":false,"fmt":"COLLATE
 %{name}I"},"type":{"typmod":"","typename":"integer","is_system":true,"is_array":false},"name":"id","fmt":"%{name}I
 %{type}T %{collation}s"}],"of_type":{"present":false,"of_type":null,"fmt":"OF
 %{of_type}T"},"if_not_exists":"","identity":{"relation":"frakkbar2","schema":"public"},"persistence":"","fmt":"CREATE
 %{persistence}s TABLE %{identity}D %{if_not_exists}s %{of_type}s
 (%{table_elements:, }s) %{inherits}s %{on_commit}s %{tablespace}s"}
ERROR:  invalid NULL is_system flag in %T element
CONTEXT:  PL/pgSQL function snitch() line 8 at RAISE

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] [Review] inherit support for foreign tables

2014-01-23 Thread Etsuro Fujita
Shigeru Hanada wrote:
 Attached patch allows a foreign table to be a child of a table. It
 also allows foreign tables to have CHECK constraints.

Sorry for the delay. I started to look at this patch.

 Though this would be debatable, in current implementation, constraints
 defined on a foreign table (now only NOT NULL and CHECK are supported)
 are not enforced during INSERT or UPDATE executed against foreign
 tables. This means that retrieved data might violate the constraints
 defined on local side. This is debatable issue because integrity of
 data is important for DBMS, but in the first cut, it is just
 documented as a note.

I agree with you, but shouldn't we introduce a new keyword such as
ASSERTIVE or something in 9.4, in preparation for the enforcement of
constraints on the local side in a future release? What I'm concerned
about is that, otherwise, users will have to rewrite those DDL queries at
that point. No?

 Because I don't see a practical case to have a foreign table as a
 parent, and it avoids a problem with recursive ALTER TABLE operations,
 foreign tables can't be a parent.

I agree with you on that point.

 For other commands recursively processed such as ANALYZE, foreign
 tables in the leaf of inheritance tree are ignored.

I'm not sure that in the processing of the ANALYZE command, we should
ignore child foreign tables. It seems to me that the recursive
processing is not that hard. Rather what I'm concerned about is that if
the recursive processing is allowed, then autovacuum will probably have
to access to forign tables on the far side in order to acquire those
sample rows. It might be undesirable from the viewpoint of security or
from the viewpoint of efficiency.

--- a/doc/src/sgml/ref/create_foreign_table.sgml
+++ b/doc/src/sgml/ref/create_foreign_table.sgml
@@ -22,6 +22,7 @@ CREATE FOREIGN TABLE [ IF NOT EXISTS ] <replaceable class="PARAMETER">table_name</replaceable> ( [
    <replaceable class="PARAMETER">column_name</replaceable> <replaceable class="PARAMETER">data_type</replaceable> [ OPTIONS ( <replaceable class="PARAMETER">option</replaceable> '<replaceable class="PARAMETER">value</replaceable>' [, ... ] ) ] [ COLLATE <replaceable>collation</replaceable> ] [ <replaceable class="PARAMETER">column_constraint</replaceable> [ ... ] ]
    [, ... ]
 ] )
+[ INHERITS ( <replaceable>parent_table</replaceable> [, ... ] ) ]
   SERVER <replaceable class="parameter">server_name</replaceable>
 [ OPTIONS ( <replaceable class="PARAMETER">option</replaceable> '<replaceable class="PARAMETER">value</replaceable>' [, ... ] ) ]

@@ -159,6 +160,17 @@ CREATE FOREIGN TABLE [ IF NOT EXISTS ] <replaceable class="PARAMETER">table_name</replaceable>
    </varlistentry>

    <varlistentry>
+    <term><replaceable class="PARAMETER">parent_table</replaceable></term>
+    <listitem>
+     <para>
+      The name of an existing table or foreign table from which the new foreign
+      table automatically inherits all columns, see
+      <xref linkend="ddl-inherit"> for details of table inheritance.
+     </para>
+    </listitem>
+   </varlistentry>

Correct? I think we should not allow a foreign table to be a parent.

I'll look at this patch more closely.

Thanks,

Best regards,
Etsuro Fujita




[HACKERS] commit fest 2014-01 week 1 report

2014-01-23 Thread Peter Eisentraut
We started this commit fest with 103 patches and got off to a flyer,
with 20 patches committed during the first week.  Somehow we ended up
with 113 patches at the end of the first week, but let's please leave it
at that.

I'm tracking down patch authors who have not signed up for a review.
I'll also be following up with reviewers to send in their first review soon.

It looks like we could especially use a few reviewers for a number of
(small) Windows-specific changes.  If you have access to that, please
sign up.




Re: [HACKERS] Turning recovery.conf into GUCs

2014-01-23 Thread Andres Freund
Hi,

On 2014-01-15 02:00:51 -0500, Jaime Casanova wrote:
 On Mon, Nov 18, 2013 at 12:27 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
  Hi,
 
  On 2013-11-15 22:38:05 -0500, Jaime Casanova wrote:
  sorry, i clearly misunderstood you. attached a rebased patch with
  those functions removed.
 
  * --write-standby-enable seems to lose quite some functionality in
comparison to --write-recovery-conf since it doesn't seem to set
primary_conninfo, standby anymore.
 
 we can add code that looks for postgresql.conf in $PGDATA, but if
 postgresql.conf is not there (like the case in Debian) there is not
 much we can do about it; or we can write a file ready to be included
 in postgresql.conf. Any suggestion?

People might not like me for the suggestion, but I think we should
simply always include a 'recovery.conf' in $PGDATA
unconditionally. That'd make this easier.
Alternatively we could pass a filename to --write-recovery-conf.

  * CheckRecoveryReadyFile() doesn't seem to be a very descriptive
function name.
 
 I left it as CheckStartingAsStandby() but I still have a problem with
 this not being completely clear. This function is useful for standby
 or PITR.
 Which leads me to the other problem I have: the recovery trigger file.
 I have left it as standby.enabled but I still don't like it.

 recovery.trigger (Andres objected to this name)
 forced_recovery.trigger
 user_forced_recovery.trigger

stay_in_recovery.trigger? That'd be pretty clear for anybody involved in
pg, but otherwise...


  * the description of archive_cleanup_command seems wrong to me.
 
 Why? It seems to be the same as what was in recovery.conf. Where did you
 see the description you're complaining about?

I dislike the description in guc.c

 + {"archive_cleanup_command", PGC_POSTMASTER, WAL_ARCHIVE_RECOVERY,
 + gettext_noop("Sets the shell command that will be executed at every restartpoint."),
 + NULL
 + },
 + &archive_cleanup_command,

previously it was:

 -# specifies an optional shell command to execute at every restartpoint.
 -# This can be useful for cleaning up the archive of a standby server.

  * Why the PG_TRY/PG_CATCH in check_recovery_target_time? Besides being
really strangely formatted (multiline :? inside a function?) it
doesn't a) seem to be correct to ignore potential memory allocation
errors by just switching back into the context that just errored out,
and continue to work there b) forgo cleanup by just continuing as if
nothing happened. That's unlikely to be acceptable.
 
  the code that read recovery.conf didn't have that, so I just removed it

Well, that's not necessarily correct. recovery.conf was only read during
startup, while this is read on SIGHUP.

  * Why do you include xlog_internal.h in guc.c and not xlog.h?

  we actually need both, but including xlog_internal.h also includes xlog.h.
  I added xlog.h, and if someone thinks it is enough to only include
  xlog_internal.h, let me know.

What's required from xlog_internal.h?

 diff --git a/src/backend/access/transam/xlog.c 
 b/src/backend/access/transam/xlog.c
 index b53ae87..54f6a0d 100644
 --- a/src/backend/access/transam/xlog.c
 +++ b/src/backend/access/transam/xlog.c
 @@ -64,11 +64,12 @@
  extern uint32 bootstrap_data_checksum_version;
  
  /* File path names (all relative to $PGDATA) */
 -#define RECOVERY_COMMAND_FILE	"recovery.conf"
 -#define RECOVERY_COMMAND_DONE	"recovery.done"
 +#define RECOVERY_ENABLE_FILE	"standby.enabled"

Imo the file and variable names should stay coherent.

 +/* recovery.conf is not supported anymore */
 +#define RECOVERY_COMMAND_FILE	"recovery.conf"

 +bool StandbyModeRequested = false;
 +static TimestampTz recoveryDelayUntilTime;

This imo should be lowercase now, the majority of GUC variables are.

 +/* are we currently in standby mode? */
 +bool StandbyMode = false;

Why did you move this?

 - if (rtliGiven)
 + if (strcmp(recovery_target_timeline_string, "") != 0)
   {

Why not have the convention that NULL indicates an unset target_timeline
like you use for other GUCs? Mixing things like this is confusing.

Why is recovery_target_timeline stored as a string? Because it's an
unsigned int? If so, you should have an assign hook setting up a)
rtliGiven, b) a properly typed variable.

 - if (rtli)
 + if (recovery_target_timeline)
   {
   /* Timeline 1 does not have a history file, all else 
 should */
 - if (rtli != 1 && !existsTimeLineHistory(rtli))
 + if (recovery_target_timeline != 1 &&
 + 	!existsTimeLineHistory(recovery_target_timeline))
   ereport(FATAL,
   (errmsg("recovery target timeline %u does not exist",
 - rtli)));
 - recoveryTargetTLI = rtli;

Re: [HACKERS] dynamic shared memory and locks

2014-01-23 Thread Kohei KaiGai
Isn't it necessary to have an interface to initialize LWLock structure being
allocated on a dynamic shared memory segment?
Even though LWLock structure is exposed at lwlock.h, we have no common
way to initialize it.

How about to have a following function?

void
InitLWLock(LWLock *lock)
{
    SpinLockInit(&lock->lock.mutex);
    lock->lock.releaseOK = true;
    lock->lock.exclusive = 0;
    lock->lock.shared = 0;
    lock->lock.head = NULL;
    lock->lock.tail = NULL;
}

Thanks,

2014/1/22 KaiGai Kohei kai...@ak.jp.nec.com:
 (2014/01/22 1:37), Robert Haas wrote:

 On Mon, Jan 20, 2014 at 11:23 PM, KaiGai Kohei kai...@ak.jp.nec.com
 wrote:

 I briefly checked the patch. Most of lines are mechanical replacement
 from LWLockId to LWLock *, and compiler didn't claim anything with
 -Wall -Werror option.

 My concern is around LWLockTranche mechanism. Isn't it too complicated
 towards the purpose?
 For example, it may make sense to add char lockname[NAMEDATALEN]; at
 the tail of LWLock structure if LOCK_DEBUG is enabled. Probably, it
 also adds an argument of LWLockAssign() to gives the human readable
 name. Is the additional 64bytes (= NAMEDATALEN) per lock too large
 for recent hardware?


 Well, we'd need it when either LOCK_DEBUG was defined or when
 LWLOCK_STATS was defined or when --enable-dtrace was used, and while
 the first two are probably rarely enough used that that would be OK, I
 think the third case is probably fairly common, and I don't think we
 want to have such a potentially performance-critical difference
 between builds with and without dtrace.

 Also... yeah, it's a lot of memory.  If we add an additional 64 bytes
 to the structure, then we're looking at 96 bytes per lwlock instead of
 32, after padding out to a 32-byte boundary to avoid crossing cache
 lines.  We need 2 lwlocks per buffer, so that's an additional 128
 bytes per 8kB buffer.  For shared_buffers = 8GB, that's an extra 128MB
 for storing lwlocks.  I'm not willing to assume nobody cares about
 that.  And while I agree that this is a bit complex, I don't think
 it's really as bad as all that.  We've gotten by for a long time
 without people being able to put lwlocks in other parts of memory, and
 most extension authors have gotten by with LWLockAssign() just fine
 and can continue doing things that way; only users with particularly
 sophisticated needs should bother to worry about the tranche stuff.

 Hmm... 1/64 of main memory (if large buffer system) might not be
 an ignorable memory consumption.


 One idea I just had is to improve the dsm_toc module so that it can
 optionally set up a tranche of lwlocks for you, and provide some
 analogues of RequestAddinLWLocks and LWLockAssign for that case.  That
 would probably make this quite a bit simpler to use, at least for
 people using it with dynamic shared memory.  But I think that's a
 separate patch.

 I agree with this idea. It seems to me quite natural to keep properties
 of objects held on shared memory (LWLock) on shared memory.
  Also, an LWLock once assigned shall never be released. So, I think
 dsm_toc can provide a well suitable storage for them.


 Thanks,
 --
 OSS Promotion Center / The PG-Strom Project
 KaiGai Kohei kai...@ak.jp.nec.com





-- 
KaiGai Kohei kai...@kaigai.gr.jp




Re: [HACKERS] dynamic shared memory and locks

2014-01-23 Thread Andres Freund
On 2014-01-23 23:03:40 +0900, Kohei KaiGai wrote:
 Isn't it necessary to have an interface to initialize LWLock structure being
 allocated on a dynamic shared memory segment?
 Even though LWLock structure is exposed at lwlock.h, we have no common
 way to initialize it.

There's LWLockInitialize() or similar in the patch afair.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] dynamic shared memory and locks

2014-01-23 Thread Kohei KaiGai
2014/1/23 Andres Freund and...@2ndquadrant.com:
 On 2014-01-23 23:03:40 +0900, Kohei KaiGai wrote:
 Isn't it necessary to have an interface to initialize LWLock structure being
 allocated on a dynamic shared memory segment?
 Even though LWLock structure is exposed at lwlock.h, we have no common
 way to initialize it.

 There's LWLockInitialize() or similar in the patch afair.

Ahh, I overlooked the code around tranche-id. Sorry for the noise.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp




Re: [HACKERS] [bug fix] pg_ctl always uses the same event source

2014-01-23 Thread Amit Kapila
On Thu, Jan 23, 2014 at 10:26 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Amit Kapila amit.kapil...@gmail.com writes:
 On Wed, Jan 22, 2014 at 9:03 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 So?  Anything which can know the value of a GUC parameter can certainly
 know the selected port number.

  1. In case of registration of the event source, either the user has to pass
  the name or it uses the hard-coded default value, so if we use the version
  number along with 'PostgreSQL', it can be consistent.
  I don't see any way pgevent.c can know the port number to append it to the
  default value; am I missing something here?


 I think what we might want to do is redefine the server's behavior
 as creating an event named after the concatenation of event_source
 and port number, or maybe even get rid of event_source entirely and
 just say it's PostgreSQL followed by the port number.

   To accomplish this behaviour, each time the server starts and stops,
   we need to register and unregister the event log using the mechanism
   described at the link below, to ensure that there is no mismatch between
   what the server uses and what the OS knows.
   http://www.postgresql.org/docs/devel/static/event-log-registration.html

   IIUC, this will be a much larger behaviour change.
   What is the problem you are envisioning if we use the version number?
   Yesterday I gave some examples of other software which is registered
   in the event log and uses a version number, so I don't understand why
   it is a big deal if we also use the version number along with
   PostgreSQL as the default value, and if the user specifies a particular
   name then use that.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2014-01-23 Thread Andrew Dunstan


On 01/22/2014 11:33 PM, KONDO Mitsumasa wrote:

(2014/01/23 12:00), Andrew Dunstan wrote:


On 01/22/2014 08:28 PM, KONDO Mitsumasa wrote:

(2014/01/22 22:26), Robert Haas wrote:

On Wed, Jan 22, 2014 at 3:32 AM, KONDO Mitsumasa
kondo.mitsum...@lab.ntt.co.jp wrote:
OK, Kondo, please demonstrate benchmarks that show we have < 1% impact
from this change. Otherwise we may need a config parameter to allow
the calculation.


OK, testing DBT-2 now. However, the error range of the benchmark might be
higher than 1%.

So I will show you the detailed HTML results.


To see any impact from spinlock contention, I think you're pretty much
going to need a machine with 32 cores, I think, and lots of
concurrency.  pgbench -S is probably a better test than DBT-2, because
it leaves out all the writing, so percentage-wise more time will be
spent doing things like updating the pgss hash table.
Oh, thanks for informing me. I think the essential problem of my patch is a
bottleneck in the sqrt() function and other division calculations. I will
replace the sqrt() function from math.h with a faster algorithm, and move
unnecessary parts of the calculation out of LWLocks or other locks. It might
take time to improve, so please wait for a while.



Umm, I have not read the patch, but are you not using Welford's method? Its
per-statement overhead should be absolutely tiny (and should not compute a
square root at all per statement - the square root should only be computed
when the standard deviation is actually wanted, e.g. when a user examines
pg_stat_statements). See for example
http://www.johndcook.com/standard_deviation.html
Thanks for your advice. I read your example roughly; however, I think
calculating the variance is not so heavy in my patch. The double-based sqrt
calculation is the heaviest part in my mind. And I found a fast square root
algorithm that is used in 3D games.

http://en.wikipedia.org/wiki/Fast_inverse_square_root

This page shows an inverse square root algorithm, but it can also calculate
a normal square root, and it is a faster algorithm than the general one at
the price of precision. I think we want a fast algorithm, so it will be
suitable.
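For reference, the classic routine from that article looks roughly like this
(a sketch only; one Newton step, so the precision trade-off mentioned above
applies, and sqrt(x) would be approximated as x * fast_rsqrt(x)):

#include <stdint.h>
#include <string.h>

static float
fast_rsqrt(float number)
{
	const float threehalfs = 1.5f;
	float		x2 = number * 0.5f;
	float		y = number;
	uint32_t	i;

	memcpy(&i, &y, sizeof(i));				/* reinterpret the float's bits */
	i = 0x5f3759df - (i >> 1);				/* magic initial guess */
	memcpy(&y, &i, sizeof(y));
	y = y * (threehalfs - (x2 * y * y));	/* one Newton-Raphson iteration */
	return y;
}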



According to the link I gave above:

   The most obvious way to compute variance then would be to have two
   sums: one to accumulate the sum of the x's and another to accumulate
   the sums of the squares of the x's. If the x's are large and the
   differences between them small, direct evaluation of the equation
   above would require computing a small number as the difference of
   two large numbers, a red flag for numerical computing. The loss of
   precision can be so bad that the expression above evaluates to a
   /negative/ number even though variance is always positive. 


As I read your patch that's what it seems to be doing.

What is more, if the square root calculation is affecting your 
benchmarks, I suspect you are benchmarking the wrong thing. The 
benchmarks should not call for a single square root calculation. What we 
really want to know is what is the overhead from keeping these stats. 
But your total runtime code (i.e. code NOT from calling 
pg_stat_statements()) for stddev appears to be this:


   e->counters.total_sqtime += total_time * total_time;

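For comparison, a minimal sketch of Welford's running update (assumed names,
not taken from the patch): the per-statement path keeps only a count, a mean
and a sum of squared deviations, and sqrt() is paid only when the standard
deviation is actually read, e.g. from the pg_stat_statements view:

#include <math.h>

typedef struct StatAccum
{
	long	n;
	double	mean;
	double	m2;			/* sum of squared distances from the running mean */
} StatAccum;

static void
stat_update(StatAccum *s, double x)
{
	double	delta = x - s->mean;

	s->n += 1;
	s->mean += delta / s->n;
	s->m2 += delta * (x - s->mean);
}

static double
stat_stddev(const StatAccum *s)
{
	/* square root only here, when the value is requested */
	return (s->n > 1) ? sqrt(s->m2 / (s->n - 1)) : 0.0;
}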

cheers

andrew




Re: [HACKERS] commit fest 2014-01 week 1 report

2014-01-23 Thread Andrew Dunstan


On 01/23/2014 08:11 AM, Peter Eisentraut wrote:

We started this commit fest with 103 patches and got off to a flyer,
with 20 patches committed during the first week.  Somehow we ended up
with 113 patches at the end of the first week, but let's please leave it
at that.

I'm tracking down patch authors who have not signed up for a review.
I'll also be following up with reviewers to send in their first review soon.

It looks like we could especially use a few reviewers for a number of
(small) Windows-specific changes.  If you have access to that, please
sign up.



I have picked four of these.

cheers

andrew





Re: [HACKERS] Add %z support to elog/ereport?

2014-01-23 Thread Andres Freund
On 2014-01-21 11:33:40 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-01-21 12:11:23 -0300, Alvaro Herrera wrote:
  How difficult would it be to have expand_fmt_string deal with positional
  modifiers?  I don't think we need anything from it other than the %n$
  notation, so perhaps it's not so problematic.
 
  I don't think there's much reason to go there. I didn't go for the
  pg-supplied sprintf() because I thought it'd be considered to
  invasive. Since that's apparently not the case...
 
 Yeah, that would make expand_fmt_string considerably more complicated
 and so presumably slower.  We don't really need that when we can make
 what I expect is a pretty trivial addition to snprintf.c.  Also, fixing
 snprintf.c will make it safe to use the z flag in contexts other than
 ereport/elog.

So, here's a patch adding %z support to port/snprintf.c including a
configure check to test whether we need to fall back. I am not too
happy about the runtime check as the test isn't all that meaningful, but
I couldn't think of anything better.

The second patch is a slightly updated version of a previously sent
version which is starting to use %z in some more places.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 5c91e08cb2c4b3cc8779ed0cd89387eb38ea3690 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Thu, 23 Jan 2014 15:45:36 +0100
Subject: [PATCH 1/2] Add support for the %z modifier in error messages and
 other usages of printf.

The %z modifier allows printing size_t/Size variables without
platform-dependent modifiers like UINT64_FORMAT and is included in C99 and
POSIX. As it's not present on all supported platforms, add a configure
check which tests whether it's supported and actually working to some
degree, and fall back to our existing snprintf.c implementation, which now
supports the modifier, if not.
---
 config/c-library.m4 | 38 ++
 configure   | 52 
 configure.in|  8 
 src/port/snprintf.c | 26 ++
 4 files changed, 124 insertions(+)

diff --git a/config/c-library.m4 b/config/c-library.m4
index 1e3997b..171d39c 100644
--- a/config/c-library.m4
+++ b/config/c-library.m4
@@ -303,6 +303,44 @@ int main()
 AC_MSG_RESULT([$pgac_cv_printf_arg_control])
 ])# PGAC_FUNC_PRINTF_ARG_CONTROL
 
+# PGAC_FUNC_PRINTF_SIZE_T_SUPPORT
+# ---
+# Determine if printf supports the z length modifier for printing
+# sizeof(size_t) sized variables. That's supported by C99 and posix but not
+# all platforms play ball, so we test whether it's working.
+# Useful for printing sizes in error messages et al. without up/downcasting
+# arguments.
+#
+AC_DEFUN([PGAC_FUNC_PRINTF_SIZE_T_SUPPORT],
+[AC_MSG_CHECKING([whether printf supports the %z modifier])
+AC_CACHE_VAL(pgac_cv_printf_size_t_support,
+[AC_TRY_RUN([#include <stdio.h>
+#include <string.h>
+
+int main()
+{
+  char buf64[100];
+  char bufz[100];
+
+  /*
+   * Check whether we print correctly using %z by printing the biggest
+   * unsigned number fitting in a size_t and using both %zu and the format for
+   * 64bit numbers.
+   */
+
+  snprintf(bufz, 100, "%zu", ~(size_t)0);
+  snprintf(buf64, 100, UINT64_FORMAT, ~(size_t)0);
+  if (strcmp(bufz, buf64) != 0)
+return 1;
+  return 0;
+}],
+[pgac_cv_printf_size_t_support=yes],
+[pgac_cv_printf_size_t_support=no],
+[pgac_cv_printf_size_t_support=cross])
+])dnl AC_CACHE_VAL
+AC_MSG_RESULT([$pgac_cv_printf_size_t_support])
+])# PGAC_FUNC_PRINTF_SIZE_T_SUPPORT
+
 
 # PGAC_TYPE_LOCALE_T
 # --
diff --git a/configure b/configure
index e1ff704..d786b17 100755
--- a/configure
+++ b/configure
@@ -13036,6 +13036,58 @@ cat confdefs.h _ACEOF
 _ACEOF
 
 
+# Also force our snprintf if the system's doesn't support the %z modifier.
+if test $pgac_need_repl_snprintf = no; then
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether printf supports the %z modifier" >&5
+$as_echo_n "checking whether printf supports the %z modifier... " >&6; }
+if ${pgac_cv_printf_size_t_support+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  if test "$cross_compiling" = yes; then :
+  pgac_cv_printf_size_t_support=cross
+else
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <stdio.h>
+#include <string.h>
+
+int main()
+{
+  char buf64[100];
+  char bufz[100];
+
+  /*
+   * Check whether we print correctly using %z by printing the biggest
+   * unsigned number fitting in a size_t and using both %zu and the format for
+   * 64bit numbers.
+   */
+
+  snprintf(bufz, 100, "%zu", ~(size_t)0);
+  snprintf(buf64, 100, UINT64_FORMAT, ~(size_t)0);
+  if (strcmp(bufz, buf64) != 0)
+return 1;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_run $LINENO; then :
+  pgac_cv_printf_size_t_support=yes
+else
+  pgac_cv_printf_size_t_support=no

Re: [HACKERS] dynamic shared memory and locks

2014-01-23 Thread Robert Haas
On Wed, Jan 22, 2014 at 12:42 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-01-22 12:40:34 -0500, Robert Haas wrote:
 On Wed, Jan 22, 2014 at 12:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Andres Freund and...@2ndquadrant.com writes:
  Shouldn't we introduce a typedef LWLock* LWLockid; or something to avoid
  breaking external code using lwlocks?
 
  +1, in fact there's probably no reason to touch most *internal* code using
  that type name either.

 I thought about this but figured it was too much of a misnomer to
 refer to a pointer as an ID.  But, if we're sure we want to go that
 route, I can go revise the patch along those lines.

 I personally don't care either way for internal code as long as external
 code continues to work. There's the argument of making the commit better
 readable by having less noise and less divergence in the branches and
 there's your argument of that being less clear.

OK, well then, if no one objects violently, I'll stick with my current
approach of getting rid of all core mentions of LWLockId in favor of
LWLock *, but also add typedef LWLock *LWLockId with a comment that
this is to minimize breakage of third-party code.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Passing direct args of ordered-set aggs to the transition function

2014-01-23 Thread Florian Pflug
Hi,

Is there a particular reason why the direct arguments of ordered-set
aggregates are not passed to the transition function too? It seems that
evaluating of some ordered-set aggregates would be much cheaper if we did
that.

For example, dense_rank() would then just need to count the number of rows
smaller than the hypothetical row, AFAICS.

Another example (that we don't currently provide, but still) would be a
histogram aggregate which receives an array of buckets as direct args and
returns a similarly shaped array of counters.

best regards,
Florian Pflug





Re: [HACKERS] Add %z support to elog/ereport?

2014-01-23 Thread Andres Freund
On 2014-01-23 11:14:05 -0500, Tom Lane wrote:
 OK, I'll take a look.

Thanks.

  I am not too
  happy about the runtime check as the test isn't all that meaningful, but
  I couldn't think of anything better.
 
 Yeah, it's problematic for cross-compiles, but no more so than configure's
 existing test for %n$ support.  In practice, since both these features
 are required by C99, I think it wouldn't be such an issue for most people.

Currently we automatically fall back to our implementation if we're
cross compiling unless I am missing something, that's a bit odd, but it
should work ;)

I was wondering more about the nature of the runtime check than the fact
that it's a runtime check at all... E.g. snprintf.c simply skips over
unknown format characters and might not have been detected as faulty on
32bit platforms by that check. Which might be considered a good thing :)

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add %z support to elog/ereport?

2014-01-23 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 So, here's a patch adding %z support to port/snprintf.c including a
 configure check to test whether we need to fall back.

OK, I'll take a look.

 I am not too
 happy about the runtime check as the test isn't all that meaningful, but
 I couldn't think of anything better.

Yeah, it's problematic for cross-compiles, but no more so than configure's
existing test for %n$ support.  In practice, since both these features
are required by C99, I think it wouldn't be such an issue for most people.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part2: fast scan

2014-01-23 Thread Heikki Linnakangas

On 01/14/2014 05:35 PM, Alexander Korotkov wrote:

Attached version is rebased against last version of packed posting lists.


Thanks!

I think we're missing a trick with multi-key queries. We know that when 
multiple scan keys are used, they are ANDed together, so we can do the 
skip optimization even without the new tri-state consistent function.
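
To make the idea concrete, here is a toy, self-contained sketch of that skip
logic on two sorted integer arrays standing in for posting lists (names and
numbers are invented; this is not the actual GIN code):

#include <stdio.h>

/*
 * Intersect two ascending arrays.  Because the keys are ANDed, whichever
 * stream is behind can be advanced past the other stream's current item
 * without inspecting the skipped entries one by one.
 */
static int
intersect(const int *a, int na, const int *b, int nb, int *out)
{
	int		i = 0, j = 0, n = 0;

	while (i < na && j < nb)
	{
		if (a[i] == b[j])
		{
			out[n++] = a[i];
			i++;
			j++;
		}
		else if (a[i] < b[j])
		{
			while (i < na && a[i] < b[j])
				i++;		/* skip the frequent key's items */
		}
		else
		{
			while (j < nb && b[j] < a[i])
				j++;
		}
	}
	return n;
}

int
main(void)
{
	int		a[] = {1, 2, 3, 7, 9, 100};	/* "frequent" key */
	int		b[] = {7, 100};			/* "rare" key */
	int		out[6];
	int		n = intersect(a, 6, b, 2, out);
	int		k;

	for (k = 0; k < n; k++)
		printf("%d\n", out[k]);
	return 0;
}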


To get started, I propose the three attached patches. These only 
implement the optimization for the multi-key case, which doesn't require 
any changes to the consistent functions and hence no catalog changes. 
Admittedly this isn't anywhere near as useful in practice as the single 
key case, but let's go for the low-hanging fruit first. This 
nevertheless introduces some machinery that will be needed by the full 
patch anyway.


I structured the code somewhat differently than your patch. There is no 
separate fast-path for the case where the optimization applies. Instead, 
I'm passing the advancePast variable all the way down to where the next 
batch of items are loaded from the posting tree. keyGetItem is now 
responsible for advancing the entry streams, and the logic in 
scanGetItem has been refactored so that it advances advancePast 
aggressively, as soon as one of the key streams lets us conclude that no 
items < a certain point can match.


scanGetItem might yet need to be refactored when we get to the full 
preconsistent check stuff, but one step at a time.



The first patch is the most interesting one, and contains the 
scanGetItem changes. The second patch allows seeking to the right 
segment in a posting tree page, and the third allows starting the 
posting tree scan from root, when skipping items (instead of just 
following the right-links).


Here are some simple performance test results, demonstrating the effect 
of each of these patches. This is a best-case scenario. I don't think 
these patches have any adverse effects even in the worst-case scenario, 
although I haven't actually tried hard to measure that. I used this to 
create a test table:


create table foo (intarr int[]);
-- Every row contains 0 (frequent term), and a unique number.
insert into foo select array[0,g] from generate_series(1, 1000) g;
-- Add another tuple with 0, 1 combo physically to the end of the table.
insert into foo values (array[0,1]);

The query I used is this:

postgres=# select count(*) from foo where intarr @> array[0] and intarr 
@> array[1];

 count
---
 2
(1 row)

I measured the time that query takes, and the number of pages hit, using 
explain (analyze, buffers true) 


patches          time (ms)   buffers
-------------------------------------
unpatched        650         1316
patch 1          0.52        1316
patches 1+2      0.50        1316
patches 1+2+3    0.13        15

So, the second patch isn't doing much in this particular case. But it's 
trivial, and I think it will make a difference in other queries where 
you have the opportunity to skip, but return a lot of tuples overall.


In summary, these are fairly small patches, and useful on their own, so I 
think these should be committed now. But please take a look and see if 
the logic in scanGetItem/keyGetItem looks correct to you. After this, I 
think the main fast scan logic will go into keyGetItem.


PS. I find it a bit surprising that in your patch, you're completely 
bailing out if there are any partial-match keys involved. Is there some 
fundamental reason for that, or just not implemented?


- Heikki
From 53e33c931c41f5ff8bb22ecfc011e717d2dbb9fd Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas heikki.linnakan...@iki.fi
Date: Thu, 23 Jan 2014 15:41:43 +0200
Subject: [PATCH 1/3] Optimize GIN multi-key queries.

In a multi-key search, ie. something like "col @> 'foo' AND col @> 'bar'",
as soon as we find the next item that matches the first criteria, we don't
need to check the second criteria for TIDs smaller than the first match. That
saves a lot of effort, especially if the first term is rare while the
second occurs very frequently.

Based on ideas from Alexander Korotkov's fast scan patch
---
 src/backend/access/gin/ginget.c | 465 ++--
 1 file changed, 255 insertions(+), 210 deletions(-)

diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
index 4bdbd45..4de7a10 100644
--- a/src/backend/access/gin/ginget.c
+++ b/src/backend/access/gin/ginget.c
@@ -68,29 +68,6 @@ callConsistentFn(GinState *ginstate, GinScanKey key)
 }
 
 /*
- * Tries to refind previously taken ItemPointer on a posting page.
- */
-static bool
-needToStepRight(Page page, ItemPointer item)
-{
-	if (GinPageGetOpaque(page)->flags & GIN_DELETED)
-		/* page was deleted by concurrent vacuum */
-		return true;
-
-	if (ginCompareItemPointers(item, GinDataPageGetRightBound(page)) > 0
-			&& !GinPageRightMost(page))
-	{
-		/*
-		 * the item we're looking is > the right bound of the page, so it
-		 * can't be on this page.
-		 */
-		return true;
-	}
-
-	return false;
-}
-
-/*
  * Goes to the next 

Re: [HACKERS] Passing direct args of ordered-set aggs to the transition function

2014-01-23 Thread Tom Lane
Florian Pflug f...@phlo.org writes:
 Is there a particular reason why the direct arguments of ordered-set
 aggregates are not passed to the transition function too?

Because they have to be evaluated only once.

I did consider evaluating them once at the start and saving the values,
but that's a bit problematic from a memory management standpoint.

Still, if you have a good solution to that and the cycles to produce a
patch, it's not too late to reconsider.  I concur that it's not that
hard to think of cases where it'd be useful.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add %z support to elog/ereport?

2014-01-23 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 I was wondering more about the nature of the runtime check than the fact
 that it's a runtime check at all... E.g. snprintf.c simply skips over
 unknown format characters and might not have been detected as faulty on
 32bit platforms by that check. Which might be considered a good thing :)

Oh ... gotcha.  Yeah, it's possible that snprintf would behave in a way
that masks the fact that it doesn't really recognize the z flag, but
that seems rather unlikely to me.  More likely it would abandon processing
the %-sequence on grounds it's malformed.

I will try the patch on my old HPUX dinosaur, which I'm pretty sure
does not know z, and verify this is the case.

Also, I'm guessing Windows' version of snprintf doesn't have z either.
Could someone try the patch's configure test program on Windows and see
what the result is?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Passing direct args of ordered-set aggs to the transition function

2014-01-23 Thread Florian Pflug
On Jan23, 2014, at 17:20 , Tom Lane t...@sss.pgh.pa.us wrote:
 Florian Pflug f...@phlo.org writes:
 Is there a particular reason why the direct arguments of ordered-set
 aggregates are not passed to the transition function too?
 
 Because they have to be evaluated only once.
 
 I did consider evaluating them once at the start and saving the values,
 but that's a bit problematic from a memory management standpoint.

Yeah, that's what I had in mind. I not sure I understand that memory
management problems you mention though - couldn't we just copy them to
some appropriate memory context, say aggcontext?
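
Roughly the kind of thing I have in mind, with placeholder names (a sketch,
not a patch):

    MemoryContext oldcontext;

    /* stash the once-evaluated direct argument in a long-lived context */
    oldcontext = MemoryContextSwitchTo(aggcontext);
    direct_values[i] = datumCopy(value, typByVal, typLen);
    MemoryContextSwitchTo(oldcontext);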

What I'm more concerned about is whether it'd still be possible to have
ordered_set_transition and ordered_set_transition_multi work for all the
ordered-set aggregates we currently have. But I have yet to wrap my head
fully around the VARIADIC any ANY stuff that goes on there...

 Still, if you have a good solution to that and the cycles to produce a
 patch, it's not too late to reconsider.  I concur that it's not that
 hard to think of cases where it'd be useful.

I'll see what I can do.

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Changeset Extraction v7.0 (was logical changeset generation)

2014-01-23 Thread Robert Haas
On Thu, Jan 23, 2014 at 7:05 AM, Andres Freund and...@2ndquadrant.com wrote:
 I don't think shared buffers fsyncs are the apt comparison. It's more
 something like UpdateControlFile(). Which PANICs.

 I really don't get why you fight PANICs in general that much. There are
 some nasty PANICs in postgres which can happen in legitimate situations,
 which should be made to fail more gracefully, but this surely isn't one
 of them. We're doing rename(), unlink() and rmdir(). That's it.
 We should concentrate on the ones that legitimately can happen, not the
 ones created by an admin running a chmod -R 000 . ; rm -rf $PGDATA or
 mount -o remount,ro /. We don't increase reliability by a bit adding
 codepaths that will never get tested.

Sorry, I don't buy it.  Lots of people I know have stories that go
like this $HORRIBLE happened, and PostgreSQL kept on running, and it
didn't even lose my data!, where $HORRIBLE may be variously that the
disk filled up, that disk writes started failing with I/O errors, that
somebody changed the permissions on the data directory inadvertently,
that the entire data directory got removed, and so on.  I've been
through some of those scenarios myself, and the care and effort that's
been put into failure modes has saved my bacon more than a few times,
too.  We *do* increase reliability by worrying about what will happen
even in code paths that very rarely get exercised.  It's certainly
true that our bug count there is higher there than for the parts of
our code that get exercised more regularly, but it's also lower than
it would be if we didn't make the effort, and the dividend that we get
from that effort is that we have a well-deserved reputation for
reliability.

I think it's completely unacceptable for the failure of routine
filesystem operations to result in a PANIC.  I grant you that we have
some existing cases where that can happen (like UpdateControlFile),
but that doesn't mean we should add more.  Right this very minute
there is massive bellyaching on a nearby thread caused by the fact
that a full disk condition while writing WAL can PANIC the server,
while on this thread at the very same time you're arguing that adding
more ways for a full disk to cause PANICs won't inconvenience anyone.
The other thread is right, and your argument here is wrong.  We have
been able to - and have taken the time to - fix comparable problems in
other cases, and we should do the same thing here.

As for why I fight PANICs so much in general, there are two reasons.
First, I believe that to be project policy.  I welcome correction if I
have misinterpreted our stance in that area.  Second, I have
encountered a few situations where customers had production servers
that repeatedly PANICked due to some bug or other.  If I've ever
encountered angrier customers, I can't remember when.  A PANIC is no
big deal when it happens on your development box, but when it happens
on a machine with 100 users connected to it, it's a big deal,
especially if a single underlying cause makes it happen over and over
again.

I think we should be devoting our time to figuring how to improve
this, not whether to improve it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add %z support to elog/ereport?

2014-01-23 Thread Andres Freund
On 2014-01-23 11:25:56 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  I was wondering more about the nature of the runtime check than the fact
  that it's a runtime check at all... E.g. snprintf.c simply skips over
  unknown format characters and might not have been detected as faulty on
  32bit platforms by that check. Which might be considered a good thing :)
 
 Oh ... gotcha.  Yeah, it's possible that snprintf would behave in a way
 that masks the fact that it doesn't really recognize the z flag, but
 that seems rather unlikely to me.  More likely it would abandon processing
 the %-sequence on grounds it's malformed.

Yea, hopefully.

 I will try the patch on my old HPUX dinosaur, which I'm pretty sure
 does not know z, and verify this is the case.

I don't know how, but I've introduced a typo in the version I sent if
you haven't noticed yet, there's a " missing in
PGAC_FUNC_PRINTF_SIZE_T_SUPPORT. "%zu instead of "%zu"

 Also, I'm guessing Windows' version of snprintf doesn't have z either.
 Could someone try the patch's configure test program on Windows and see
 what the result is?

I've attached a version of that here, for $windowsperson's
convenience. I hope I got the llp stuff right...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
/* confdefs.h */
#define UINT64_FORMAT "%llu"
/* end confdefs.h.  */
#include <stdio.h>
#include <string.h>

int main()
{
  char buf64[100];
  char bufz[100];

  /*
   * Check whether we print correctly using %z by printing the biggest
   * unsigned number fitting in a size_t and using both %zu and the format for
   * 64bit numbers.
   */
  snprintf(bufz, 100, "%zu", ~(size_t)0);
  snprintf(buf64, 100, UINT64_FORMAT, ~(size_t)0);
  if (strcmp(bufz, buf64) != 0)
	fprintf(stderr, "no can do %%z\n");
  else
	fprintf(stderr, "can do %%z\n");
}

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Changeset Extraction v7.0 (was logical changeset generation)

2014-01-23 Thread Andres Freund
On 2014-01-23 11:50:57 -0500, Robert Haas wrote:
 On Thu, Jan 23, 2014 at 7:05 AM, Andres Freund and...@2ndquadrant.com wrote:
  I don't think shared buffers fsyncs are the apt comparison. It's more
  something like UpdateControlFile(). Which PANICs.
 
  I really don't get why you fight PANICs in general that much. There are
  some nasty PANICs in postgres which can happen in legitimate situations,
  which should be made to fail more gracefully, but this surely isn't one
  of them. We're doing rename(), unlink() and rmdir(). That's it.
  We should concentrate on the ones that legitimately can happen, not the
  ones created by an admin running a chmod -R 000 . ; rm -rf $PGDATA or
  mount -o remount,ro /. We don't increase reliability by a bit adding
  codepaths that will never get tested.
 
 Sorry, I don't buy it.  Lots of people I know have stories that go
 like this $HORRIBLE happened, and PostgreSQL kept on running, and it
 didn't even lose my data!, where $HORRIBLE may be variously that the
 disk filled up, that disk writes started failing with I/O errors, that
 somebody changed the permissions on the data directory inadvertently,
 that the entire data directory got removed, and so on.

Especially the not losing data imo is because postgres is
conservative with continuing in situations it doesn't know anything
about. Most prominently the cluster wide restart after a segfault.

 I've been
 through some of those scenarios myself, and the care and effort that's
 been put into failure modes has saved my bacon more than a few times,
 too.  We *do* increase reliability by worrying about what will happen
 even in code paths that very rarely get exercised.

A part of thinking about them *is* restricting what happens in those
cases by keeping the possible states to worry about to a minimum.

Just slapping on an ERROR instead of PANIC can make things much
worse. Not releasing space until a restart, without a chance to do
anything about it because we failed to properly release the in-memory
slot will just make the problem bigger because now the cleanup might
take a week (VACUUM FULLing the entire cluster?).

 I think it's completely unacceptable for the failure of routine
 filesystem operations to result in a PANIC.  I grant you that we have
 some existing cases where that can happen (like UpdateControlFile),
 but that doesn't mean we should add more.  Right this very minute
 there is massive bellyaching on a nearby thread caused by the fact
 that a full disk condition while writing WAL can PANIC the server,
 while on this thread at the very same time you're arguing that adding
 more ways for a full disk to cause PANICs won't inconvenience anyone.

A full disk won't cause any of the problems for the case we're
discussing, will it? We're just doing rename(), unlink(), rmdir() here,
all should succeed while the FS is full (afair rename() does on all
common FSs because inodes are kept separately).

 The other thread is right, and your argument here is wrong.  We have
 been able to - and have taken the time to - fix comparable problems in
 other cases, and we should do the same thing here.

I don't think the WAL case is comparable at all. ENOSPC is something
expected that can happen during normal operation and doesn't include
malintended operator and is reasonably easy to test. unlink() or fsync()
randomly failing is not.
In fact, isn't the consequence out of that thread that we need a
significant amount of extra complexity to handle the case? We shouldn't
spend that effort for cases that don't deserve it because they are too
unlikely in practice.

And yes, there's not too many other places PANICing - because most can
rely on WAL handling those tricky cases for them...

 Second, I have
 encountered a few situations where customers had production servers
 that repeatedly PANICked due to some bug or other.  If I've ever
 encountered angrier customers, I can't remember when.  A PANIC is no
 big deal when it happens on your development box, but when it happens
 on a machine with 100 users connected to it, it's a big deal,
 especially if a single underlying cause makes it happen over and over
 again.

Sure. But blindly continuing and then, possibly quite a bit later,
losing data, causing an outage that takes a long while to recover or
something isn't any better.

 I think we should be devoting our time to figuring how to improve
 this, not whether to improve it.

If you'd argue that creating a new slot should fail gracefull, ok, I can
relatively easily be convinced of that. But trying to handle failures in
the midst of deletion in cases that won't happen in reality is just
inviting trouble imo.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Case sensitive mode in windows build option

2014-01-23 Thread Andrew Dunstan

On 01/13/2014 10:49 PM, Dilip kumar wrote:

 As per current behavior, if a user wants to build in debug mode on
 Windows, then he needs to give debug in capital letters (DEBUG),

 I think many users will make mistakes in giving this option, so in
 my opinion we can make it case insensitive.

 I have attached a small patch for the same ( just converted comparison
 to case insensitive).



I have committed this. It's in the commitfest as a bug fix, but I don't
think it's strictly a bug. OTOH, it's pretty harmless. Do we want to
backpatch it?

cheers

andrew


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The problems of PQhost()

2014-01-23 Thread Fujii Masao
On Wed, Jan 22, 2014 at 11:48 PM, Fujii Masao masao.fu...@gmail.com wrote:
 Hi,

 I reported in other thread that PQhost() has three problems.

 http://www.postgresql.org/message-id/cahgqgwe77akyabywde5+8bjldopthp7cnswao_syedjogyv...@mail.gmail.com
 
 (1) PQhost() can return Unix-domain socket directory path even in the
 platform that
 doesn't support Unix-domain socket.

 (2) In the platform that doesn't support Unix-domain socket, when
 neither host nor hostaddr
 are specified, the default host 'localhost' is used to connect to
 the server and
 PQhost() must return that, but it doesn't.

 (3) PQhost() cannot return the hostaddr.

 As the result of these problems, you can see the incorrect result of
 \conninfo, for example.

 $ psql -d hostaddr=127.0.0.1
 =# \conninfo
 You are connected to database "postgres" as user "postgres" via socket
 in "/tmp" at port "5432".

 Obviously /tmp should be 127.0.0.1.
 

 Attached patch fixes these problems.

 We can fix the problem (3) by changing PQhost() so that it also
 returns the hostaddr. But this change might break the existing
 application using PQhost(). So, I added new libpq function PQhostaddr()
 which returns the hostaddr, and changed \conninfo so that it reports
 correct connection information by using both PQhost() and PQhostaddr().

 These problems exist in v9.3 or before. Basically we should backport
 the patch into those versions. But obviously it's too late to add new libpq
 function PQhostaddr() into those versions. Then, I'm thinking to backport
 only the change for (1) and (2) because we can fix them without adding
 new libpq function.

 BTW, I think that the patch also fixes the problem of \conninfo that
 MauMau reported in other thread.
 http://www.postgresql.org/message-id/8913B51B3B534F878D54B89E1EB964C6@maumau

Committed.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] psql's \conninfo reports incorrect destination on Windows

2014-01-23 Thread Fujii Masao
On Wed, Dec 4, 2013 at 9:21 PM, MauMau maumau...@gmail.com wrote:
 Hello,

 I've found a bug that psql's \conninfo displays incorrect information on
 Windows.  Please find attached the patch and commit this.

 [Problem]
 When I run psql postgres on Windows and execute \conninfo, it outputs the
 text below.  It reports that psql connected to the server via UNIX domain
 socket, but the actual connection is of course via TCP/IP socket to
 localhost.

 You are connected to database "postgres" as user "Administrator" via socket
 in "/tmp" at port "5432".

 It should output:

 You are connected to database "postgres" as user "Administrator" on host
 "localhost" at port "5432".

I think that 77035fa8a92d8c39f4c689e54f46813f203f09a8 fixed this problem.
Please check that.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add %z support to elog/ereport?

2014-01-23 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
  snprintf(buf64, 100, UINT64_FORMAT, ~(size_t)0);

Actually, that coding isn't gonna work at all on platforms where size_t
isn't the same size as uint64.  We could make it work by explicitly
casting the argument to whatever type we've decided to use as uint64
... but unless we want to include c.h here, that would require a lot of
extra cruft, and I'm really not sure it's gaining anything anyway.

I'm inclined to just print (size_t)0x and see if it produces
the expected result.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add %z support to elog/ereport?

2014-01-23 Thread Andres Freund
On 2014-01-23 12:54:22 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
   snprintf(buf64, 100, UINT64_FORMAT, ~(size_t)0);
 
 Actually, that coding isn't gonna work at all on platforms where size_t
 isn't the same size as uint64.  We could make it work by explicitly
 casting the argument to whatever type we've decided to use as uint64
 ... but unless we want to include c.h here, that would require a lot of
 extra cruft, and I'm really not sure it's gaining anything anyway.

Hm, yea, it should be casted. I think we should have the type ready in
PG_INT64_TYPE, confdefs.h should contain it at that point.

 I'm inclined to just print (size_t)0x and see if it produces
 the expected result.

Well, the reasoning, weak as it may be, was that that we want to see
whether we successfully recognize z as a 64bit modifier or not. And I
couldn't think of a better way to test that both for 32 and 64 bit
platforms...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [BUGS] BUG #7873: pg_restore --clean tries to drop tables that don't exist

2014-01-23 Thread Alvaro Herrera
Pavel Stehule escribió:

 I thought about it too. But I didn't continue - the reasons were named by Dean -
 and RemoveObjects is not difficult code - a lot of code is mechanical - and
 it is not on the critical path.

I have pushed it after some editorialization.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Closing commitfest 2013-11

2014-01-23 Thread Alvaro Herrera
Alvaro Herrera wrote:
 With apologies to our beloved commitfest-mace-wielding CFM, commitfest
 2013-11 intentionally still contains a few open patches.  I think that
 CF is largely being ignored by most people now that we have CF 2014-01
 in progress.  If we don't want to do anything about these patches in the
 immediate future, I propose we move them to CF 2014-01.
 
 * shared memory message queues

I closed this one as committed.

 * Shave a few instructions from child-process startup sequence
 * Widening application of indices.

I marked these as returned with feedback.

 * fault tolerant DROP IF EXISTS

Committed this one and marked as such.

 * SSL: better default ciphersuite

This one was moved to 2014-01 (not by me).

So there is nothing remaining in 2013-11 and we can (continue to) focus
exclusively on 2014-01.  Yay!

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add %z support to elog/ereport?

2014-01-23 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-01-23 12:54:22 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
 snprintf(buf64, 100, UINT64_FORMAT, ~(size_t)0);

 Actually, that coding isn't gonna work at all on platforms where size_t
 isn't the same size as uint64.  We could make it work by explicitly
 casting the argument to whatever type we've decided to use as uint64
 ... but unless we want to include c.h here, that would require a lot of
 extra cruft, and I'm really not sure it's gaining anything anyway.

 Hm, yea, it should be casted. I think we should have the type ready in
 PG_INT64_TYPE, confdefs.h should contain it at that point.

Ah, I'd forgotten that configure defined any such symbol.  Yeah, that
will work.

 Well, the reasoning, weak as it may be, was that that we want to see
 whether we successfully recognize z as a 64bit modifier or not.

I'm dubious that this is really adding much, but whatever.

I checked on my HPUX box and find that what it prints for "%zu" is
"zu", confirming my thought that it'd just abandon processing of the
%-sequence.  (Interesting that it doesn't eat the "z" while doing
so, though.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Case sensitive mode in windows build option

2014-01-23 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 I have committed this. It's in the commitfest as a bug fix, but I don't
 think it's strictly a bug. OTOH, it's pretty harmless. Do we want to
 backpatch it?

Given the lack of field complaints, I'd say it's not worth the trouble.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Warning in new GIN code

2014-01-23 Thread Tom Lane
The rather ancient gcc on my HPUX box is complaining thusly about HEAD:

ginbtree.c: In function `ginPlaceToPage':
ginbtree.c:602: warning: control reaches end of non-void function

I would imagine this would happen on any compiler that doesn't recognize
the hint about elog(ERROR) not returning.  Could we stick a dummy return
at the end?

}
else
+   {
            elog(ERROR, "unknown return code from GIN placeToPage method: %d", rc);
+   return false;  /* keep compiler quiet */
+   }
}

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add CREATE support to event triggers

2014-01-23 Thread Alvaro Herrera
Andres Freund escribió:

 * Why must we not schema qualify system types
   (c.f. expand_jsonval_typename)? It seems to be perfectly sensible to
   me to just use pg_catalog there.

So, the reason for doing things this way is to handle cases like
varchar(10) being turned into character varying; and that name
requires that the typename NOT be schema-qualified, otherwise it fails.
But thinking about this again, I don't see a reason why this can't be
returned simply as pg_catalog.varchar(10); this should work fine on the
receiving end as well, and give the same result.
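
For instance, both of these spellings should parse and give the same result
(just an illustration):

    SELECT 'abc'::character varying(10);
    SELECT 'abc'::pg_catalog.varchar(10);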

The other cases I'm worried about are types like bit(1) vs. unadorned
bit vs. double-quoted "bit", and "char", etc.  I'm not sure I'm dealing
with them correctly right now.  So even if by the above paragraph I
could make the is_system thingy go away, I might still need it to cover
this case.

Thanks for the review, I will post an updated version later after fixing
the other issues you mentioned plus adding support for more commands.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [COMMITTERS] pgsql: Compress GIN posting lists, for smaller index size.

2014-01-23 Thread Stefan Kaltenbrunner
On 01/22/2014 06:28 PM, Heikki Linnakangas wrote:
 Compress GIN posting lists, for smaller index size.
 
 GIN posting lists are now encoded using varbyte-encoding, which allows them
 to fit in much smaller space than the straight ItemPointer array format used
 before. The new encoding is used for both the lists stored in-line in entry
 tree items, and in posting tree leaf pages.
 
 To maintain backwards-compatibility and keep pg_upgrade working, the code
 can still read old-style pages and tuples. Posting tree leaf pages in the
 new format are flagged with GIN_COMPRESSED flag, to distinguish old and new
 format pages. Likewise, entry tree tuples in the new format have a
 GIN_ITUP_COMPRESSED flag set in a bit that was previously unused.
 
 This patch bumps GIN_CURRENT_VERSION from 1 to 2. New indexes created with
 version 9.4 will therefore have version number 2 in the metapage, while old
 pg_upgraded indexes will have version 1. The code treats them the same, but
it might become handy in the future, if we want to drop support for the
 uncompressed format.
 
 Alexander Korotkov and me. Reviewed by Tomas Vondra and Amit Langote.


it seems that this commit made spoonbill an unhappy animal:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2014-01-23%2000%3A00%3A04



Stefan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] Client-only installation on Windows

2014-01-23 Thread Andrew Dunstan

On 12/06/2013 09:16 AM, MauMau wrote:
 Hello,

 According to this page,

 http://www.postgresql.org/docs/current/static/install-procedure.html

 client-only installation is possible on UNIX/Linux like this:

 gmake -C src/bin install
 gmake -C src/include install
 gmake -C src/interfaces install
 gmake -C doc install

 With the attached patch, you can do client-only installation on
 Windows like this:

 install.bat install_dir client

 This installs:

 * client applications (both core and contrib)
 * DLLs for libpq and ECPG
 * header files
 * import libraries
 * pg_service.conf.sample and psqlrc.sample
 * symbol files (*.pdb) for the above modules

 If the second argument is omitted or is anything other than "client", all files are
 installed. With 9.4, the whole installation takes up about 80 MB, and
 the client-only installation takes up only 24 MB.

 Any comments would be appreciated


This looks OK, and I'll commit it after I have a chance to give it a
quick test (probably at the same time as I test the VS2013 patch)..

Is there any reason why pgbench is listed in @client_program_files as
well as @client_contribs? AFAICT it should only be in the latter.

cheers

andrew



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [COMMITTERS] pgsql: Compress GIN posting lists, for smaller index size.

2014-01-23 Thread Heikki Linnakangas

On 01/23/2014 09:18 PM, Stefan Kaltenbrunner wrote:

On 01/22/2014 06:28 PM, Heikki Linnakangas wrote:

Compress GIN posting lists, for smaller index size.

GIN posting lists are now encoded using varbyte-encoding, which allows them
to fit in much smaller space than the straight ItemPointer array format used
before. The new encoding is used for both the lists stored in-line in entry
tree items, and in posting tree leaf pages.

To maintain backwards-compatibility and keep pg_upgrade working, the code
can still read old-style pages and tuples. Posting tree leaf pages in the
new format are flagged with GIN_COMPRESSED flag, to distinguish old and new
format pages. Likewise, entry tree tuples in the new format have a
GIN_ITUP_COMPRESSED flag set in a bit that was previously unused.

This patch bumps GIN_CURRENT_VERSION from 1 to 2. New indexes created with
version 9.4 will therefore have version number 2 in the metapage, while old
pg_upgraded indexes will have version 1. The code treats them the same, but
it might become handy in the future, if we want to drop support for the
uncompressed format.

Alexander Korotkov and me. Reviewed by Tomas Vondra and Amit Langote.


it seems that this commit made spoonbill an unhappy animal:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2014-01-23%2000%3A00%3A04


Hmm, all the Sparcs. Some kind of an alignment issue, perhaps? I will 
investigate..


- Heikki

--
- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add %z support to elog/ereport?

2014-01-23 Thread Tom Lane
I wrote:
 I checked on my HPUX box and find that what it prints for %zu is
 zu, confirming my thought that it'd just abandon processing of the
 %-sequence.  (Interesting that it doesn't eat the z while doing
 so, though.)

Further testing on that box shows that its ancient gcc (2.95.3) doesn't
know z either, which means that the patch produces a boatload of
compile warnings like this:

mcxt.c: In function `MemoryContextAllocZero':
mcxt.c:605: warning: unknown conversion type character `z' in format
mcxt.c:605: warning: too many arguments for format

While I am not really expecting this gcc to compile PG cleanly anymore,
the idea that we might get many dozen such warnings on more-current
compilers is scarier, as that might well interfere with people's
ability to do development on, say, Windows.  Could somebody check
whether MSVC for instance complains about format strings using z?
Or shall I just commit this and we'll see what the buildfarm says?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Warning in new GIN code

2014-01-23 Thread Heikki Linnakangas

On 01/23/2014 08:41 PM, Tom Lane wrote:

The rather ancient gcc on my HPUX box is complaining thusly about HEAD:

ginbtree.c: In function `ginPlaceToPage':
ginbtree.c:602: warning: control reaches end of non-void function

I would imagine this would happen on any compiler that doesn't recognize
the hint about elog(ERROR) not returning.  Could we stick a dummy return
at the end?

}
else
+   {
            elog(ERROR, "unknown return code from GIN placeToPage method: %d", rc);
+   return false;  /* keep compiler quiet */
+   }
}


Fixed, thanks.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add CREATE support to event triggers

2014-01-23 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 So, the reason for doing things this way is to handle cases like
 varchar(10) being turned into character varying; and that name
 requires that the typename NOT be schema-qualified, otherwise it fails.
 But thinking about this again, I don't see a reason why this can't be
 returned simply as pg_catalog.varchar(10); this should work fine on the
 receiving end as well, and give the same result.

I think people would be unhappy if we changed the output of, say, pg_dump
that way.  But it's presumably not a problem for strings inside event
triggers.  Once upon a time, the typmods would have been an issue, but now
that we support them in generic typename syntax I think we're good.

 The other cases I'm worried about are types like bit(1) vs. unadorned
 bit vs. double-quoted "bit", and "char", etc.  I'm not sure I'm dealing
 with them correctly right now.  So even if by the above paragraph I
 could make the is_system thingy go away, I might still need it to cover
 this case.

Yeah, there are some weird cases there.  I think you can make them go away
if you always print the fully qualified type name, ie don't omit
pg_catalog, and use pg_type.typname not any converted version (and don't
forget to double-quote anything that might be a reserved word).
But I've not looked closely at the code.
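
As an illustration of the quoting point, format()'s %I (i.e. quote_ident())
already handles the reserved-word cases:

    SELECT format('%I.%I', 'pg_catalog', typname)
    FROM pg_type WHERE typname IN ('bit', 'char', 'varchar');
    -- yields pg_catalog."bit", pg_catalog."char", pg_catalog."varchar"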

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Why do we let autovacuum give up?

2014-01-23 Thread Joshua D. Drake


Hello,

I have run into yet again another situation where there was an 
assumption that autovacuum was keeping up and it wasn't. It was caused 
by autovacuum quitting because another process requested a lock.


In turn we received a ton of bloat on pg_attribute which caused all 
kinds of other issues (as can be expected).


The more I run into it, the more it seems like autovacuum should behave 
like vacuum, in that it gets precedence when it is running. First come, 
first serve as they say.


Thoughts?

JD
--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
For my dreams of your image that blossoms
   a rose in the deeps of my heart. - W.B. Yeats


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Compress GIN posting lists, for smaller index size.

2014-01-23 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 On 01/23/2014 09:18 PM, Stefan Kaltenbrunner wrote:
 it seems that this commit made spoonbill an unhappy animal:
 http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2014-01-23%2000%3A00%3A04

 Hmm, all the Sparcs. Some kind of an alignment issue, perhaps? I will 
 investigate..

My HPUX box, which is also picky about alignment, is unhappy as well.
It's crashing here:

ginPostingListDecode (plist=0xc39efac1, ndecoded=0x7b03bee0)
at ginpostinglist.c:263
263 return ginPostingListDecodeAllSegments(plist,
(gdb) bt
#0  ginPostingListDecode (plist=0xc39efac1, ndecoded=0x7b03bee0)
at ginpostinglist.c:263
#1  0x205308 in ginReadTuple (ginstate=0xc39efac1, attnum=48864, 
itup=0x7b03bee0, nitems=0x403ee9a4) at ginentrypage.c:170
#2  0x21074c in startScanEntry (ginstate=0x403ec3ac, entry=0x403ee970)
at ginget.c:463
#3  0x21086c in startScan (scan=0xc39efac1) at ginget.c:493
#4  0x212c14 in gingetbitmap (fcinfo=0xc39efac1) at ginget.c:1531
#5  0x5ffc50 in FunctionCall2Coll (flinfo=0xc39efac1, collation=2063843040, 
arg1=2063843040, arg2=1077864868) at fmgr.c:1323
#6  0x24ee5c in index_getbitmap (scan=0x40163878, bitmap=0x403ee620)
at indexam.c:649
#7  0x3b9430 in MultiExecBitmapIndexScan (node=0x40163768)
at nodeBitmapIndexscan.c:89
#8  0x3a5a3c in MultiExecProcNode (node=0x40163768) at execProcnode.c:562
#9  0x3b8610 in BitmapHeapNext (node=0x401628f0) at nodeBitmapHeapscan.c:104
#10 0x3ae5b0 in ExecScan (node=0x401628f0, 
accessMtd=0x4001a2c2 <DINFINITY+3802>, 
recheckMtd=0x4001a2ca <DINFINITY+3810>) at execScan.c:82
#11 0x3b8e9c in ExecBitmapHeapScan (node=0xc39efac1)
at nodeBitmapHeapscan.c:441
#12 0x3a56e0 in ExecProcNode (node=0x401628f0) at execProcnode.c:414
...

(gdb) p debug_query_string
$1 = 0x4006d4a8 "SELECT * FROM array_index_op_test WHERE i @> '{38,34,32,89}' ORDER BY seqno;"

The problem appears to be due to the misaligned plist pointer
(0xc39efac1 here).

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why do we let autovacuum give up?

2014-01-23 Thread Tom Lane
Joshua D. Drake j...@commandprompt.com writes:
 I have run into yet again another situation where there was an 
 assumption that autovacuum was keeping up and it wasn't. It was caused 
 by autovacuum quitting because another process requested a lock.

 In turn we received a ton of bloat on pg_attribute which caused all 
 kinds of other issues (as can be expected).

 The more I run into it, the more it seems like autovacuum should behave 
 like vacuum, in that it gets precedence when it is running. First come, 
 first serve as they say.

1. Back when it worked like that, things were worse.

2. What have you got that is requesting exclusive lock on pg_attribute?
That seems like a pretty unfriendly behavior in itself.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why do we let autovacuum give up?

2014-01-23 Thread Josh Berkus
On 01/23/2014 12:34 PM, Joshua D. Drake wrote:
 
 Hello,
 
 I have run into yet again another situation where there was an
 assumption that autovacuum was keeping up and it wasn't. It was caused
 by autovacuum quitting because another process requested a lock.
 
 In turn we received a ton of bloat on pg_attribute which caused all
 kinds of other issues (as can be expected).
 
 The more I run into it, the more it seems like autovacuum should behave
 like vacuum, in that it gets precedence when it is running. First come,
 first serve as they say.
 
 Thoughts?

If we let autovacuum block user activity, a lot more people would turn
it off.

Now, if you were to argue that we should have some way to monitor the
tables which autovac can never touch because of conflicts, I would agree
with you.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why do we let autovacuum give up?

2014-01-23 Thread Harold Giménez
On Thu, Jan 23, 2014 at 12:53 PM, Josh Berkus j...@agliodbs.com wrote:
 On 01/23/2014 12:34 PM, Joshua D. Drake wrote:

 Hello,

 I have run into yet again another situation where there was an
 assumption that autovacuum was keeping up and it wasn't. It was caused
 by autovacuum quitting because another process requested a lock.

 In turn we received a ton of bloat on pg_attribute which caused all
 kinds of other issues (as can be expected).

 The more I run into it, the more it seems like autovacuum should behave
 like vacuum, in that it gets precedence when it is running. First come,
 first serve as they say.

 Thoughts?

 If we let autovacuum block user activity, a lot more people would turn
 it off.

 Now, if you were to argue that we should have some way to monitor the
 tables which autovac can never touch because of conflicts, I would agree
 with you.

Agree completely. Easy ways to monitor this would be great. Once you
know there's a problem, tweaking autovacuum settings is very hard and
misunderstood, and explaining how to be effective at it is a dark art
too.
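
Something along these lines already gets part of the way there (thresholds
are arbitrary, just an illustration):

    SELECT schemaname, relname, n_dead_tup, last_autovacuum, autovacuum_count
    FROM pg_stat_all_tables
    WHERE n_dead_tup > 10000
      AND (last_autovacuum IS NULL
           OR last_autovacuum < now() - interval '1 day')
    ORDER BY n_dead_tup DESC;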

-Harold


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why do we let autovacuum give up?

2014-01-23 Thread Mark Kirkwood

On 24/01/14 09:49, Tom Lane wrote:
2. What have you got that is requesting exclusive lock on 
pg_attribute? That seems like a pretty unfriendly behavior in itself. 
regards, tom lane 


I've seen this sort of problem where every db session was busily 
creating temporary tables. I never got to find out *why* they needed to 
make so many, but it seemed like a bad idea.


Regards

Mark



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Changeset Extraction v7.1

2014-01-23 Thread Robert Haas
Patch 0001:

+				 errmsg("could not find free replication slot"),

Suggest: "all replication slots are in use"

+	elog(ERROR, "cannot aquire a slot while another slot has been acquired");

Suggest: "this backend has already acquired a replication slot"

Or demote it to Assert().  I'm not really sure why this needs to be
checked in non-assert builds.  I also wonder if we should use the
terminology "attach" instead of "acquire"; that pairs more naturally
with "release".  Then the message, if we want more than an assert,
might be "this backend is already attached to a replication slot".

+   if (slot == NULL)
+   {
+   LWLockRelease(ReplicationSlotCtlLock);
+   ereport(ERROR,
+   (errcode(ERRCODE_UNDEFINED_OBJECT),
+                errmsg("could not find replication slot \"%s\"", name)));
+   }

The error will release the LWLock anyway; I'd get rid of the manual
LWLockRelease, and the braces.  Similarly in ReplicationSlotDrop.

+   /* acquire spinlock so we can test and set ->active safely */
+   SpinLockAcquire(&slot->mutex);
+
+   if (slot->active)
+   {
+       SpinLockRelease(&slot->mutex);
+       LWLockRelease(ReplicationSlotCtlLock);
+       ereport(ERROR,
+               (errcode(ERRCODE_OBJECT_IN_USE),
+                errmsg("slot \"%s\" already active", name)));
+   }
+
+   /* we definitely have the slot, no errors possible anymore */
+   slot->active = true;
+   MyReplicationSlot = slot;
+   SpinLockRelease(&slot->mutex);

This doesn't need the LWLockRelease either.  It does need the
SpinLockRelease, but as I think I noted previously, the right way to
write this is probably: SpinLockAcquire(&slot->mutex); was_active =
slot->active; slot->active = true; SpinLockRelease(&slot->mutex); if
(was_active) ereport(). MyReplicationSlot = slot.
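
Written out, the suggested ordering would be roughly (sketch only, reusing the
names from the patch):

    SpinLockAcquire(&slot->mutex);
    was_active = slot->active;
    slot->active = true;
    SpinLockRelease(&slot->mutex);

    if (was_active)
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_IN_USE),
                 errmsg("slot \"%s\" already active", name)));

    MyReplicationSlot = slot;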

ReplicationSlotsComputeRequiredXmin still acquires ProcArrayLock, and
the comment "Provide interlock against concurrent recomputations"
doesn't seem adequate to me.  I guess the idea here is that we regard
ProcArrayLock as protecting ReplicationSlotCtl->catalog_xmin and
ReplicationSlotCtl->data_xmin, but if that's the idea then we only
need to hold the lock during the time when we actually update those
values, not the loop where we compute them.  Also, if that's the
design, maybe they should be part of PROC_HDR *ProcGlobal rather than
here.  It seems weird to have some of the values protected by
ProcArrayLock live in a completely different data structure managed
almost entirely by some other part of the system.

It's pretty evident that what's currently patch #4 (only peg the xmin
horizon for catalog tables during logical decoding) needs to become
patch #1, because it doesn't make any sense to apply this before we do
that.  I'm still not 100% confident in that approach, but maybe I'd
better try to look at it RSN and get confident, because too much of
the rest of what you've got here hangs on that to proceed without it.
Or to put all that another way, if for any reason we decide that the
separate catalog xmin stuff is not viable, the rest of this is going
to need a lot of rework, so we'd better sort that now rather than
later.

With respect to the synchronize-slots-to-disk stuff we're arguing
about on the other thread, I think the basic design problem here is
that you assume that you can change stuff in memory and then change
stuff on disk, without either set of changes being atomic.  What I
think you need to do is making atomic actions on disk correspond to
atomic state changes in memory.  IOW, suppose that creating a slot
consists of two steps: mkdir() + fsync().  Then I think what you need
to do is - do the mkdir().  If it errors out, fine.  If it succeeds,
then mark the slot half-created.  This is just an assignment so it can
done immediately after you learn that mkdir() worked with no risk of
an intervening failure.  Then, try to fsync().  If it errors out, the
slot will get left in the half-created state.  If it works, then
immediately mark the slot as fully created.  Now, when the next guy
comes along and looks at the slot, he can tell what he needs to do.
Specifically, if the slot is half-created, and he wants to do anything
other than remove it, he's got to fsync() it first, and if that errors
out, so be it.  The next access to the slot will merely find it still
half-created and simply try the fsync() yet again.
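
A toy, self-contained sketch of that "mark progress only after the durable
step succeeded" pattern (names invented, error handling minimal, and obviously
not the real slot code):

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

typedef enum { SLOT_EMPTY, SLOT_HALF_CREATED, SLOT_CREATED } SlotOnDiskState;

/*
 * Advance the reported state only once the corresponding on-disk step has
 * succeeded, so whoever looks at the slot next knows exactly what remains
 * to be done (or retried).
 */
static SlotOnDiskState
create_slot_dir(const char *path)
{
	int		fd;

	if (mkdir(path, 0700) != 0)
		return SLOT_EMPTY;			/* nothing happened on disk */

	fd = open(path, O_RDONLY);
	if (fd < 0 || fsync(fd) != 0)
	{
		if (fd >= 0)
			close(fd);
		return SLOT_HALF_CREATED;	/* leave it; retry the fsync later */
	}
	close(fd);
	return SLOT_CREATED;			/* both steps done */
}

int
main(void)
{
	printf("state = %d\n", (int) create_slot_dir("slot_example"));
	return 0;
}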

Alternatively, since nearly everything we're trying to do here is a
two-step operation - do something and then fsync - maybe we have a
more generic fsync-pending flag, and each slot operation checks that
and retries the fsync() if it's set.  But it might be situation
dependent which thing we need to fsync, since there are multiple files
involved.

Broadly, what you're trying to accomplish here is to have something
that is crash-safe but without relying on WAL, so that it can work on

Re: [HACKERS] Re: [COMMITTERS] pgsql: Compress GIN posting lists, for smaller index size.

2014-01-23 Thread Heikki Linnakangas

On 01/23/2014 10:37 PM, Tom Lane wrote:

Heikki Linnakangas hlinnakan...@vmware.com writes:

On 01/23/2014 09:18 PM, Stefan Kaltenbrunner wrote:

it seems that this commit made spoonbill an unhappy animal:
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2014-01-23%2000%3A00%3A04



Hmm, all the Sparcs. Some kind of an alignment issue, perhaps? I will
investigate..


My HPUX box, which is also picky about alignment, is unhappy as well.
It's crashing here:

ginPostingListDecode (plist=0xc39efac1, ndecoded=0x7b03bee0)
 at ginpostinglist.c:263
263 return ginPostingListDecodeAllSegments(plist,
(gdb) bt
#0  ginPostingListDecode (plist=0xc39efac1, ndecoded=0x7b03bee0)
 at ginpostinglist.c:263
#1  0x205308 in ginReadTuple (ginstate=0xc39efac1, attnum=48864,
 itup=0x7b03bee0, nitems=0x403ee9a4) at ginentrypage.c:170
#2  0x21074c in startScanEntry (ginstate=0x403ec3ac, entry=0x403ee970)
 at ginget.c:463
#3  0x21086c in startScan (scan=0xc39efac1) at ginget.c:493
#4  0x212c14 in gingetbitmap (fcinfo=0xc39efac1) at ginget.c:1531
#5  0x5ffc50 in FunctionCall2Coll (flinfo=0xc39efac1, collation=2063843040,
 arg1=2063843040, arg2=1077864868) at fmgr.c:1323
#6  0x24ee5c in index_getbitmap (scan=0x40163878, bitmap=0x403ee620)
 at indexam.c:649
#7  0x3b9430 in MultiExecBitmapIndexScan (node=0x40163768)
 at nodeBitmapIndexscan.c:89
#8  0x3a5a3c in MultiExecProcNode (node=0x40163768) at execProcnode.c:562
#9  0x3b8610 in BitmapHeapNext (node=0x401628f0) at nodeBitmapHeapscan.c:104
#10 0x3ae5b0 in ExecScan (node=0x401628f0,
 accessMtd=0x4001a2c2 <DINFINITY+3802>,
 recheckMtd=0x4001a2ca <DINFINITY+3810>) at execScan.c:82
#11 0x3b8e9c in ExecBitmapHeapScan (node=0xc39efac1)
 at nodeBitmapHeapscan.c:441
#12 0x3a56e0 in ExecProcNode (node=0x401628f0) at execProcnode.c:414
...

(gdb) p debug_query_string
$1 = 0x4006d4a8 "SELECT * FROM array_index_op_test WHERE i @> '{38,34,32,89}' ORDER BY seqno;"

The problem appears to be due to the misaligned plist pointer
(0xc39efac1 here).


Ah, thanks! Looks like I removed a SHORTALIGN from ginFormTuple that was 
in fact very much necessary.. Fixed now, let's see if that pacifies the 
sparcs.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] pg_ctl always uses the same event source

2014-01-23 Thread Robert Haas
On Thu, Jan 23, 2014 at 9:23 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Thu, Jan 23, 2014 at 10:26 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Amit Kapila amit.kapil...@gmail.com writes:
 On Wed, Jan 22, 2014 at 9:03 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 So?  Anything which can know the value of a GUC parameter can certainly
 know the selected port number.

 1. In case of registration of event source, either user has to pass the name
 or it uses hard coded default value, so if we use version number along 
 with
 'PostgreSQL', it can be consistent.
 I don't see any way pgevent.c can know port number to append it to 
 default
 value, am I missing something here?


 I think what we might want to do is redefine the server's behavior
 as creating an event named after the concatenation of event_source
 and port number, or maybe even get rid of event_source entirely and
 just say it's PostgreSQL followed by the port number.

To accomplish this behaviour, each time server starts and stops,
we need to register and unregister event log using mechanism
described at below link to ensure that there is no mismatch between
what server uses and what OS knows.
http://www.postgresql.org/docs/devel/static/event-log-registration.html

Why wouldn't that be necessary with your approach, too?  I mean, if
there's a GUC that controls the event source name, then it can be
changed between restarts, regardless of what you call it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why do we let autovacuum give up?

2014-01-23 Thread Robert Haas
On Thu, Jan 23, 2014 at 4:03 PM, Mark Kirkwood
mark.kirkw...@catalyst.net.nz wrote:
 On 24/01/14 09:49, Tom Lane wrote:
 2. What have you got that is requesting exclusive lock on pg_attribute?
 That seems like a pretty unfriendly behavior in itself. regards, tom lane

 I've seen this sort of problem where every db session was busily creating
 temporary tables. I never got to find *why* they needed to make so many,
 but it seemed like a bad idea.

But... how does that result in a vacuum-incompatible lock request
against pg_attribute?

I see that it'll insert lots of rows into pg_attribute, and maybe
later delete them, but none of that blocks vacuum.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
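
For anyone chasing the same question, a diagnostic query along these
lines shows which sessions hold or are waiting for locks on pg_attribute
(a sketch against the standard pg_locks and pg_stat_activity views; the
pid/query column names assume 9.2 or later):

  -- sketch only: who holds or waits for locks on pg_attribute?
  SELECT l.pid, l.mode, l.granted, a.query
    FROM pg_locks l
    JOIN pg_stat_activity a ON a.pid = l.pid
   WHERE l.locktype = 'relation'
     AND l.relation = 'pg_catalog.pg_attribute'::regclass;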


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-23 Thread Gregory Smith

On 1/20/14 9:46 AM, Mel Gorman wrote:
They could potentially be used to evaluate any IO scheduler changes. 
For example -- deadline scheduler with these parameters has X 
transactions/sec throughput with average latency of Y milliseconds 
and a maximum fsync latency of Z seconds. Evaluate how well the 
out-of-box behaviour compares against it with and without some set of 
patches. At the very least it would be useful for tracking historical 
kernel performance over time and bisecting any regressions that got 
introduced. I think many kernel developers (me at least) can run 
automated bisections once a test case exists. 


That's the long term goal.  What we used to get out of pgbench were 
things like 60 second latencies when a checkpoint hit with GBs of dirty 
memory.  That does happen in the real world, but that's not a realistic 
case you can tune for very well.  In fact, tuning for it can easily 
degrade performance on more realistic workloads.


The main complexity I don't have a clear view of yet is how much 
unavoidable storage level latency there is in all of the common 
deployment types.  For example, I can take a server with a 256MB 
battery-backed write cache and set dirty_background_bytes to be smaller 
than that.  So checkpoint spikes go away, right?  No. Eventually you 
will see dirty_background_bytes of data going into an already full 256MB 
cache.  And when that happens, the latency will be based on how long it 
takes to write the cached 256MB out to the disks.  If you have a single 
disk or RAID-1 pair, that random I/O could easily happen at 5MB/s or 
less, and that makes for a 51 second cache clearing time.  This is a lot 
better now than it used to be because fsync hasn't flushed the whole 
cache in many years now. (Only RHEL5 systems still in the field suffer 
much from that era of code)  But you do need to look at the distribution 
of latency a bit because of how the cache impacts things, you can't just 
consider min/max values.


Take the BBWC out of the equation, and you'll see latency proportional 
to how long it takes to clear the disk's cache out. It's fun upgrading 
from a disk with 32MB of cache to 64MB only to watch worst case latency 
double.  At least the kernel does the right thing now, using that cache 
when it can while forcing data out when fsync calls arrive.  (That's 
another important kernel optimization we'll never be able to teach the 
database)


--
Greg Smith greg.sm...@crunchydatasolutions.com
Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why do we let autovacuum give up?

2014-01-23 Thread Joshua D. Drake


On 01/23/2014 01:03 PM, Mark Kirkwood wrote:


On 24/01/14 09:49, Tom Lane wrote:

2. What have you got that is requesting exclusive lock on
pg_attribute? That seems like a pretty unfriendly behavior in itself.
regards, tom lane


I've seen this sort of problem where every db session was busily
creating temporary tables. I never got to find *why* they needed to
make so many, but it seemed like a bad idea.


Yep... that's the one. They are creating lots and lots of temp tables.

JD



--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
For my dreams of your image that blossoms
   a rose in the deeps of my heart. - W.B. Yeats


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why do we let autovacuum give up?

2014-01-23 Thread Pavel Stehule
On 23.1.2014 22:04 Mark Kirkwood mark.kirkw...@catalyst.net.nz
wrote:

 On 24/01/14 09:49, Tom Lane wrote:

 2. What have you got that is requesting exclusive lock on pg_attribute?
That seems like a pretty unfriendly behavior in itself. regards, tom lane


 I've seen this sort of problem where every db session was busily creating
temporary tables. I never got to find *why* they needed to make so
many, but it seemed like a bad idea.


Our customer had the same problem with temp tables used intensively by
plpgsql functions. Under higher load, temp tables are a performance and
stability killer. Vacuuming pg_attribute has very ugly impacts :(

Regards

Pavel

After a redesign - without temp tables - his applications work well.

We need global temp tables.

 Regards

 Mark




 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why do we let autovacuum give up?

2014-01-23 Thread Tom Lane
Mark Kirkwood mark.kirkw...@catalyst.net.nz writes:
 On 24/01/14 09:49, Tom Lane wrote:
 2. What have you got that is requesting exclusive lock on 
 pg_attribute? That seems like a pretty unfriendly behavior in itself. 

 I've seen this sort of problem where every db session was busily 
 creating temporary tables. I never got to find *why* they needed to 
 make so many, but it seemed like a bad idea.

That shouldn't result in any table-level exclusive locks on system
catalogs, though.

[ thinks... ]  It's possible that what you saw is not the
kick-out-autovacuum-entirely behavior, but the behavior added in commit
bbb6e559c, whereby vacuum (auto or regular) will skip over pages that it
can't immediately get an exclusive buffer lock on.  On a heavily used
table, we might skip the same page repeatedly, so that dead tuples don't
get cleaned for a long time.

To add insult to injury, despite having done that, vacuum would reset the
pgstats dead-tuple count to zero, thus postponing the next autovacuum.
I think commit 115f41412 may have improved the situation, but I'd want
to see some testing of this theory before I'd propose back-patching it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why do we let autovacuum give up?

2014-01-23 Thread Mark Kirkwood

On 24/01/14 10:09, Robert Haas wrote:

On Thu, Jan 23, 2014 at 4:03 PM, Mark Kirkwood
mark.kirkw...@catalyst.net.nz wrote:

On 24/01/14 09:49, Tom Lane wrote:

2. What have you got that is requesting exclusive lock on pg_attribute?
That seems like a pretty unfriendly behavior in itself. regards, tom lane

I've seen this sort of problem where every db session was busily creating
temporary tables. I never got to find *why* they needed to make so many,
but it seemed like a bad idea.

But... how does that result in a vacuum-incompatible lock request
against pg_attribute?

I see that it'll insert lots of rows into pg_attribute, and maybe
later delete them, but none of that blocks vacuum.



That was my thought too - if I see it happening again here (was a year 
or so ago that I saw some serious pg_attribute bloat) I'll dig deeper.


regards

Mark


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] pg_ctl always uses the same event source

2014-01-23 Thread MauMau

From: Amit Kapila amit.kapil...@gmail.com

How about below message:

event source event_source_name is already registered.


OK, I added several lines for this.  Please check the attached patch.



What I had in mind was to change it during initdb, we are already doing it
for some other parameter (unix_socket_directories), please refer below
code in initdb.c


Yes, it seems we can do this.  However, could you forgive me for leaving 
this untouched?  I'm afraid postgresql.conf.sample's issue is causing 
unnecessary war among people here.  That doesn't affect the point of this 
patch --- make pg_ctl use the event_source setting.  Anyway, not all 
parameters in postgresql.conf can be used just by uncommenting them. 
That's another issue.


I'll update the CommitFest entry soon.

Regards
MauMau


pg_ctl_eventsrc_v5.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] pg_ctl always uses the same event source

2014-01-23 Thread Peter Eisentraut
On 1/23/14, 4:08 PM, Robert Haas wrote:
 Why wouldn't that be necessary with your approach, too?  I mean, if
 there's a GUC that controls the event source name, then it can be
 changed between restarts, regardless of what you call it.

I don't know if it's practical, but the logical conclusion here would be
to use an identifier that you cannot change, such as the system identifier.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] pg_ctl always uses the same event source

2014-01-23 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On 1/23/14, 4:08 PM, Robert Haas wrote:
 Why wouldn't that be necessary with your approach, too?  I mean, if
 there's a GUC that controls the event source name, then it can be
 changed between restarts, regardless of what you call it.

 I don't know if it's practical, but the logical conclusion here would be
 to use an identifier that you cannot change, such as the system identifier.

That particular ID would be a horrid choice, because we don't try very
hard to ensure it's unique.  In particular, a standby server on the same
machine as the master (not an uncommon case, at least for testing
purposes) would be a guaranteed fail with that approach.

I'm still not clear on why we can't just use the port number.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] pg_ctl always uses the same event source

2014-01-23 Thread Alvaro Herrera
Tom Lane escribió:
 Peter Eisentraut pete...@gmx.net writes:
  On 1/23/14, 4:08 PM, Robert Haas wrote:
  Why wouldn't that be necessary with your approach, too?  I mean, if
  there's a GUC that controls the event source name, then it can be
  changed between restarts, regardless of what you call it.
 
  I don't know if it's practical, but the logical conclusion here would be
  to use an identifier that you cannot change, such as the system identifier.
 
 That particular ID would be a horrid choice, because we don't try very
 hard to ensure it's unique.  In particular, a standby server on the same
 machine as the master (not an uncommon case, at least for testing
 purposes) would be a guaranteed fail with that approach.
 
 I'm still not clear on why we can't just use the port number.

I wonder if it would make sense to generate a unique name at some
initial point in the history of the service (perhaps at initdb time, or
at the first postmaster start) and store this name in a special,
separate file in PGDATA.  On subsequent starts we read the name from
there and always use it consistently.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2014-01-23 Thread Peter Geoghegan
On Thu, Nov 14, 2013 at 9:23 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Ok, here's a new version of the patch to handle incomplete B-tree splits.

I finally got around to taking a look at this. Unlike with the as-yet
uncommitted "Race condition in b-tree page deletion" patch that Kevin
looked at, which there is a dependency on here, I could not really
consider your patch in isolation. I'm really reviewing both patches,
but with a particular focus on this recent set of additions
(btree-incomplete-splits-2.patch), and the issues addressed by it.
However, since the distinction between this patch and the patch that
it has a dependency on is fuzzy, expect me to be fuzzy in
differentiating the two. This patch is only very loosely an atomic
unit. This is not a criticism - I understand that it just turned out
that way.

The first thing I noticed about this patchset is that it completely
expunges btree_xlog_startup(), btree_xlog_cleanup() and
btree_safe_restartpoint(). The post-recovery cleanup that previously
occurred to address both sets of problems (the problem addressed by
this patch, incomplete page splits, and the problem addressed by the
dependency patch, incomplete page deletions) now both occur
opportunistically/lazily. Notably, this means that there are now
exactly zero entries in the resource manager list with a
'restartpoint' callback. I guess we should consider removing it
entirely for that reason. As you said in the commit message where
gin_safe_restartpoint was similarly retired (commit 631118fe):


The post-recovery cleanup mechanism was never totally reliable, as insertion
to the parent could fail e.g because of running out of memory or disk space,
leaving the tree in an inconsistent state.


I'm of the general impression that post-recovery cleanup is
questionable. I'm surprised that you didn't mention this commit
previously. You just mentioned the original analogous work on GiST,
but this certainly seems to be something you've been thinking about
*systematically* eliminating for a while.

So while post-recovery callbacks no longer exist for any
rmgr-managed-resource, 100% of remaining startup and cleanup callbacks
concern the simple management of memory of AM-specific recovery
contexts (for GiST, GiN and SP-GiST). I have to wonder if there isn't
a better abstraction than that, such as a generic recovery memory
context, allowing you to retire all 3 callbacks. I mean, StartupXLOG()
just calls those callbacks for each resource at exactly the same time
anyway, just as it initializes resource managers in precisely the same
manner earlier on. Plus if you look at what those AM-local memory
management routines do, it all seems very simple.

I think you should bump XLOG_PAGE_MAGIC.

Perhaps you should increase the elevel here:

+   elog(DEBUG1, "finishing incomplete split of %u/%u",
+BufferGetBlockNumber(lbuf), BufferGetBlockNumber(rbuf));

Since this is a very rare occurrence that involves some subtle
interactions, I can't think why you wouldn't want to LOG this for
forensic purposes.

Why did you remove the local linkage of _bt_walk_left(), given that it
is called in exactly the same way as before? I guess that that is just
a vestige of some earlier revision.

I think I see some bugs in _bt_moveright(). If you examine
_bt_finish_split() in detail, you'll see that it doesn't just drop the
write buffer lock that the caller will have provided (per its
comments) - it also drops the buffer pin. It calls _bt_insert_parent()
last, which was previously only called last thing by _bt_insertonpg()
(some of the time), and _bt_insertonpg() is indeed documented to drop
both the lock and the pin. And if you look at _bt_moveright() again,
you'll see that there is a tacit assumption that the buffer pin isn't
dropped, or at least that 'opaque' (the BTPageOpaque BT special page
area pointer) continues to point to the same page held in the same
buffer, even though the code doesn't set the 'opaque' pointer again
and it may not point to a pinned buffer or even the appropriate
buffer. Ditto 'page'. So 'opaque' may point to the wrong thing on
subsequent iterations - you don't do anything with the return value of
_bt_getbuf(), unlike all of the existing call sites. I believe you
should store its return value, and get the held page and the special
pointer on the page, and assign that to the opaque pointer before
the next iteration (an iteration in which you very probably really do
make progress not limited to completing a page split, the occurrence
of which the _bt_moveright() loop gets stuck on). You know, do what
is done in the non-split-handling case. It may not always be the same
buffer as before, obviously.

For a moment I thought that you might have accounted for that here:

*** _bt_insert_parent(Relation rel,
*** 1685,1696 
* 05/27/97
*/
   ItemPointerSet(&(stack->bts_btentry.t_tid), bknum, P_HIKEY);
-
   pbuf = _bt_getstackbuf(rel, 

Re: [HACKERS] Why do we let autovacuum give up?

2014-01-23 Thread Mark Kirkwood

On 24/01/14 10:16, Mark Kirkwood wrote:

On 24/01/14 10:09, Robert Haas wrote:

On Thu, Jan 23, 2014 at 4:03 PM, Mark Kirkwood
mark.kirkw...@catalyst.net.nz wrote:

On 24/01/14 09:49, Tom Lane wrote:
2. What have you got that is requesting exclusive lock on 
pg_attribute?
That seems like a pretty unfriendly behavior in itself. regards, 
tom lane
I've seen this sort of problem where every db session was busily 
creating
temporary tables. I never got to find *why* they needed to make 
so many,

but it seemed like a bad idea.

But... how does that result in a vacuum-incompatible lock request
against pg_attribute?

I see that it'll insert lots of rows into pg_attribute, and maybe
later delete them, but none of that blocks vacuum.



That was my thought too - if I see it happening again here (was a year 
or so ago that I saw some serious pg_attribute bloat) I'll dig deeper.





Actually not much digging required. Running the attached script via 
pgbench (8 sessions) against a default configured postgres 8.4 sees 
pg_attribute get to 1G after about 15 minutes.
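
For reference, the catalog growth can be watched while the script runs
with an ordinary size query, something like:

  -- sketch: total size of pg_attribute including indexes and TOAST
  SELECT pg_size_pretty(pg_total_relation_size('pg_catalog.pg_attribute'));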


BEGIN;
DROP TABLE IF EXISTS tab0;
CREATE TEMP TABLE tab0 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab0  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab1;
CREATE TEMP TABLE tab1 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab1  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab2;
CREATE TEMP TABLE tab2 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab2  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab3;
CREATE TEMP TABLE tab3 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab3  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab4;
CREATE TEMP TABLE tab4 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab4  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab5;
CREATE TEMP TABLE tab5 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab5  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab6;
CREATE TEMP TABLE tab6 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab6  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab7;
CREATE TEMP TABLE tab7 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab7  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab8;
CREATE TEMP TABLE tab8 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab8  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab9;
CREATE TEMP TABLE tab9 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab9  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab10;
CREATE TEMP TABLE tab10 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab10  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab11;
CREATE TEMP TABLE tab11 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab11  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab12;
CREATE TEMP TABLE tab12 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab12  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab13;
CREATE TEMP TABLE tab13 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab13  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab14;
CREATE TEMP TABLE tab14 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab14  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab15;
CREATE TEMP TABLE tab15 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab15  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab16;
CREATE TEMP TABLE tab16 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab16  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab17;
CREATE TEMP TABLE tab17 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab17  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab18;
CREATE TEMP TABLE tab18 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab18  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab19;
CREATE TEMP TABLE tab19 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab19  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab20;
CREATE TEMP TABLE tab20 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab20  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab21;
CREATE TEMP TABLE tab21 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab21  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab22;
CREATE TEMP TABLE tab22 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab22  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab23;
CREATE TEMP TABLE tab23 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab23  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab24;
CREATE TEMP TABLE tab24 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab24  SELECT generate_series(1,1000),'xx';
DROP TABLE IF EXISTS tab25;
CREATE TEMP TABLE tab25 ( id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO tab25  SELECT 

Re: [HACKERS] Changeset Extraction v7.1

2014-01-23 Thread Alvaro Herrera
 I wonder if it wouldn't be better to get rid of the subdirectories for
 the individual slots, and just have a file pg_replslot/$SLOTNAME, or
 not.  I know there are later patches that need subdirectories for
 their own private data, but they could just create
 pg_replslot/$SLOTNAME.dir and put whatever in it they like, without
 really implicating this code that much.  The advantage of that is that
 there would be fewer intermediate states.  The slot exists if the file
 exists, and not if it doesn't.  You still need half-alive and
 half-dead until the fsync finishes, but you don't need to worry about
 tracking both the state of the directory and the state of the file.

Why do we need directories at all?  I know there might be subsidiary
files to store stuff in separate files, but maybe we can just name files
using the slot name (or a transformation thereof) as a prefix.  It
shouldn't be difficult to remove the right files whenever there's a
need, and not having to worry about a directory that might need a
separate fsync might make things easier.

On the other hand, there might still be a need to fsync the parent
directory, so maybe there is not that much gain.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] pg_ctl always uses the same event source

2014-01-23 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Tom Lane escribió:
 That particular ID would be a horrid choice, because we don't try very
 hard to ensure it's unique.  In particular, a standby server on the same
 machine as the master (not an uncommon case, at least for testing
 purposes) would be a guaranteed fail with that approach.

 I wonder if it would make sense to generate a unique name at some
 initial point in the history of the service (perhaps at initdb time, or
 at the first postmaster start) and store this name in a special,
 separate file in PGDATA.  On subsequent starts we read the name from
 there and always use it consistently.

Meh --- that would have the same behavior as the system identifier,
ie it would propagate to slave servers without extraordinary (and
easily bypassed) measures to prevent it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Compress GIN posting lists, for smaller index size.

2014-01-23 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 On 01/23/2014 10:37 PM, Tom Lane wrote:
 The problem appears to be due to the misaligned plist pointer
 (0xc39efac1 here).

 Ah, thanks! Looks like I removed a SHORTALIGN from ginFormTuple that was 
 in fact very much necessary.. Fixed now, let's see if that pacifies the 
 sparcs.

My HPPA box is happy again, anyway.  Thanks.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why do we let autovacuum give up?

2014-01-23 Thread Magnus Hagander
On Thu, Jan 23, 2014 at 10:00 PM, Harold Giménez har...@heroku.com wrote:

 On Thu, Jan 23, 2014 at 12:53 PM, Josh Berkus j...@agliodbs.com wrote:
  On 01/23/2014 12:34 PM, Joshua D. Drake wrote:
 
  Hello,
 
  I have run into yet again another situation where there was an
  assumption that autovacuum was keeping up and it wasn't. It was caused
  by autovacuum quitting because another process requested a lock.
 
  In turn we received a ton of bloat on pg_attribute which caused all
  kinds of other issues (as can be expected).
 
  The more I run into it, the more it seems like autovacuum should behave
  like vacuum, in that it gets precedence when it is running. First come,
  first serve as they say.
 
  Thoughts?
 
  If we let autovacuum block user activity, a lot more people would turn
  it off.
 
  Now, if you were to argue that we should have some way to monitor the
  tables which autovac can never touch because of conflicts, I would agree
  with you.

 Agree completely. Easy ways to monitor this would be great. Once you
 know there's a problem, tweaking autovacuum settings is very hard and
 misunderstood, and explaining how to be effective at it is a dark art
 too.


FWIW, I have a patch around somewhere that I never cleaned up properly for
submissions that simply added a counter to pg_stat_user_tables indicating
how many times vacuum had aborted on that specific table. If that's enough
info  (it was for my case) to cover this case, I can try to dig it out
again and clean it up...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
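
Until such a counter exists, a rough picture can already be pulled from
the statistics views (a sketch; the autovacuum_count column assumes 9.1
or later, and pg_stat_all_tables is used so that system catalogs show up
too):

  -- sketch only: tables with the most dead tuples and their autovacuum history
  SELECT schemaname, relname, n_dead_tup, last_autovacuum, autovacuum_count
    FROM pg_stat_all_tables
   ORDER BY n_dead_tup DESC
   LIMIT 10;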


Re: [HACKERS] Add %z support to elog/ereport?

2014-01-23 Thread Tom Lane
I wrote:
 the idea that we might get many dozen such warnings on more-current
 compilers is scarier, as that might well interfere with people's
 ability to do development on, say, Windows.  Could somebody check
 whether MSVC for instance complains about format strings using z?
 Or shall I just commit this and we'll see what the buildfarm says?

Given the lack of response, I've pushed the patch, and will canvass
the buildfarm results later.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] pg_ctl always uses the same event source

2014-01-23 Thread MauMau

From: Tom Lane t...@sss.pgh.pa.us

I'm still not clear on why we can't just use the port number.


It will be possible to use port to set the default value of event_source GUC 
when starting postmaster.  But using port during event source registration 
will involve much more.
To use port, we have to tell the location of $PGDATA to regsvr32.exe. 
However, regsvr32.exe can only take an argument from /i, and we are using /i 
for event source name specification.  If we want to pass data directory, we 
have to change the usage.  Instead, we could probably have regsvr32.exe 
check PGDATA env variable and invoke postgres -C event_source, but that 
would require much more complicated code (e.g. for locating postgres.exe, 
because regsvr32.exe is in Windows directory)


Anyway, the point of my patch is to just make pg_ctl use event_source GUC 
for outputing to event log.  I want to rely on postgres -C, because pg_ctl 
already uses it for retrieving data_directory GUC.  I'd like to avoid 
further complication in code and discussion.  If you request, I can revert 
the default value of event_source and regsvr32.exe /i to PostgreSQL.  I'm 
okay with that, because syslog_ident also has the default value postgres, 
which doesn't contain the major release.


Any (not complicated) suggestions?

Regards
MauMau



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] pg_ctl always uses the same event source

2014-01-23 Thread Tom Lane
MauMau maumau...@gmail.com writes:
 From: Tom Lane t...@sss.pgh.pa.us
 I'm still not clear on why we can't just use the port number.

 To use port, we have to tell the location of $PGDATA to regsvr32.exe. 

[ scratches head... ]  Exactly which of the other proposals *doesn't*
require that?  Certainly anything that involves parsing settings out
of postgresql.conf will.

A more significant problem, probably, is that even knowing $PGDATA doesn't
tell you the port number with certainty, since the postmaster might end
up taking the port number from some other source than postgresql.conf
(command line, PGPORT environment, ...).  We used to require that pg_ctl
know the port accurately, and it was a big step forward in reliability
when we got rid of that; so maybe going back to that is not such a good
idea.

I note though that pg_ctl does still need to know accurately where $PGDATA
is.  Is there any particular length limit on event source names?  Maybe
the full path to $PGDATA could be used instead of the port number.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why do we let autovacuum give up?

2014-01-23 Thread Josh Berkus
On 01/23/2014 02:17 PM, Magnus Hagander wrote:
 FWIW, I have a patch around somewhere that I never cleaned up properly for
 submissions that simply added a counter to pg_stat_user_tables indicating
 how many times vacuum had aborted on that specific table. If that's enough
 info  (it was for my case) to cover this case, I can try to dig it out
 again and clean it up...

It would be 100% more information than we currently have.  How much more
difficult would it be to count completed autovacuums as well?  It's
really the ratio of the two which matters ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why do we let autovacuum give up?

2014-01-23 Thread Josh Berkus
On 01/23/2014 02:55 PM, Josh Berkus wrote:
 On 01/23/2014 02:17 PM, Magnus Hagander wrote:
 FWIW, I have a patch around somewhere that I never cleaned up properly for
 submissions that simply added a counter to pg_stat_user_tables indicating
 how many times vacuum had aborted on that specific table. If that's enough
 info  (it was for my case) to cover this case, I can try to dig it out
 again and clean it up...
 
 It would be 100% more information than we currently have.  How much more
 difficult would it be to count completed autovacuums as well?  It's
 really the ratio of the two which matters ...

Actually, now that I think about it, the ratio of the two doesn't matter
as much as whether the most recent autovacuum aborted or not.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why do we let autovacuum give up?

2014-01-23 Thread Jeff Janes
On Thu, Jan 23, 2014 at 1:41 PM, Mark Kirkwood 
mark.kirkw...@catalyst.net.nz wrote:

 On 24/01/14 10:16, Mark Kirkwood wrote:

 On 24/01/14 10:09, Robert Haas wrote:

 On Thu, Jan 23, 2014 at 4:03 PM, Mark Kirkwood
 mark.kirkw...@catalyst.net.nz wrote:

 On 24/01/14 09:49, Tom Lane wrote:

 2. What have you got that is requesting exclusive lock on pg_attribute?
 That seems like a pretty unfriendly behavior in itself. regards, tom
 lane

 I've seen this sort of problem where every db session was busily
 creating
 temporary tables. I never got to find *why* they needed to make so
 many,
 but it seemed like a bad idea.

 But... how does that result in a vacuum-incompatible lock request
 against pg_attribute?

 I see that it'll insert lots of rows into pg_attribute, and maybe
 later delete them, but none of that blocks vacuum.


 That was my thought too - if I see it happening again here (was a year or
 so ago that I saw some serious pg_attribute bloat) I'll dig deeper.



 Actually not much digging required. Running the attached script via
 pgbench (8 sessions) against a default configured postgres 8.4 sees
 pg_attribute get to 1G after about 15 minutes.


At that rate, with default throttling, it will be a close race whether
autovac can vacuum pages as fast as they are being added.  Even if it never
gets cancelled, it might not ever finish.

Cheers,

Jeff
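
The back-of-the-envelope arithmetic behind that close race, as a sketch
assuming the defaults (vacuum_cost_limit = 200, autovacuum_vacuum_cost_delay
= 20 ms, vacuum_cost_page_dirty = 20, 8 kB pages) and ignoring the extra
page-miss cost:

  -- worst case where every page visited is dirtied
  SELECT 200 / 20                               AS pages_per_round,
         (200 / 20) * (1000 / 20)               AS pages_per_second,
         (200 / 20) * (1000 / 20) * 8 / 1024.0  AS approx_mb_per_second;
  -- roughly 10 pages per 20 ms round, 500 pages/s, ~3.9 MB/s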


Re: [HACKERS] Why do we let autovacuum give up?

2014-01-23 Thread Mark Kirkwood

On 24/01/14 12:13, Jeff Janes wrote:

On Thu, Jan 23, 2014 at 1:41 PM, Mark Kirkwood 
mark.kirkw...@catalyst.net.nz wrote:


On 24/01/14 10:16, Mark Kirkwood wrote:


On 24/01/14 10:09, Robert Haas wrote:


On Thu, Jan 23, 2014 at 4:03 PM, Mark Kirkwood
mark.kirkw...@catalyst.net.nz wrote:


On 24/01/14 09:49, Tom Lane wrote:


2. What have you got that is requesting exclusive lock on pg_attribute?
That seems like a pretty unfriendly behavior in itself. regards, tom
lane


I've seen this sort of problem where every db session was busily
creating
temporary tables. I never got to find *why* they needed to make so
many,
but it seemed like a bad idea.


But... how does that result in a vacuum-incompatible lock request
against pg_attribute?

I see that it'll insert lots of rows into pg_attribute, and maybe
later delete them, but none of that blocks vacuum.



That was my thought too - if I see it happening again here (was a year or
so ago that I saw some serious pg_attribute bloat) I'll dig deeper.




Actually not much digging required. Running the attached script via
pgbench (8 sessions) against a default configured postgres 8.4 sees
pg_attribute get to 1G after about 15 minutes.


At that rate, with default throttling, it will be a close race whether
autovac can vacuum pages as fast as they are being added.  Even if it never
gets cancelled, it might not ever finish.



Yes - I should have set the cost delay to 0 first (checking that now).



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Changeset Extraction v7.1

2014-01-23 Thread Andres Freund
Hi,

On 2014-01-23 16:04:10 -0500, Robert Haas wrote:
 Patch 0001:
 
 +errmsg("could not find free replication slot"),
 
 Suggest: all replication slots are in use

That sounds better indeed.

 +   elog(ERROR, "cannot aquire a slot while another slot has been acquired");
 
 Suggest: this backend has already acquired a replication slot
 
 Or demote it to Assert().  I'm not really sure why this needs to be
 checked in non-assert builds.

Hm. Fine with me, not sure why I went with an elog(). Maybe because I
thought output plugin authors could have the idea of using another slot
while inside one?

  I also wonder if we should use the
 terminology attach instead of acquire; that pairs more naturally
 with release.  Then the message, if we want more than an assert,
 might be this backend is already attached to a replication slot.

I went with Acquire/Release because our locking code does so, and it
seemed sensible to be consistent. I don't have strong feelings about it.

 +   if (slot == NULL)
 +   {
 +   LWLockRelease(ReplicationSlotCtlLock);
 +   ereport(ERROR,
 +   (errcode(ERRCODE_UNDEFINED_OBJECT),
 +errmsg("could not find replication slot \"%s\"", name)));
 +   }
 
 The error will release the LWLock anyway; I'd get rid of the manual
 LWLockRelease, and the braces.  Similarly in ReplicationSlotDrop.

Unfortunately not. Inside the walsender there's currently no
LWLockReleaseAll() for ERRORs since commands aren't run inside a
transaction command...

But maybe I should have fixed this by adding the release to
WalSndErrorCleanup() instead? That'd still leave the problematic case
that currently we try to delete a replication slot inside a CATCH when
we fail while initializing the rest of logical replication... But I
guess adding it would be a good idea independent of that.

We could also do a StartTransactionCommand() but I'd rather not, that
currently prevents code in that vicinity from doing anything it
shouldn't via various Assert()s in existing code.

 +   /* acquire spinlock so we can test and set ->active safely */
 +   SpinLockAcquire(slot->mutex);
 +
 +   if (slot->active)
 +   {
 +   SpinLockRelease(slot->mutex);
 +   LWLockRelease(ReplicationSlotCtlLock);
 +   ereport(ERROR,
 +   (errcode(ERRCODE_OBJECT_IN_USE),
 +errmsg("slot \"%s\" already active", name)));
 +   }
 +
 +   /* we definitely have the slot, no errors possible anymore */
 +   slot->active = true;
 +   MyReplicationSlot = slot;
 +   SpinLockRelease(slot->mutex);
 
 This doesn't need the LWLockRelease either.  It does need the
 SpinLockRelease, but as I think I noted previously, the right way to
 write this is probably: SpinLockAcquire(slot->mutex); was_active =
 slot->active; slot->active = true; SpinLockRelease(slot->mutex); if
 (was_active) ereport(). MyReplicationSlot = slot.

That's not really simpler tho? But if you prefer I can go that way.

 ReplicationSlotsComputeRequiredXmin still acquires ProcArrayLock, and
 the comment "Provide interlock against concurrent recomputations"
 doesn't seem adequate to me.  I guess the idea here is that we regard
 ProcArrayLock as protecting ReplicationSlotCtl->catalog_xmin and
 ReplicationSlotCtl->data_xmin, but if that's the idea then we only
 need to hold the lock during the time when we actually update those
 values, not the loop where we compute them.

There's a comment someplace else to that end, but yes, that's
essentially the idea. I decided to take it during the whole
recomputation because we also take ProcArrayLock when creating a new
decoding slot and initially setting ->catalog_xmin. That's not strictly required
but seemed simpler that way, and the code shouldn't be very hot.
The code that initially computes the starting value for catalog_xmin
when creating a new decoding slot has to take ProcArrayLock to be safe,
that's why I thought it'd be convenient to always use it for those
values.

In all other cases where we modify *_xmin we're only increasing it which
doesn't need a lock (HS feedback never has taken one, and
GetSnapshotData() modifies ->xmin while holding a shared lock), the only
potential danger is a slight delay in increasing the overall value.

 Also, if that's the
 design, maybe they should be part of PROC_HDR *ProcGlobal rather than
 here.  It seems weird to have some of the values protected by
 ProcArrayLock live in a completely different data structure managed
 almost entirely by some other part of the system.

Don't we already have cases of that? I seem to remember so. If you
prefer having them there, I am certainly fine with doing that. This way
they aren't allocated if slots are disabled but it's just two
TransactionIds.

 It's pretty evident that what's currently patch #4 (only peg the xmin
 horizon for catalog tables during 

Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2014-01-23 Thread Andres Freund
On 2014-01-23 13:56:49 +0100, Simon Riggs wrote:
 IMHO we need to resolve the deadlock inherent in the
 disk-full/WALlock-up/checkpoint situation. My view is that can be
 solved in a similar way to the way the buffer pin deadlock was
 resolved for Hot Standby.

I don't think that approach works here. We're not talking about mere
buffer pins but the big bad exclusively locked buffer which is held by a
backend in a critical section. Killing such a backend costs you a PANIC.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add %z support to elog/ereport?

2014-01-23 Thread Andres Freund
On 2014-01-23 17:21:11 -0500, Tom Lane wrote:
 I wrote:
  the idea that we might get many dozen such warnings on more-current
  compilers is scarier, as that might well interfere with people's
  ability to do development on, say, Windows.  Could somebody check
  whether MSVC for instance complains about format strings using z?
  Or shall I just commit this and we'll see what the buildfarm says?
 
 Given the lack of response, I've pushed the patch, and will canvass
 the buildfarm results later.

Thanks, I was afk, otherwise I'd have tried pushing it to Windows via
jenkins if that machine was running (it's running in Craig's flat...)…

Do we have a real policy against indenting nested preprocessor
statements? Just so I don't do those in future patches...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why do we let autovacuum give up?

2014-01-23 Thread Mark Kirkwood

On 24/01/14 12:28, Mark Kirkwood wrote:

On 24/01/14 12:13, Jeff Janes wrote:

On Thu, Jan 23, 2014 at 1:41 PM, Mark Kirkwood 
mark.kirkw...@catalyst.net.nz wrote:


On 24/01/14 10:16, Mark Kirkwood wrote:


On 24/01/14 10:09, Robert Haas wrote:


On Thu, Jan 23, 2014 at 4:03 PM, Mark Kirkwood
mark.kirkw...@catalyst.net.nz wrote:


On 24/01/14 09:49, Tom Lane wrote:

2. What have you got that is requesting exclusive lock on 
pg_attribute?
That seems like a pretty unfriendly behavior in itself. regards, 
tom

lane


I've seen this sort of problem where every db session was busily
creating
 temporary tables. I never got to find *why* they needed to 
make so

many,
but it seemed like a bad idea.


 But... how does that result in a vacuum-incompatible lock request
against pg_attribute?

I see that it'll insert lots of rows into pg_attribute, and maybe
later delete them, but none of that blocks vacuum.


That was my thought too - if I see it happening again here (was a 
year or

so ago that I saw some serious pg_attribute bloat) I'll dig deeper.




Actually not much digging required. Running the attached script via
pgbench (8 sessions) against a default configured postgres 8.4 sees
pg_attribute get to 1G after about 15 minutes.


At that rate, with default throttling, it will be a close race whether
autovac can vacuum pages as fast as they are being added.  Even if it 
never

gets cancelled, it might not ever finish.



Yes - I should have set the cost delay to 0 first (checking that now).





Doing that (and a few other autovac tweaks):

autovacuum_max_workers = 4
autovacuum_naptime = 10s
autovacuum_vacuum_scale_factor = 0.1
autovacuum_analyze_scale_factor = 0.1
autovacuum_vacuum_cost_delay = 0ms

Stops excessive bloat - clearly autovacuum *is* able to vacuum 
pg_attribute in this case. Back to drawing board for a test case.


Regards

Mark




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why do we let autovacuum give up?

2014-01-23 Thread Andres Freund
On 2014-01-24 12:49:57 +1300, Mark Kirkwood wrote:
 autovacuum_max_workers = 4
 autovacuum_naptime = 10s
 autovacuum_vacuum_scale_factor = 0.1
 autovacuum_analyze_scale_factor = 0.1
 autovacuum_vacuum_cost_delay = 0ms
 
 Stops excessive bloat - clearly autovacuum *is* able to vacuum pg_attribute
 in this case. Back to drawing board for a test case.

Well, I think quite many people don't realize it might be necessary to
tune autovac on busy workloads. As it very well might be the case in
Josh's case.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why do we let autovacuum give up?

2014-01-23 Thread Andres Freund
On 2014-01-23 16:15:50 -0500, Tom Lane wrote:
 [ thinks... ]  It's possible that what you saw is not the
 kick-out-autovacuum-entirely behavior, but the behavior added in commit
 bbb6e559c, whereby vacuum (auto or regular) will skip over pages that it
 can't immediately get an exclusive buffer lock on.  On a heavily used
 table, we might skip the same page repeatedly, so that dead tuples don't
 get cleaned for a long time.

I don't think it's too likely as an explanation here. Such workloads are
likely to fill a page with only dead tuples, right? Once all tuples are
safely dead they will be killed from the btree which should cause the
page not to be visited anymore and thus safely vacuumable.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add %z support to elog/ereport?

2014-01-23 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 Do we have a real policy against indenting nested preprocessor
 statements? Just so I don't do those in future patches...

Indent 'em if you like, but pgindent will undo it.  I ran the patch
through pgindent to see what it would do with those, and it left-justified
them, so I committed it that way.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why do we let autovacuum give up?

2014-01-23 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-01-23 16:15:50 -0500, Tom Lane wrote:
 [ thinks... ]  It's possible that what you saw is not the
 kick-out-autovacuum-entirely behavior, but the behavior added in commit
 bbb6e559c, whereby vacuum (auto or regular) will skip over pages that it
 can't immediately get an exclusive buffer lock on.  On a heavily used
 table, we might skip the same page repeatedly, so that dead tuples don't
 get cleaned for a long time.

 I don't think it's too likely as an explanation here. Such workloads are
 likely to fill a page with only dead tuples, right? Once all tuples are
 safely dead they will be killed from the btree which should cause the
 page not to be visited anymore and thus safely vacuumable.

I added some instrumentation to vacuumlazy.c to count the number of pages
skipped in this way.  You're right, it seems to be negligible, at least
with Mark's test case.  I saw at most two pages skipped per vacuum, and
usually none; so there's no way that a whole lot of tuples are going
unvacuumed because of this.  (I wonder though if we ought to add such
counting as a permanent feature ...)

I concur with the other reports that the main problem in this test case is
just that the default cost delay settings throttle autovacuum so hard that
it has no chance of keeping up.  If I reduce autovacuum_vacuum_cost_delay
from the default 20ms to 2ms, it seems to keep up quite nicely, on my
machine anyway.  Probably other combinations of changes would do it too.

Perhaps we need to back off the default cost delay settings a bit?
We've certainly heard more than enough reports of table bloat in
heavily-updated tables.  A system that wasn't hitting the updates as hard
as it could might not need this, but on the other hand it probably
wouldn't miss the I/O cycles from a more aggressive autovacuum, either.

regards, tom lane
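
For completeness, the throttling can already be relaxed per table through
storage parameters (available since 8.4) instead of globally; a sketch
against a made-up table name:

  -- sketch only; "some_hot_table" is hypothetical
  ALTER TABLE some_hot_table
    SET (autovacuum_vacuum_cost_delay = 2,      -- default 20 ms
         autovacuum_vacuum_cost_limit = 1000);  -- default: fall back to vacuum_cost_limit

System catalogs such as pg_attribute can't be altered that way without
allow_system_table_mods, though, so for the catalog-bloat case it's still
the global settings that matter.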


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why do we let autovacuum give up?

2014-01-23 Thread Andres Freund
On 2014-01-23 19:29:23 -0500, Tom Lane wrote:
 I saw at most two pages skipped per vacuum, and
 usually none; so there's no way that a whole lot of tuples are going
 unvacuumed because of this.  (I wonder though if we ought to add such
 counting as a permanent feature ...)

I generally think we need to work a bit on the data reported back by
vacuum: adding data, and also making the output when using
autovacuum more consistent with what VACUUM VERBOSE reports. The latter
curiously often has less detail than autovacuum.
I had hoped to get to that for 9.4, but it doesn't look like it.


 I concur with the other reports that the main problem in this test case is
 just that the default cost delay settings throttle autovacuum so hard that
 it has no chance of keeping up.  If I reduce autovacuum_vacuum_cost_delay
 from the default 20ms to 2ms, it seems to keep up quite nicely, on my
 machine anyway.  Probably other combinations of changes would do it too.

 Perhaps we need to back off the default cost delay settings a bit?
 We've certainly heard more than enough reports of table bloat in
 heavily-updated tables.  A system that wasn't hitting the updates as hard
 as it could might not need this, but on the other hand it probably
 wouldn't miss the I/O cycles from a more aggressive autovacuum, either.

Yes, I think adjusting the default makes sense, most setups that have
enough activity that costing plays a role have to greatly increase the
values. I'd rather increase the cost limit than reduce cost delay so
drastically though, but that's admittedly just gut feeling.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why do we let autovacuum give up?

2014-01-23 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-01-23 19:29:23 -0500, Tom Lane wrote:
 I concur with the other reports that the main problem in this test case is
 just that the default cost delay settings throttle autovacuum so hard that
 it has no chance of keeping up.  If I reduce autovacuum_vacuum_cost_delay
 from the default 20ms to 2ms, it seems to keep up quite nicely, on my
 machine anyway.  Probably other combinations of changes would do it too.

 Perhaps we need to back off the default cost delay settings a bit?
 We've certainly heard more than enough reports of table bloat in
 heavily-updated tables.  A system that wasn't hitting the updates as hard
 as it could might not need this, but on the other hand it probably
 wouldn't miss the I/O cycles from a more aggressive autovacuum, either.

 Yes, I think adjusting the default makes sense, most setups that have
 enough activity that costing plays a role have to greatly increase the
 values. I'd rather increase the cost limit than reduce cost delay so
 drastically though, but that's admittedly just gut feeling.

Well, I didn't experiment with intermediate values, I was just trying
to test the theory that autovac could keep up given less-extreme
throttling.  I'm not taking any position on just where we need to set
the values, only that what we've got is probably too extreme.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why do we let autovacuum give up?

2014-01-23 Thread Craig Ringer
On 01/24/2014 07:52 AM, Andres Freund wrote:
 On 2014-01-24 12:49:57 +1300, Mark Kirkwood wrote:
 autovacuum_max_workers = 4
 autovacuum_naptime = 10s
 autovacuum_vacuum_scale_factor = 0.1
 autovacuum_analyze_scale_factor = 0.1
 autovacuum_vacuum_cost_delay = 0ms

 Stops excessive bloat - clearly autovacuum *is* able to vacuum pg_attribute
 in this case. Back to drawing board for a test case.
 
 Well, I think quite many people don't realize it might be necessary to
 tune autovac on busy workloads. As it very well might be the case in
 Josh's case.

Oh, lots of people realise it's a good idea to tune autovac on busy
workloads.

They just do it in the wrong direction, making it run less often and
less aggressively, causing more bloat, and making their problem worse.

I've seen this enough times that I'm starting to think the autovacuum
tuning knobs need a child safety lock ;-)

More seriously, people don't understand autovacuum, how it works, or why
they need it. They notice it when things are already bad, see that it's
doing lots of work and doing lots of I/O that competes with queries, and
turn it off to solve the problem.

I'm not sure how to tackle that.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why do we let autovacuum give up?

2014-01-23 Thread Claudio Freire
On Thu, Jan 23, 2014 at 10:38 PM, Craig Ringer cr...@2ndquadrant.com wrote:

 Stops excessive bloat - clearly autovacuum *is* able to vacuum pg_attribute
 in this case. Back to drawing board for a test case.

 Well, I think quite many people don't realize it might be necessary to
 tune autovac on busy workloads. As it very well might be the case in
 Josh's case.

 Oh, lots of people realise it's a good idea to tune autovac on busy
 workloads.

 They just do it in the wrong direction, making it run less often and
 less aggressively, causing more bloat, and making their problem worse.

 I've seen this enough times that I'm starting to think the autovacuum
 tuning knobs need a child safety lock ;-)

 More seriously, people don't understand autovacuum, how it works, or why
 they need it. They notice it when things are already bad, see that it's
 doing lots of work and doing lots of I/O that competes with queries, and
 turn it off to solve the problem.


AFAIK, tuning down autovacuum is common advice **when combined with
manually scheduled vacuuming**.

The problem with autovacuum is that it always picks the wrong time to
work - that is, when the DB is most active - because statistically
that's when the thresholds are passed.

If you ask me, I'd like autovac to know when not to run (or rather
wait a bit, not forever), perhaps by checking load factors or some
other tell-tale of an already-saturated I/O system.




Re: [HACKERS] GIN improvements part2: fast scan

2014-01-23 Thread Tomas Vondra
On 23.1.2014 17:22, Heikki Linnakangas wrote:
 On 01/14/2014 05:35 PM, Alexander Korotkov wrote:
 Attached version is rebased against last version of packed posting lists.
 
 Thanks!
 
 I think we're missing a trick with multi-key queries. We know that when
 multiple scan keys are used, they are ANDed together, so we can do the
 skip optimization even without the new tri-state consistent function.
 
 To get started, I propose the three attached patches. These only
 implement the optimization for the multi-key case, which doesn't require
 any changes to the consistent functions and hence no catalog changes.
 Admittedly this isn't anywhere near as useful in practice as the single
 key case, but let's go for the low-hanging fruit first. This
 nevertheless introduces some machinery that will be needed by the full
 patch anyway.
 
 I structured the code somewhat differently than your patch. There is no
 separate fast-path for the case where the optimization applies. Instead,
 I'm passing the advancePast variable all the way down to where the next
 batch of items are loaded from the posting tree. keyGetItem is now
 responsible for advancing the entry streams, and the logic in
 scanGetItem has been refactored so that it advances advancePast
 aggressively, as soon as one of the key streams let us conclude that no
 items  a certain point can match.
 
 scanGetItem might yet need to be refactored when we get to the full
 preconsistent check stuff, but one step at a time.
 
 
 The first patch is the most interesting one, and contains the
 scanGetItem changes. The second patch allows seeking to the right
 segment in a posting tree page, and the third allows starting the
 posting tree scan from root, when skipping items (instead of just
 following the right-links).
 
 Here are some simple performance test results, demonstrating the effect
 of each of these patches. This is a best-case scenario. I don't think
 these patches have any adverse effects even in the worst-case scenario,
 although I haven't actually tried hard to measure that. I used this to
 create a test table:
 
 create table foo (intarr int[]);
 -- Every row contains 0 (frequent term), and a unique number.
 insert into foo select array[0,g] from generate_series(1, 1000) g;
 -- Add another tuple with 0, 1 combo physically to the end of the table.
 insert into foo values (array[0,1]);
 
 The query I used is this:
 
 postgres=# select count(*) from foo where intarr @> array[0] and intarr
 @> array[1];
  count
 ---
  2
 (1 row)
 
 I measured the time that query takes, and the number of pages hit, using
 explain (analyze, buffers true) 
 
 patches        time (ms)  buffers
 ----------------------------------
 unpatched      650        1316
 patch 1        0.52       1316
 patches 1+2    0.50       1316
 patches 1+2+3  0.13       15
 
 So, the second patch isn't doing much in this particular case. But it's
 trivial, and I think it will make a difference in other queries where
 you have the opportunity skip, but return a lot of tuples overall.
 
 In summary, these are fairly small patches, and useful on their own, so I
 think these should be committed now. But please take a look and see if
 the logic in scanGetItem/keyGetItem looks correct to you. After this, I
 think the main fast scan logic will go into keyGetItem.
 
 PS. I find it a bit surprising that in your patch, you're completely
 bailing out if there are any partial-match keys involved. Is there some
 fundamental reason for that, or just not implemented?

I've done some initial testing (with all the three patches applied)
today to see if there are any crashes or obvious failures and found none
so far. The only issue I've noticed is this LOG message in ginget.c:

elog(LOG, "entryLoadMoreItems, %u/%u, skip: %d",
 ItemPointerGetBlockNumber(advancePast),
 ItemPointerGetOffsetNumber(advancePast),
 !stepright);

which produces an enormous amount of messages. I suppose that was used for
debugging purposes and shouldn't be there?

I plan to do more thorough testing over the weekend, but I'd like to
make sure I understand what to expect. My understanding is that this
patch should:

- give the same results as the current code (e.g. the fulltext should
  not return different rows / change the ts_rank etc.)

- improve the performance of fulltext queries

Are there any obvious rules for which queries will benefit most from this?
The queries generated by the tool I'm using for testing are mostly of
this form:

  SELECT id FROM messages
   WHERE body_tsvector @@ plainto_tsquery('english', 'word1 word2 ...')
   ORDER BY ts_rank(...) DESC LIMIT :n;

with varying number of words and LIMIT values. During the testing today
I haven't noticed any obvious performance difference, but I haven't
spent much time on that.
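
For the next round I plan to compare buffer counts too, roughly like
this (same query shape as above; the word list and the ts_rank arguments
are filled in only for illustration):

  EXPLAIN (ANALYZE, BUFFERS)
  SELECT id FROM messages
   WHERE body_tsvector @@ plainto_tsquery('english', 'word1 word2')
   ORDER BY ts_rank(body_tsvector,
                    plainto_tsquery('english', 'word1 word2')) DESC
   LIMIT 100;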

regards
Tomas




Re: [HACKERS] Change authentication error message (patch)

2014-01-23 Thread Bruce Momjian
On Wed, Jun 19, 2013 at 01:27:39PM -0700, Joshua D. Drake wrote:
 
 On 06/19/2013 01:18 PM, Markus Wanner wrote:
 
 Authentication failed or password has expired for user \"%s\"
 
 Authentication failed covers any combination of a username/password
 being wrong and obviously password expired covers the other.
 
 Works for me. Considering the password to be the thing that expires
 (rather than the account) is probably more accurate as well.
 
 It is also how it is worded in the docs (which is why I used it).
 Patch below.

I have developed the attached patch to fix this problem.  Do I need to
say "invalid user" or "invalid or expired password"?
 ---

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
new file mode 100644
index 882dc8f..fa96238
*** a/src/backend/libpq/auth.c
--- b/src/backend/libpq/auth.c
*** auth_failed(Port *port, int status)
*** 245,251 
  			break;
  		case uaPassword:
  		case uaMD5:
! 			errstr = gettext_noop("password authentication failed for user \"%s\"");
  			/* We use it to indicate if a .pgpass password failed. */
  			errcode_return = ERRCODE_INVALID_PASSWORD;
  			break;
--- 245,251 
  			break;
  		case uaPassword:
  		case uaMD5:
! 			errstr = gettext_noop("password authentication failed for user \"%s\": invalid or expired password");
  			/* We use it to indicate if a .pgpass password failed. */
  			errcode_return = ERRCODE_INVALID_PASSWORD;
  			break;



Re: [HACKERS] [v9.4] row level security

2014-01-23 Thread Craig Ringer
On 01/24/2014 10:12 AM, Craig Ringer wrote:
 (Re-sending; I forgot to cc the list)
 
 On 01/20/2014 02:15 PM, Craig Ringer wrote:
 On 01/20/2014 09:58 AM, Craig Ringer wrote:
 As it is I'm spending today reworking the RLS patch on top of the new
 approach to updatable security barrier views.

 To get that rolling I've split the RLS patch up into chunks, so we can
 argue about the catalogs, ALTER syntax, and the actual row-filtering
 implementation separately ;-)

 It's currently on g...@github.com:ringerc/postgres.git in the branch
 rls-9.4-split, which is subject to rebasing. I'm still going through it
 making sure each chunk at least compiles and preferably passes make
 check.
 
 That branch is now pretty stable, and passes checks at every stage up to
 the new RLS regression tests. I've pushed a new version to branch
 rls-9.4-split. Further updates will rebase this branch.
 
 The tag rls-9.4-split-v5 identifies this particular push, and won't get
 rebased away.

Pushed a new rebase to the main working branch, merging in the fixes I
made to KaiGai's patch last time.

Tagged rls-9.4-split-v6

I haven't bothered with a patchset for this one, I'll be replacing it
again soon. This is just for anybody following along.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Why do we let autovacuum give up?

2014-01-23 Thread Alvaro Herrera
Claudio Freire wrote:

 If you ask me, I'd like autovac to know when not to run (or rather
 wait a bit, not forever), perhaps by checking load factors or some
 other tell-tale of an already-saturated I/O system.

We had a proposed design to tell autovac when not to run (or rather,
when to switch settings very high so that in practice it'd never run).
At some point somebody said "but we can just change autovacuum=off in
postgresql.conf via crontab when the high load period starts, and turn
it back on afterwards" --- and that was the end of it.
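
(For what it's worth, with 9.4's ALTER SYSTEM that crontab trick doesn't
even need to touch the file; roughly:

  -- autovacuum only needs a reload, not a restart
  ALTER SYSTEM SET autovacuum = off;
  SELECT pg_reload_conf();

and the reverse afterwards --- which is exactly why a dedicated knob
didn't survive the discussion.)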

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Why do we let autovacuum give up?

2014-01-23 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Claudio Freire wrote:
 If you ask me, I'd like autovac to know when not to run (or rather
 wait a bit, not forever), perhaps by checking load factors or some
 other tell-tale of an already-saturated I/O system.

 We had a proposed design to tell autovac when not to run (or rather,
 when to switch settings very high so that in practice it'd never run).
  At some point somebody said "but we can just change autovacuum=off in
  postgresql.conf via crontab when the high load period starts, and turn
  it back on afterwards" --- and that was the end of it.

The hard part of this is that shutting down autovacuum during heavy
load may be exactly the wrong thing to do.

regards, tom lane




Re: [HACKERS] Why do we let autovacuum give up?

2014-01-23 Thread Craig Ringer
On 01/24/2014 11:32 AM, Tom Lane wrote:
 Alvaro Herrera alvhe...@2ndquadrant.com writes:
  Claudio Freire wrote:
 If you ask me, I'd like autovac to know when not to run (or rather
 wait a bit, not forever), perhaps by checking load factors or some
 other tell-tale of an already-saturated I/O system.
 
 We had a proposed design to tell autovac when not to run (or rather,
 when to switch settings very high so that in practice it'd never run).
  At some point somebody said "but we can just change autovacuum=off in
  postgresql.conf via crontab when the high load period starts, and turn
  it back on afterwards" --- and that was the end of it.
 
 The hard part of this is that shutting down autovacuum during heavy
 load may be exactly the wrong thing to do.

Yep. In fact, it may be appropriate to limit or stop autovacuum's work
on some big tables, while pushing its activity even higher for small,
high churn tables.

If you stop autovacuum on a message-queue system when load gets high,
you'll get a giant messy bloat explosion.
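
The per-table storage parameters already let you make that split; a
sketch (table names hypothetical, numbers illustrative):

  -- Vacuum the hot queue table much more eagerly...
  ALTER TABLE message_queue SET (
      autovacuum_vacuum_scale_factor = 0.01,
      autovacuum_vacuum_cost_delay = 0);
  -- ...and back off a bit on the big, mostly-append history table.
  ALTER TABLE big_history SET (
      autovacuum_vacuum_scale_factor = 0.4);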


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Postgresql for cygwin - 3rd

2014-01-23 Thread Bruce Momjian
On Mon, Jun 10, 2013 at 11:08:36AM +0200, marco atzeri wrote:
 On 3/6/2013 11:46 PM, Andrew Dunstan wrote:
 
 Excellent. Will test it out soon.
 
 cheers
 
 andrew
 
 
 Andrew,
 please find attached a full patch for cygwin relative to 9.3beta1 :
 
 - DLLTOOL/DLLWRAP are not used anymore, replaced
   by gcc also for postgres.exe (*)
 - DLL versioning is added
 
 Check failures:
 - prepared_xacts is still freezing
   The cygwin failure you highlighted was solved,
   so it should be something else
 - attached the 2 regressions diffs
  tsearch  ... FAILED
  without_oid  ... FAILED
 The second one seems a new one, not sure cygwin specific

Andrew, should this configuration/code patch be applied to 9.4?

http://www.postgresql.org/message-id/51b59794.3000...@gmail.com

I think we would have to make Cygwin-specific regression output to
handle the regression failures, but frankly I am not even sure if they
are right.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: Fwd: [HACKERS] Proposal: variant of regclass

2014-01-23 Thread Yugo Nagata
On Thu, 23 Jan 2014 13:19:37 +0200
Marti Raudsepp ma...@juffo.org wrote:

 Resending to Tatsuo Ishii and Yugo Nagata, your email server was
 having problems yesterday:

Thanks for resending!

 
 This is the mail system at host sraigw2.sra.co.jp.
 
 yug...@sranhm.sra.co.jp: mail for srasce.sra.co.jp loops back to myself
 t-is...@sra.co.jp: mail for srasce.sra.co.jp loops back to myself
 
 -- Forwarded message --
 From: Marti Raudsepp ma...@juffo.org
 Date: Thu, Jan 23, 2014 at 3:39 AM
 Subject: Re: [HACKERS] Proposal: variant of regclass
 To: Yugo Nagata nag...@sraoss.co.jp
 Cc: Tatsuo Ishii is...@postgresql.org, pgsql-hackers
 pgsql-hackers@postgresql.org, Vik Fearing vik.fear...@dalibo.com,
 Robert Haas robertmh...@gmail.com, Tom Lane t...@sss.pgh.pa.us,
 Pavel Golub pa...@gf.microolap.com, Pavel Golub
 pa...@microolap.com, Andres Freund and...@2ndquadrant.com, Pavel
 Stěhule pavel.steh...@gmail.com
 
 
 On Wed, Jan 22, 2014 at 1:44 PM, Yugo Nagata nag...@sraoss.co.jp wrote:
  On Wed, 22 Jan 2014 20:04:12 +0900 (JST)
  Tatsuo Ishii is...@postgresql.org wrote:
  parseTypeString() is called by some other functions and I avoided
  influences of modifying the definition on them, since this should
  raise errors in most cases. This is same reason for other *MissingOk
  functions in parse_type.c.
 
  Is it better to write definitions of these functions and all their callers?
 
 Yes, for parseTypeString certainly. There have been many refactorings
 like that in the past and all of them use this pattern.

Ok. I'll rewrite the definition and its callers.

 
 typenameTypeIdAndMod is less clear since the code paths differ so
 much, maybe keep 2 versions (merging back to 1 function is OK too, but
 in any case you don't need 3).

I'll also fix this, either way, so that it doesn't use typenameTypeIdAndMod_guts.

 
 typenameTypeIdAndModMissingOk(...)
 {
 Type tup = LookupTypeName(pstate, typeName, typmod_p);
 if (tup == NULL || !((Form_pg_type) GETSTRUCT(tup))->typisdefined)
 *typeid_p = InvalidOid;
 else
 *typeid_p = HeapTupleGetOid(tup);
 
 if (tup)
 ReleaseSysCache(tup);
 }
 typenameTypeIdAndMod(...)
 {
 Type tup = typenameType(pstate, typeName, typmod_p);
 *typeid_p = HeapTupleGetOid(tup);
 ReleaseSysCache(tup);
 }
 
 
 
 Also, there's no need for else here:
 if (raiseError)
 ereport(ERROR, ...);
 else
 return InvalidOid;
 
 Regards,
 Marti


-- 
Yugo Nagata nag...@sraoss.co.jp




  1   2   >