Re: [HACKERS] Possible bug in cascaded standby

2013-06-08 Thread Pavan Deolasee
 I'll retry and report back if I see the problem on the offending platform.


Just to close out this thread, I can't reproduce this on the Mac OS either.
While I'd done a make clean earlier, make distclean did the trick.
Sorry for the noise.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] Optimising Foreign Key checks

2013-06-08 Thread Noah Misch
On Tue, Jun 04, 2013 at 02:45:17PM +0100, Simon Riggs wrote:
  On Sun, Jun 02, 2013 at 10:45:21AM +0100, Simon Riggs wrote:
  For clarity the 4 problems are
  1. SQL execution overhead
  2. Memory usage
  3. Memory scrolling
  4. Locking overhead, specifically FPWs and WAL records from FK checks
  probably in that order or thereabouts.

 Let's rethink things to put a few options on the table and see what we get...

 2. Don't store FK events in the after trigger queue at all, but apply
 them as we go. That solves problems 2 and 3. That could be considered
 to be in violation of the SQL standard, which requires us to apply the
 checks at the end of the statement. We already violate the standard
 with regard to uniqueness checks, so doing it here doesn't seem
 unacceptable.

I wouldn't like to see that compliance bug propagate to other constraint
types.  What clauses in the standard demand end-of-statement timing, anyway?

What if we followed the example of deferred UNIQUE: attempt FK checks as we go
and enqueue an after-trigger recheck when such an initial test fails?

 Implementation: Given we know that COPY uses a ring buffer, and given
 your earlier thoughts on use of a batched SQL, I have a new
 suggestion. Every time the ring buffer fills, we go through the last
 buffers accessed, pulling out all the PKs and then issue them as a
 single SQL statement (as above). We can do that manually, or using the
 block scan mentioned previously. This uses batched SQL to solve
 problem 1. It doesn't build up a large data structure in memory,
 problem 2, and it also solves problem 3 by accessing data blocks before
 they fall out of the ring buffer. If there are no duplicates in the
 referenced table, then this behaviour will do as much as possible to
 make accesses to the referenced table also use a small working set.
 (We may even wish to consider making the batched SQL use a separate
 ring buffer for RI accesses). That approach doesn't make any
 assumptions about duplicates.

If this can be made standard-compliant, it sounds most fruitful.
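For concreteness, a one-statement batch probe might look roughly like the sketch
below; the table and column names are placeholders, and a real check would be
generated by ri_triggers.c from the constraint definition (and would also take
FOR KEY SHARE locks on the matched rows):

    #include "postgres.h"
    #include "catalog/pg_type.h"
    #include "executor/spi.h"
    #include "utils/array.h"

    /*
     * Illustrative sketch only: probe a batch of collected FK values with a
     * single statement instead of one RI query per inserted row.
     */
    static void
    check_fk_batch(Datum *keys, int nkeys)   /* int4 keys gathered as we go */
    {
        ArrayType  *arr = construct_array(keys, nkeys, INT4OID,
                                          sizeof(int32), true, 'i');
        Oid         argtypes[1] = {INT4ARRAYOID};
        Datum       values[1] = {PointerGetDatum(arr)};

        SPI_connect();

        /* any key returned here has no match in the referenced table */
        if (SPI_execute_with_args(
                "SELECT k FROM unnest($1) AS k"
                " WHERE NOT EXISTS (SELECT 1 FROM ONLY pktable x WHERE x.id = k)",
                1, argtypes, values, NULL, false, 1) == SPI_OK_SELECT &&
            SPI_processed > 0)
            ereport(ERROR,
                    (errcode(ERRCODE_FOREIGN_KEY_VIOLATION),
                     errmsg("insert or update violates foreign key constraint")));

        SPI_finish();
    }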

 Perhaps another way would be to avoid very large COPY statements
 altogether, breaking down loads into smaller pieces.

True.  It would be nice for the system to not need such hand-holding, but
that's a largely-adequate tool for coping in the field.

Thanks,
nm

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread MauMau

From: Daniel Farina dan...@heroku.com

On Fri, Jun 7, 2013 at 12:14 PM, Josh Berkus j...@agliodbs.com wrote:

Right now, what we're telling users is "You can have continuous backup
with Postgres, but you'd better hire an expensive consultant to set it
up for you, or use this external tool of dubious provenance which
there's no packages for, or you might accidentally cause your database
to shut down in the middle of the night."

At which point most sensible users say "no thanks, I'll use something
else."



Inverted and just as well supported: if you want to not accidentally
lose data, you better hire an expensive consultant to check your
systems for all sorts of default 'safety = off' features.  This
being but the hypothetical first one.

Furthermore, I see no reason why high quality external archiving
software cannot exist.  Maybe some even exists already, and no doubt
they can be improved and the contract with Postgres enriched to that
purpose.

Finally, it's not that hard to teach any archiver how to no-op at
user-peril, or perhaps Postgres can learn a way to do this expressly,
to standardize the procedure a bit and ease publicly shared recipes.


Yes, I feel designing reliable archiving, even for the simplest case (copying 
WAL to disk), is very difficult.  I know of the following three problems 
if you just follow the PostgreSQL manual.  Average users won't notice them. 
I guess even professional DBAs migrating from other DBMSs won't, either.


1. If the machine or postgres crashes while archive_command is copying a WAL 
file, later archive recovery fails.
This is because cp leaves a file of less than 16MB in the archive area, and 
postgres refuses to start when it finds such a small archived WAL file.
The solution, which IIRC Tomas-san told me here, is to do something like cp %p 
/archive/dir/%f.tmp && mv /archive/dir/%f.tmp /archive/dir/%f.


2. archive_command dumps core when you run pg_ctl stop -mi.
This is because postmaster sends SIGQUIT to all its descendants.  The core 
files accumulate in the data directory, which will be backed up with the 
database.  Of course those core files are garbage.

The archive_command script needs to catch SIGQUIT and exit.

3. You cannot know the reason for archive_command failure (e.g. the archive 
area being full) if you don't use PostgreSQL's server logging.

This is because archive_command failure is not logged in syslog/eventlog.


I hope PostgreSQL will provide a reliable archiving facility that is ready 
to use.


Regards
MauMau









Re: [HACKERS] UTF-8 encoding problem w/ libpq

2013-06-08 Thread Andrew Dunstan


On 06/03/2013 02:41 PM, Andrew Dunstan wrote:


On 06/03/2013 02:28 PM, Tom Lane wrote:
... I wonder though if we couldn't just fix this code to not do 
anything to high-bit-set bytes in multibyte encodings.



That's exactly what I suggested back in November.



This thread seems to have gone cold, so I have applied the fix I 
originally suggested along these lines to all live branches.


At least that means we won't produce junk, but we still need to work out 
how to downcase multi-byte characters.


If anyone thinks there are other places in the code that need similar 
treatment, they are welcome to find them. I have not yet found one.



cheers

andrew





Re: [HACKERS] system catalog pg_rewrite column ev_attr document description problem

2013-06-08 Thread Kevin Grittner
Kevin Grittner kgri...@ymail.com wrote:

 I'll leave this alone for a day.  If nobody objects, I will change
 the ruleutils.c code to work with either value (to support
 pg_upgrade) and change the code to set this to zero, for 9.3 and
 forward only.  I will change the 9.3 docs to mention that newly
 created rows will have zero, but that clusters upgraded via
 pg_upgrade may still have some rows containing the old value of -1.

 For anyone casually following along without reading the code, it's
  not a bug in the sense that the current code ever misbehaves, but the
  value which has been used for ev_attr to indicate "whole table"
  since at least Postgres95 version 1.01 is not consistent with other
  places where we set a dummy value in the attribute number to indicate
  "whole table".  Since 2002 we have not supported column-level rules, so it
 has just been filled with a constant of -1.  The idea is to change
 the constant to zero -- to make it more consistent so as to reduce
 confusion and the chance of future bugs should we ever decide to
 use the column again.

When I went to make this change, I found over 100 lines of obsolete
code related to this.  Commit 95ef6a344821655ce4d0a74999ac49dd6af6d342
removed the ability to create a rule on a column effective with
7.3, but a lot of code for supporting that was left behind.  It
seems to me that the rewrite area is complicated enough without carrying
code that's been dead for over a decade.  The first patch removes
the dead weight.  The second patch makes the change suggested by
Tom.  Those carrying forward from beta1 without an initdb will
still see some rows in pg_rewrite with -1, but anyone else will see
zero for this column.

Does anyone see a problem with applying both of these for 9.3?

 If we were a little earlier in the release cycle I would argue that
 if we're going to do anything with this column we should drop it.

Which is exactly what I think we should do as soon as we branch.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

*** a/src/backend/rewrite/rewriteHandler.c
--- b/src/backend/rewrite/rewriteHandler.c
***
*** 1274,1288  matchLocks(CmdType event,
  			}
  		}
  
! 		if (oneLock->event == event)
! 		{
! 			if (parsetree->commandType != CMD_SELECT ||
! 				(oneLock->attrno == -1 ?
! 				 rangeTableEntry_used((Node *) parsetree, varno, 0) :
! 				 attribute_used((Node *) parsetree,
! 								varno, oneLock->attrno, 0)))
! 				matching_locks = lappend(matching_locks, oneLock);
! 		}
  	}
  
  	return matching_locks;
--- 1274,1281 
  			}
  		}
  
! 		if (oneLock->event == event && parsetree->commandType != CMD_SELECT)
! 			matching_locks = lappend(matching_locks, oneLock);
  	}
  
  	return matching_locks;
***
*** 1296,1302  static Query *
  ApplyRetrieveRule(Query *parsetree,
    RewriteRule *rule,
    int rt_index,
-   bool relation_level,
    Relation relation,
    List *activeRIRs,
    bool forUpdatePushedDown)
--- 1289,1294 
***
*** 1310,1317  ApplyRetrieveRule(Query *parsetree,
  		elog(ERROR, expected just one rule action);
  	if (rule->qual != NULL)
  		elog(ERROR, cannot handle qualified ON SELECT rule);
- 	if (!relation_level)
- 		elog(ERROR, cannot handle per-attribute ON SELECT rule);
  
  	if (rt_index == parsetree->resultRelation)
  	{
--- 1302,1307 
***
*** 1633,1646  fireRIRrules(Query *parsetree, List *activeRIRs, bool forUpdatePushedDown)
  			if (rule->event != CMD_SELECT)
  				continue;
  
- 			if (rule->attrno > 0)
- 			{
- 				/* per-attr rule; do we need it? */
- 				if (!attribute_used((Node *) parsetree, rt_index,
- 									rule->attrno, 0))
- 	continue;
- 			}
- 
  			locks = lappend(locks, rule);
  		}
  
--- 1623,1628 
***
*** 1665,1671  fireRIRrules(Query *parsetree, List *activeRIRs, bool forUpdatePushedDown)
  parsetree = ApplyRetrieveRule(parsetree,
  			  rule,
  			  rt_index,
- 			  rule->attrno == -1,
  			  rel,
  			  activeRIRs,
  			  forUpdatePushedDown);
--- 1647,1652 
*** a/src/backend/rewrite/rewriteManip.c
--- b/src/backend/rewrite/rewriteManip.c
***
*** 858,927  rangeTableEntry_used(Node *node, int rt_index, int sublevels_up)
  
  
  /*
-  * attribute_used -
-  *	Check if a specific attribute number of a RTE is used
-  *	somewhere in the query or expression.
-  */
- 
- typedef struct
- {
- 	int			rt_index;
- 	int			attno;
- 	int			sublevels_up;
- } attribute_used_context;
- 
- static bool
- attribute_used_walker(Node *node,
- 	  attribute_used_context *context)
- {
- 	if (node == NULL)
- 		return false;
- 	if (IsA(node, Var))
- 	{
- 		Var		   *var = (Var *) node;
- 
- 		if (var->varlevelsup == context->sublevels_up &&
- 			var->varno == context->rt_index &&
- 			var->varattno == context->attno)
- 			return true;
- 		return false;
- 	}
- 	if (IsA(node, Query))
- 	{
- 		/* Recurse into subselects */
- 		bool		

[HACKERS] Proposed patch: remove hard-coded limit MAX_ALLOCATED_DESCS

2013-06-08 Thread Tom Lane
Recently we had a gripe about how you can't read very many files
concurrently with contrib/file_fdw:
http://www.postgresql.org/message-id/of419b5767.8a3c9adb-on85257b79.005491e9-85257b79.0054f...@isn.rtss.qc.ca

The reason for this is that file_fdw goes through the COPY code, which
uses AllocateFile(), which has a wired-in assumption that not very many
files need to be open concurrently.  I thought for a bit about trying to
get COPY to use a virtual file descriptor such as is provided by the
rest of fd.c, but that didn't look too easy, and in any case probably
nobody would be excited about adding additional overhead to the COPY
code path.  What seems more practical is to relax the hard-coded limit
on the number of files concurrently open through AllocateFile().  Now,
we could certainly turn that array into some open-ended list structure,
but that still wouldn't let us have an arbitrary number of files open,
because at some point we'd run into platform-specific EMFILE or ENFILE
limits on the number of open file descriptors.  In practice we want to
keep it under the max_safe_fds limit that fd.c goes to a lot of trouble
to determine.  So it seems like the most useful compromise is to keep
the allocatedDescs[] array data structure as-is, but allow its size to
depend on max_safe_fds.  In the attached proposed patch I limit it to
max_safe_fds / 2, so that there's still a reasonable number of FDs
available for fd.c's main pool of virtual FDs.  On typical modern
platforms that should be at least a couple of hundred, thus making the
effective limit about an order of magnitude more than it is currently
(32).

Barring objections or better ideas, I'd like to back-patch this as far
as 9.1 where file_fdw was introduced.

regards, tom lane

diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 78c7d41ac48d95d6648baa3951563ba9e4ad3496..7de93647759ce8dc58d0a9f3c8aad9ed7f36f5e6 100644
*** a/src/backend/storage/file/fd.c
--- b/src/backend/storage/file/fd.c
***
*** 43,50 
   * wrappers around fopen(3), opendir(3), popen(3) and open(2), respectively.
   * They behave like the corresponding native functions, except that the handle
   * is registered with the current subtransaction, and will be automatically
!  * closed at abort. These are intended for short operations like reading a
!  * configuration file, and there is a fixed limit on the number of files that
   * can be opened using these functions at any one time.
   *
   * Finally, BasicOpenFile is just a thin wrapper around open() that can
--- 43,50 
   * wrappers around fopen(3), opendir(3), popen(3) and open(2), respectively.
   * They behave like the corresponding native functions, except that the handle
   * is registered with the current subtransaction, and will be automatically
!  * closed at abort. These are intended mainly for short operations like
!  * reading a configuration file; there is a limit on the number of files that
   * can be opened using these functions at any one time.
   *
   * Finally, BasicOpenFile is just a thin wrapper around open() that can
*** static uint64 temporary_files_size = 0;
*** 198,210 
  /*
   * List of OS handles opened with AllocateFile, AllocateDir and
   * OpenTransientFile.
-  *
-  * Since we don't want to encourage heavy use of those functions,
-  * it seems OK to put a pretty small maximum limit on the number of
-  * simultaneously allocated descs.
   */
- #define MAX_ALLOCATED_DESCS  32
- 
  typedef enum
  {
  	AllocateDescFile,
--- 198,204 
*** typedef enum
*** 216,232 
  typedef struct
  {
  	AllocateDescKind kind;
  	union
  	{
  		FILE	   *file;
  		DIR		   *dir;
  		int			fd;
  	}			desc;
- 	SubTransactionId create_subid;
  } AllocateDesc;
  
  static int	numAllocatedDescs = 0;
! static AllocateDesc allocatedDescs[MAX_ALLOCATED_DESCS];
  
  /*
   * Number of temporary files opened during the current session;
--- 210,227 
  typedef struct
  {
  	AllocateDescKind kind;
+ 	SubTransactionId create_subid;
  	union
  	{
  		FILE	   *file;
  		DIR		   *dir;
  		int			fd;
  	}			desc;
  } AllocateDesc;
  
  static int	numAllocatedDescs = 0;
! static int	maxAllocatedDescs = 0;
! static AllocateDesc *allocatedDescs = NULL;
  
  /*
   * Number of temporary files opened during the current session;
*** static void FreeVfd(File file);
*** 284,289 
--- 279,286 
  
  static int	FileAccess(File file);
  static File OpenTemporaryFileInTablespace(Oid tblspcOid, bool rejectError);
+ static bool reserveAllocatedDesc(void);
+ static int	FreeDesc(AllocateDesc *desc);
  static void AtProcExit_Files(int code, Datum arg);
  static void CleanupTempFiles(bool isProcExit);
  static void RemovePgTempFilesInDir(const char *tmpdirname);
*** FilePathName(File file)
*** 1491,1496 
--- 1488,1553 
  
  
  /*
+  * Create/enlarge the allocatedDescs[] array if needed and possible.
+  * Returns true if an 

Re: [HACKERS] system catalog pg_rewrite column ev_attr document description problem

2013-06-08 Thread Tom Lane
Kevin Grittner kgri...@ymail.com writes:
 If we were a little earlier in the release cycle I would argue that
 if we're going to do anything with this column we should drop it.

 Which is exactly what I think we should do as soon as we branch.

If we're going to do that, there doesn't seem to me to be a lot of point
in adjusting the column's contents in 9.3 --- the argument for doing
that was mostly forward compatibility with some future actual usage of
the column.  I'd be more inclined to leave HEAD as-is and then do the
column-ectomy along with this removal of dead code in 9.4.

regards, tom lane




Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread Joshua D. Drake


On 06/07/2013 12:14 PM, Josh Berkus wrote:


Right now, what we're telling users is "You can have continuous backup
with Postgres, but you'd better hire an expensive consultant to set it
up for you, or use this external tool of dubious provenance which
there's no packages for, or you might accidentally cause your database
to shut down in the middle of the night."


This is an outright falsehood. We are telling them, "You better know 
what you are doing" or "You should call a consultant." This is no 
different than, "You better know what you are doing" or "You should take 
driving lessons."




At which point most sensible users say "no thanks, I'll use something else."



Josh, I have always admired your flair for dramatics; it almost rivals 
mine. Users are free to use what they want; some will choose lesser 
databases. I am OK with that because eventually, if PostgreSQL is the 
right tool, they will come back to us and PgExperts or CMD or OmniTI, or 
they will know what they are doing and thus won't need us.


JD




Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread Joshua D. Drake


On 06/08/2013 07:36 AM, MauMau wrote:


1. If the machine or postgres crashes while archive_command is copying a
WAL file, later archive recovery fails.
This is because cp leaves a file of less than 16MB in archive area, and
postgres refuses to start when it finds such a small archive WAL file.
The solution, which IIRC Tomas-san told me here, is to do something like cp %p
/archive/dir/%f.tmp && mv /archive/dir/%f.tmp /archive/dir/%f.


Well, it seems to me that one of the problems here is that we tell people to 
use cp. We should be telling people to use a command (or supply a 
command) that is smarter than that.



3. You cannot know the reason for archive_command failure (e.g. the archive
area being full) if you don't use PostgreSQL's server logging.
This is because archive_command failure is not logged in syslog/eventlog.


Wait, what? Is this true (someone else?)

JD







Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread Joshua D. Drake


On 06/06/2013 07:52 AM, Heikki Linnakangas wrote:

I think it can be made fairly robust otherwise, and the performance
impact should be pretty easy to measure with e.g pgbench.


Once upon a time in a land far, far away, we expected users to manage 
their own systems. We had things like soft and hard quotas on disks and 
lastlog to find out who was logging into the system. Alas, as far as I 
know soft and hard quotas are kind of a thing of the past, but that 
doesn't mean that their usefulness has ended.


The idea that we PANIC is not just awful, it is stupid. I don't think 
anyone is going to disagree with that. However, there is a question of 
what to do instead. I think the idea of sprinkling checks into the 
higher level code before specific operations is not invalid but I also 
don't think it is necessary.


To me, a more pragmatic approach makes sense. Obviously having some kind 
of code that checks the space makes sense but I don't know that it needs 
to be around any operation other than when we are creating a segment. What do 
we care why the segment is being created? If we don't have enough room 
to create the segment, the transaction rolls back with some OBVIOUS, not 
OBTUSE, error.


Obviously this could cause a ton of transactions to roll back but I 
think keeping the database consistent and rolling back a transaction in 
case of error is exactly what we are supposed to do.


Sincerely,

Joshua D. Drake





- Heikki








Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread Andres Freund
On 2013-06-08 11:15:40 -0700, Joshua D. Drake wrote:
 To me, a more pragmatic approach makes sense. Obviously having some kind of
 code that checks the space makes sense but I don't know that it needs to be
 around any operation other than when we are creating a segment. What do we care
 why the segment is being created? If we don't have enough room to create the
 segment, the transaction rolls back with some OBVIOUS, not OBTUSE, error.
 
 Obviously this could cause a ton of transactions to roll back but I think
 keeping the database consistent and rolling back a transaction in case of
 error is exactly what we are supposed to do.

You know, the PANIC isn't there just because we like to piss off
users. There are actual technical reasons that don't just go away by
judging the PANIC as stupid.
At the points where the XLogInsert()s happen, we're in critical sections
out of which we *cannot* ERROR out, because we may already have made
modifications that cannot be allowed to be performed
partially/unlogged. That's why we throw a PANIC, which will force a
cluster-wide restart, including *NOT* writing any further buffers from
s_b out.
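
To illustrate the constraint (schematically, not as literal backend code;
declarations and the actual WAL record construction are omitted): the data
page is changed inside the critical section, so by the time XLogInsert()
discovers there is no space left, a plain ERROR is no longer an option:

    START_CRIT_SECTION();

    /* the shared buffer is modified here; the change is not yet WAL-logged */
    MarkBufferDirty(buffer);

    /*
     * A plain ERROR at this point would leave an in-memory change with no
     * WAL record describing it, so running out of WAL space inside here has
     * to escalate to PANIC.
     */
    recptr = XLogInsert(rmid, info, rdata);
    PageSetLSN(BufferGetPage(buffer), recptr);

    END_CRIT_SECTION();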

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread Andres Freund
On 2013-06-07 12:02:57 +0300, Heikki Linnakangas wrote:
 On 07.06.2013 00:38, Andres Freund wrote:
 On 2013-06-06 23:28:19 +0200, Christian Ullrich wrote:
 * Heikki Linnakangas wrote:
 
 The current situation is that if you run out of disk space while writing
 WAL, you get a PANIC, and the server shuts down. That's awful. We can
 
 So we need to somehow stop new WAL insertions from happening, before
 it's too late.
 
 A naive idea is to check if there's enough preallocated WAL space, just
 before inserting the WAL record. However, it's too late to check that in
 
 There is a database engine, Microsoft's Jet Blue aka the Extensible
 Storage Engine, that just keeps some preallocated log files around,
 specifically so it can get consistent and halt cleanly if it runs out of
 disk space.
 
 In other words, the idea is not to check over and over again that there is
 enough already-reserved WAL space, but to make sure there always is by
 having a preallocated segment that is never used outside a disk space
 emergency.
 
 That's not a bad technique. I wonder how reliable it would be in
 postgres.
 
 That's no different from just having a bit more WAL space in the first
 place. We need a mechanism to stop backends from writing WAL, before you run
 out of it completely. It doesn't matter if the reservation is done by
 stashing away a WAL segment for emergency use, or by a variable in shared
 memory. Either way, backends need to stop using it up, by blocking or
 throwing an error before they enter the critical section.

Well, if you have 16 or 32MB of reserved WAL space available you don't
need to judge all that precisely how much space is available.

So we can just sprinkle some EnsureXLogHasSpace() on XLogInsert()
call sites like heap_insert(), but we can do that outside of the critical
sections, and we can do it without locks since quite a bit of write activity
has to happen to overrun the reserved space. Anything that
desperately needs to write stuff, like the end-of-recovery checkpoint,
can just not call EnsureXLogHasSpace() and rely on the reserved space.

Seems like 90% of the solution for 30% of the complexity or so.
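
A hypothetical sketch of that shape (EnsureXLogHasSpace() is the invented name
from above; EstimatedFreeXLogSegments() and ReservedXLogSegments are likewise
invented, standing in for whatever cheap bookkeeping would back the check; the
point is only that it runs outside any critical section, so a plain ERROR is
still allowed):

    void
    EnsureXLogHasSpace(void)
    {
        /* cheap, unlocked read of an estimate maintained elsewhere */
        if (EstimatedFreeXLogSegments() < ReservedXLogSegments)
            ereport(ERROR,
                    (errcode(ERRCODE_DISK_FULL),
                     errmsg("could not reserve WAL space for this operation"),
                     errhint("Free disk space in pg_xlog, or wait for old segments to be recycled.")));
    }

    /* called near the top of heap_insert() and friends, before START_CRIT_SECTION() */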

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] system catalog pg_rewrite column ev_attr document description problem

2013-06-08 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 Kevin Grittner kgri...@ymail.com writes:

  If we were a little earlier in the release cycle I would argue that
  if we're going to do anything with this column we should drop it.
 
  Which is exactly what I think we should do as soon as we branch.
 
 If we're going to do that, there doesn't seem to me to be a lot of point
 in adjusting the column's contents in 9.3 --- the argument for doing
 that was mostly forward compatibility with some future actual usage of
 the column.  I'd be more inclined to leave HEAD as-is and then do the
 column-ectomy along with this removal of dead code in 9.4.

ok


--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: [HACKERS] Bad error message on valuntil

2013-06-08 Thread Joshua D. Drake


On 06/07/2013 12:31 PM, Tom Lane wrote:

Joshua D. Drake j...@commandprompt.com writes:

On 06/07/2013 11:57 AM, Tom Lane wrote:

I think it's intentional that we don't tell the *client* that level of
detail.



Why? That seems rather silly.


The general policy on authentication failure reports is that we don't
tell the client anything it doesn't know already about what the auth
method is.  We can log additional info into the postmaster log if it


I was looking at the code and I saw this catchall:

        default:
            errstr = gettext_noop("authentication failed for user \"%s\": invalid authentication method");
            break;

I think we could make the argument that if valuntil has expired, the 
authentication method is invalid. Thoughts?


Else I am trying to come up with some decent wording... something like:

Authentication failed: not all authentication tokens were met

?




Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread Jeff Janes
On Fri, Jun 7, 2013 at 12:14 PM, Josh Berkus j...@agliodbs.com wrote:


  The archive command can be made a shell script (or for that matter a
  compiled program) which can do anything it wants upon failure, including
  emailing people.

 You're talking about using external tools -- frequently hackish,
 workaround ones -- to handle something which PostgreSQL should be doing
 itself, and which only the database engine has full knowledge of.


I think the database engine is about the last thing which would have full
knowledge of the best way to contact the DBA, especially during events
which by definition mean things are already going badly.  I certainly don't
see having core code which knows how to talk to every PBX, SMS, email
system, or twitter feed that anyone might wish to use for logging.
PostgreSQL already supports two formats of text logs, plus syslog and
eventlog.  Is there some additional logging management tool that we could
support which is widely used, doesn't require an expensive consultant to
set up and configure correctly (or even to decide what "correctly" means
for the given situation), and which solves 80% of the problems?

It would be nice to have the ability to specify multiple log destinations
with different log_min_messages for each one.  I'm sure syslog already must
implement some kind of method for doing that, but I've been happy enough
with the text logs that I've never bothered to look into it much.


 While
 that's the only solution we have for now, it's hardly a worthy design goal.

 Right now, what we're telling users is "You can have continuous backup
 with Postgres, but you'd better hire an expensive consultant to set it
 up for you, or use this external tool of dubious provenance which
 there's no packages for, or you might accidentally cause your database
 to shut down in the middle of the night."

 At which point most sensible users say "no thanks, I'll use something
 else."


What does the "something else" do?  Hopefully it does not silently invalidate
your backups.

Cheers,

Jeff


Re: [HACKERS] Proposed patch: remove hard-coded limit MAX_ALLOCATED_DESCS

2013-06-08 Thread Cédric Villemain
On Saturday, June 8, 2013 at 19:06:58, Tom Lane wrote:
 Recently we had a gripe about how you can't read very many files
 concurrently with contrib/file_fdw:
 http://www.postgresql.org/message-id/OF419B5767.8A3C9ADB-ON85257B79.005491E9-85257b79.0054f...@isn.rtss.qc.ca
 
 The reason for this is that file_fdw goes through the COPY code, which
 uses AllocateFile(), which has a wired-in assumption that not very many
 files need to be open concurrently.  I thought for a bit about trying to
 get COPY to use a virtual file descriptor such as is provided by the
 rest of fd.c, but that didn't look too easy, and in any case probably
 nobody would be excited about adding additional overhead to the COPY
 code path.  What seems more practical is to relax the hard-coded limit
 on the number of files concurrently open through AllocateFile().  Now,
 we could certainly turn that array into some open-ended list structure,
 but that still wouldn't let us have an arbitrary number of files open,
 because at some point we'd run into platform-specific EMFILE or ENFILE
 limits on the number of open file descriptors.  In practice we want to
 keep it under the max_safe_fds limit that fd.c goes to a lot of trouble
 to determine.  So it seems like the most useful compromise is to keep
 the allocatedDescs[] array data structure as-is, but allow its size to
 depend on max_safe_fds.  In the attached proposed patch I limit it to
 max_safe_fds / 2, so that there's still a reasonable number of FDs
 available for fd.c's main pool of virtual FDs.  On typical modern
 platforms that should be at least a couple of hundred, thus making the
 effective limit about an order of magnitude more than it is currently
 (32).
 
 Barring objections or better ideas, I'd like to back-patch this as far
 as 9.1 where file_fdw was introduced.

I just wonder about this statement that you removed: 
  * Since we don't want to encourage heavy use of those functions,
  * it seems OK to put a pretty small maximum limit on the number of
  * simultaneously allocated descs.

Is it now encouraged to use those functions, or does it at least seem less 
'scary' than in the past?
-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: 24x7 Support - Development, Expertise and Training




Re: [HACKERS] Proposed patch: remove hard-coded limit MAX_ALLOCATED_DESCS

2013-06-08 Thread Tom Lane
Cédric Villemain ced...@2ndquadrant.com writes:
 I just wonder about this statement that you removed: 
   * Since we don't want to encourage heavy use of those functions,
   * it seems OK to put a pretty small maximum limit on the number of
   * simultaneously allocated descs.

  Is it now encouraged to use those functions, or does it at least seem less 
  'scary' than in the past?

Well, we could put back some weaker form of the statement, but I wasn't
sure how to word it.

regards, tom lane




Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread Jeff Janes
On Sat, Jun 8, 2013 at 11:15 AM, Joshua D. Drake j...@commandprompt.com wrote:


 On 06/06/2013 07:52 AM, Heikki Linnakangas wrote:

 I think it can be made fairly robust otherwise, and the performance
 impact should be pretty easy to measure with e.g pgbench.


 Once upon a time in a land far, far away, we expected users to manage
 their own systems. We had things like soft and hard quotas on disks and
 lastlog to find out who was logging into the system. Alas, as far as I
 know soft and hard quotas are kind of a thing of the past but that doesn't
 mean that their usefulness has ended.

 The idea that we PANIC is not just awful, it is stupid. I don't think
 anyone is going to disagree with that. However, there is a question of what
 to do instead. I think the idea of sprinkling checks into the higher level
 code before specific operations is not invalid but I also don't think it is
 necessary.


Given that the system is going to become unusable, I don't see why PANIC is
an awful, stupid way of doing it.  And if it can only be used for things
that don't generate WAL, that is pretty much unusable, as even read only
transactions often need to do clean-up tasks that generate WAL.

Cheers,

Jeff


Re: [HACKERS] [PATCH] pgbench --throttle (submission 7 - with lag measurement)

2013-06-08 Thread Greg Smith

On 6/1/13 5:00 AM, Fabien COELHO wrote:

Question 1: should it report the maximum lag encountered?


I haven't found the lag measurement to be very useful yet, outside of 
debugging the feature itself.  Accordingly I don't see a reason to add 
even more statistics about the number outside of testing the code.  I'm 
seeing some weird lag problems that this will be useful for though right 
now, more on that a few places below.



Question 2: the next step would be to have the current lag shown under
option --progress, but that would mean having a combined --throttle
--progress patch submission, or maybe dependencies between patches.


This is getting too far ahead.  Let's get the throttle part nailed down 
before introducing even more moving parts into this.  I've attached an 
updated patch that changes a few things around already.  I'm not done 
with this yet and it needs some more review before commit, but it's not 
too far away from being ready.


This feature works quite well.  On a system that will run at 25K TPS 
without any limit, I did a run with 25 clients and a rate of 400/second, 
aiming at 10,000 TPS, and that's what I got:


number of clients: 25
number of threads: 1
duration: 60 s
number of transactions actually processed: 599620
average transaction lag: 0.307 ms
tps = 9954.779317 (including connections establishing)
tps = 9964.947522 (excluding connections establishing)

I never thought of implementing the throttle like this before, but it 
seems to work out well so far.  Check out tps.png to see the smoothness 
of the TPS curve (the graphs came out of pgbench-tools).  There's a 
little more play outside of the target than is ideal for this case.  Maybe 
it's worth tightening the Poisson curve a bit around its center?
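
For reference, the usual way a limiter like this spreads transactions out is
to draw each gap from an exponential distribution with mean 1/rate, which is
what produces the Poisson-style arrivals being discussed; a standalone
illustration of the idea (not the patch's own code):

    #include <math.h>
    #include <stdlib.h>

    /* delay, in seconds, to wait before firing the next transaction */
    static double
    next_gap_seconds(double rate_per_sec)
    {
        /* uniform draw in (0, 1), avoiding log(0) */
        double  u = ((double) rand() + 1.0) / ((double) RAND_MAX + 2.0);

        return -log(u) / rate_per_sec;  /* exponential with mean 1/rate */
    }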


The main implementation issue I haven't looked into yet is why things 
can get weird at the end of the run.  See the latency.png graph attached 
and you can see what I mean.


I didn't like the naming on this option or all of the ways you could 
specify the delay.  None of those really added anything, since you can 
get every other behavior by specifying a non-integer TPS.  And using the 
word "throttle" inside the code is fine, but I didn't like exposing that 
implementation detail more than it had to be.


What I did instead was think of this as a transaction rate target, which 
makes the help a whole lot simpler:


  -R SPEC, --rate SPEC
   target rate per client in transactions per second

Made the documentation easier to write too.  I'm not quite done with 
that yet, the docs wording in this updated patch could still be better.


I personally would like this better if --rate specified a *total* rate 
across all clients.  However, there are examples of both types of 
settings in the program already, so there's no one precedent for which 
is right here.  -t is per-client and now -R is too; I'd prefer it to be 
like -T instead.  It's not that important though, and the code is 
cleaner as it's written right now.  Maybe this is better; I'm not sure.


I did some basic error-handling checks on this and they seemed good; the 
program rejects target rates of <= 0.


On the topic of this weird latency spike issue, I did see that show up 
in some of the results too.  Here's one where I tried to specify a rate 
higher than the system can actually handle, 8 TPS total on a 
SELECT-only test


$ pgbench -S -T 30 -c 8 -j 4 -R1tps pgbench
starting vacuum...end.
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 8
number of threads: 4
duration: 30 s
number of transactions actually processed: 761779
average transaction lag: 10298.380 ms
tps = 25392.312544 (including connections establishing)
tps = 25397.294583 (excluding connections establishing)

It was actually limited by the capabilities of the hardware, 25K TPS. 
10298 ms of lag per transaction can't be right though.


Some general patch submission suggestions for you as a new contributor:

-When re-submitting something with improvements, it's a good idea to add 
a version number to the patch so reviewers can tell them apart easily. 
But there is no reason to change the subject line of the e-mail each 
time.  I followed that standard here.  If you updated this again I would 
name the file pgbench-throttle-v9.patch but keep the same e-mail subject.


-There were some extra carriage return characters in your last 
submission.  Wasn't a problem this time, but if you can get rid of those 
that makes for a better patch.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
attachment: tps.png
attachment: latency.png

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 8c202bf..799dfcd 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -137,6 +137,12 @@ int		unlogged_tables = 0;
 double sample_rate = 0.0;
 
 /*
+ * When clients are 

Re: [HACKERS] Proposed patch: remove hard-coded limit MAX_ALLOCATED_DESCS

2013-06-08 Thread Cédric Villemain
  I just wonder about this statement that you removed: 
* Since we don't want to encourage heavy use of those functions,
* it seems OK to put a pretty small maximum limit on the number of
* simultaneously allocated descs.
 
  Is it now encouraged to use those functions, or does it at least seem less 
  'scary' than in the past?
 
 Well, we could put back some weaker form of the statement, but I wasn't
 sure how to word it.

I'm not sure of the 'expected' problems...
The only thing I can think of at the moment is the time spent closing 
file descriptors on abort/commit.
I'm not sure of the expected value of max_safe_fds. Your patch now initializes 
with 5 slots instead of 10; if max_safe_fds is large, maybe it is better to 
double the size each time we need more instead of jumping directly to the 
largest size?
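
Something along these lines is what I have in mind; just a sketch of the growth
policy, not the patch as posted (the names are reused from the patch, and
realloc stands in for whatever allocation the backend actually uses):

    static bool
    reserve_allocated_desc_sketch(void)
    {
        int            newmax;
        AllocateDesc  *newdescs;

        if (numAllocatedDescs < maxAllocatedDescs)
            return true;        /* already room */

        /* double on demand, never exceeding max_safe_fds / 2 */
        newmax = (maxAllocatedDescs > 0) ? maxAllocatedDescs * 2 : 8;
        newmax = Min(newmax, max_safe_fds / 2);
        if (newmax <= maxAllocatedDescs)
            return false;       /* hit the cap */

        newdescs = (AllocateDesc *)
            realloc(allocatedDescs, newmax * sizeof(AllocateDesc));
        if (newdescs == NULL)
            return false;
        allocatedDescs = newdescs;
        maxAllocatedDescs = newmax;
        return true;
    }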

-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: 24x7 Support - Development, Expertise and Training


Re: [HACKERS] Redesigning checkpoint_segments

2013-06-08 Thread Greg Smith

On 6/7/13 2:43 PM, Robert Haas wrote:

name.  What I would like to see is a single number here in memory units that
replaces both checkpoint_segments and wal_keep_segments.


This isn't really making sense to me.  I don't think we should assume
that someone who wants to keep WAL around for replication also wants
to wait longer between checkpoints.  Those are two quite different
things.


It's been years since I saw anyone actually using checkpoint_segments as 
that sort of limit.  I see a lot of sites pushing the segments limit up 
and then using checkpoint_timeout carefully.  It's pretty natural to say 
"I don't want to go more than X minutes between checkpoints."  As for 
wanting to say "I don't want to go more than X MB between checkpoints" 
instead, motivated by not wanting too much activity to queue between 
them, I'm just not seeing demand for that now.


The main reason I do see people paying attention to checkpoint_segments 
still is to try and put a useful bound on WAL disk space usage.  That's 
the use case I think overlaps with wal_keep_segments such that you might 
replace both of them.  I think we really only need one control that 
limits how much WAL space is expected inside of pg_xlog, and it should 
be easy and obvious how to set it.


The more I look at this checkpoint_segments patch, the more I wonder why 
it's worth even bothering with anything but a disk space control here. 
checkpoint_segments is turning into an internal implementation detail 
most sites I see wouldn't miss at all.  Rather than put work into 
autotuning it, I'd be happy to eliminate checkpoint_segments altogether, 
in favor of a WAL disk space limit.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [HACKERS] Cost limited statements RFC

2013-06-08 Thread Jeff Janes
On Thu, Jun 6, 2013 at 2:27 PM, Andres Freund and...@2ndquadrant.com wrote:

 On 2013-06-06 12:34:01 -0700, Jeff Janes wrote:
  On Fri, May 24, 2013 at 11:51 AM, Greg Smith g...@2ndquadrant.com
 wrote:
 
   On 5/24/13 9:21 AM, Robert Haas wrote:
  
But I wonder if we wouldn't be better off coming up with a little more
   user-friendly API.  Instead of exposing a cost delay, a cost limit,
   and various charges, perhaps we should just provide limits measured in
   KB/s, like dirty_rate_limit = amount of data you can dirty per
   second, in kB and read_rate_limit = amount of data you can read into
   shared buffers per second, in kB.
  
  
   I already made and lost the argument for doing vacuum in KB/s units,
 so I
   wasn't planning on putting that in the way of this one.
 
 
  I think the problem is that making that change would force people to
  relearn something that was already long established, and it was far from
  clear that the improvement, though real, was big enough to justify
 forcing
  people to do that.

 I don't find that argument very convincing. Since you basically can
 translate the current variables into something like the above variables
 with some squinting we sure could have come up with some way to keep the
 old definition and automatically set the new GUCs and the other way
 round.



That may be, but it was not what the patch that was submitted did.  And I
don't think the author or the reviewers were eager to put in the effort to
make that change, which would surely be quite a bit more work than the
original patch was in the first place.  Also, I'm not sure that such a
complexity would even be welcomed.  It sounds like an ongoing maintenance
cost, and I'm sure the word "baroque" would get thrown around.

Anyway, I don't think that resistance to making user visible changes to old
features should inhibit us from incorporating lessons from them into new
features.


 guc.c should even have enough information to prohibit setting
 both in the config file...


Is there precedent/infrastructure for things like that?  I could see uses
for mutually exclusive complexes of configuration variables, but I wouldn't
even know where to start in implementing such.

Cheers,

Jeff


Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread Joshua D. Drake


On 06/08/2013 11:27 AM, Andres Freund wrote:


On 2013-06-08 11:15:40 -0700, Joshua D. Drake wrote:

To me, a more pragmatic approach makes sense. Obviously having some kind of
code that checks the space makes sense but I don't know that it needs to be
around any operation other than when we are creating a segment. What do we care
why the segment is being created? If we don't have enough room to create the
segment, the transaction rolls back with some OBVIOUS, not OBTUSE, error.

Obviously this could cause a ton of transactions to roll back but I think
keeping the database consistent and rolling back a transaction in case of
error is exactly what we are supposed to do.


You know, the PANIC isn't there just because we like to piss off
users. There are actual technical reasons that don't just go away by
judging the PANIC as stupid.


Yes, I know we aren't trying to piss off users. What I am saying is that 
it looks stupid to the user that it PANICs. I apologize if that came out wrong.



At the points where the XLogInsert()s happen, we're in critical sections
out of which we *cannot* ERROR out, because we may already have made
modifications that cannot be allowed to be performed
partially/unlogged. That's why we throw a PANIC, which will force a
cluster-wide restart, including *NOT* writing any further buffers from
s_b out.



Does this preclude (sorry, I don't know this part of the code very well) 
my suggestion of checking on log segment creation?


JD



Greetings,

Andres Freund







[HACKERS] small patch to crypt.c

2013-06-08 Thread Joshua D. Drake


Hello,

In my quest to understand how all the logging etc. works with 
authentication, I came across the area of crypt.c that checks for 
valid_until, but it seems like it has an extraneous check.


If I am wrong I apologize for the noise but wouldn't mind an explanation.

index f01d904..8d809b2 100644
--- a/src/backend/libpq/crypt.c
+++ b/src/backend/libpq/crypt.c
@@ -145,9 +145,7 @@ md5_crypt_verify(const Port *port, const char *role, 
char *client_pass)

	/*
	 * Password OK, now check to be sure we are not past rolvaliduntil
	 */
-	if (isnull)
-		retval = STATUS_OK;
-	else if (vuntil < GetCurrentTimestamp())
+	if (vuntil < GetCurrentTimestamp())
		retval = STATUS_ERROR;
	else
		retval = STATUS_OK;




Re: [HACKERS] Optimising Foreign Key checks

2013-06-08 Thread Simon Riggs
On 8 June 2013 15:30, Noah Misch n...@leadboat.com wrote:
 On Tue, Jun 04, 2013 at 02:45:17PM +0100, Simon Riggs wrote:
  On Sun, Jun 02, 2013 at 10:45:21AM +0100, Simon Riggs wrote:
  For clarity the 4 problems are
  1. SQL execution overhead
  2. Memory usage
  3. Memory scrolling
  4. Locking overhead, specifically FPWs and WAL records from FK checks
  probably in that order or thereabouts.

 Let's rethink things to put a few options on the table and see what we get...

 2. Don't store FK events in the after trigger queue at all, but apply
 them as we go. That solves problems2 and 3. That could be considered
 to be in violation of the SQL standard, which requires us to apply the
 checks at the end of the statement. We already violate the standard
 with regard to uniqueness checks, so doing it here doesn't seem
 unacceptable.

 I wouldn't like to see that compliance bug propagate to other constraint
 types.  What clauses in the standard demand end-of-statement timing, anyway?

 What if we followed the example of deferred UNIQUE: attempt FK checks as we go
 and enqueue an after-trigger recheck when such an initial test fails?

The copy I have access to (2008 draft), 4.17.2 "Checking of constraints", says:

The checking of a constraint depends on its constraint mode within
the current SQL-transaction. Whenever an SQL-statement is executed,
every constraint whose mode is immediate is checked, at a certain
point after any changes to SQL-data and schemas resulting from that
execution have been effected, to see if it is satisfied. A constraint
is satisfied if and only if the applicable search condition included
in its descriptor evaluates to True or Unknown. If any constraint is
not satisfied, then any changes to SQL-data or schemas resulting from
executing that statement are canceled. (See the General Rules of
Subclause 13.5, “SQL procedure statement”.)

NOTE 31 — This includes SQL-statements that are executed as a direct
result or an indirect result of executing a different SQL-statement.
It also includes statements whose effects explicitly or implicitly
include setting the constraint mode to immediate. 


I can't see anything there that stops me applying locks as we go, but
I feel like someone will object...

This point seems crucial to the particular approach we take, so I need
wider input.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Cost limited statements RFC

2013-06-08 Thread Jeff Janes
On Thu, Jun 6, 2013 at 1:02 PM, Robert Haas robertmh...@gmail.com wrote:


 If we can see our way clear to ripping out the autovacuum costing
 stuff and replacing them with a read rate limit and a dirty rate
 limit, I'd be in favor of that.  The current system limits the linear
 combination of those with user-specified coefficients, which is more
 powerful but less intuitive.  If we need that, we'll have to keep it
 the way it is, but I'm hoping we don't.



I don't know what two independent settings would look like.  Say you keep
two independent counters, where each can trigger a sleep, and the
triggering of that sleep clears only its own counter.  Now you still have a
limit on the linear combination, it is just that summation has moved to a
different location.  You have two independent streams of sleeps, but they
add up to the same amount of sleeping as a single stream based on a summed
counter.

Or if one sleep clears both counters (the one that triggered it and the
other one), I don't think that that is what I would call independent
either.  Or at least not if it has no memory.  The intuitive meaning of
independent would require that it keep track of which of the two counters
was controlling over the last few seconds.  Am I overthinking this?

Also, in all the anecdotes I've been hearing about autovacuum causing
problems from too much IO, in which people can identify the specific
problem, it has always been the write pressure, not the read, that caused
the problem.  Should the default be to have the read limit be inactive and
rely on the dirty-limit to do the throttling?

Cheers,

Jeff


[HACKERS] ALTER TABLE ... ALTER CONSTRAINT

2013-06-08 Thread Simon Riggs
While fiddling with FK tuning, it was useful to be able to enable and
disable the DEFERRED mode of constraints.

That is not currently possible in SQL, so I wrote this patch. Without
this you have to drop and then re-add a constraint, which is
impractical for large tables.

e.g.
CREATE TABLE fktable (id integer, fk integer REFERENCES pktable (id));

ALTER TABLE fktable
   ALTER CONSTRAINT fktable_fk_fkey DEFERRABLE INITIALLY IMMEDIATE;

Includes docs and tests.

Currently works for FKs only. Potentially other constraints can be
supported in future.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


alter_table_alter_constraint.v1.sql
Description: Binary data



Re: [HACKERS] Cost limited statements RFC

2013-06-08 Thread Greg Smith

On 6/8/13 4:43 PM, Jeff Janes wrote:


Also, in all the anecdotes I've been hearing about autovacuum causing
problems from too much IO, in which people can identify the specific
problem, it has always been the write pressure, not the read, that
caused the problem.  Should the default be to have the read limit be
inactive and rely on the dirty-limit to do the throttling?


That would be bad; I have to carefully constrain both of them on systems 
that are short on I/O throughput.  There are all sorts of cases where 
cleanup of a large and badly cached relation will hit the read limit 
right now.


I suspect the reason we don't see as many complaints is that a lot more 
systems can handle 7.8MB/s of random reads than there are ones that can 
do 3.9MB/s of random writes.  If we removed that read limit, a lot more 
complaints would start rolling in about the read side.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




[HACKERS] Batch API for After Triggers

2013-06-08 Thread Simon Riggs
While fiddling with FK tuning, Noah suggested batching trigger
executions together to avoid execution overhead.

It turns out there is no easy way to write triggers that can take
advantage of the knowledge that they are being executed as a set of
trigger executions. Some API is required to allow a trigger to
understand that there may be other related trigger executions in the
very near future, so it can attempt to amortise call overhead across
many invocations (batching).

The attached patch adds two fields to the TriggerDesc that trigger
functions are handed, allowing them to inspect (if they choose) the
additional fields and thus potentially use some form of batching.

This is backwards compatible with earlier trigger API.

The two fields are:

int   tg_tot_num_events;
int   tg_event_num;

So your trigger can work out it is e.g. number 3 of 56 invocations in
the current set of after triggers.

Going back to Noah's example, this would allow you to collect all 56
values and then execute a single statement with an array of 56 values
in it. Knowing there are 56 means you can wait until the 56th
invocation before executing the batched statement, without risking
skipping some checks because you've only got half a batch left.
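
As a rough sketch of usage (assuming the counters end up reachable from the
TriggerData a C trigger function receives, and with remember_key() and
flush_batch() as imaginary helpers that collect values and run the single
batched statement):

    #include "postgres.h"
    #include "commands/trigger.h"
    #include "fmgr.h"

    PG_FUNCTION_INFO_V1(batched_fk_check);

    Datum
    batched_fk_check(PG_FUNCTION_ARGS)
    {
        TriggerData *trigdata = (TriggerData *) fcinfo->context;

        remember_key(trigdata);     /* stash this row's key somewhere */

        /* 1-based numbering assumed: fire the batched statement on the last call */
        if (trigdata->tg_event_num == trigdata->tg_tot_num_events)
            flush_batch();

        PG_RETURN_NULL();
    }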

If you don't do this, then you'd need to introduce the concept of a
"final function" similar to the way aggregates work. But that seems
much too complex to be useful in the real world.

This seemed a generally useful approach for any after trigger author,
not just for RI.

Comments please.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


batch_api_after_triggers.v1.patch
Description: Binary data



Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread Simon Riggs
On 7 June 2013 10:02, Heikki Linnakangas hlinnakan...@vmware.com wrote:
 On 07.06.2013 00:38, Andres Freund wrote:

 On 2013-06-06 23:28:19 +0200, Christian Ullrich wrote:

 * Heikki Linnakangas wrote:

 The current situation is that if you run out of disk space while writing
 WAL, you get a PANIC, and the server shuts down. That's awful. We can


 So we need to somehow stop new WAL insertions from happening, before
 it's too late.


 A naive idea is to check if there's enough preallocated WAL space, just
 before inserting the WAL record. However, it's too late to check that in


 There is a database engine, Microsoft's Jet Blue aka the Extensible
 Storage Engine, that just keeps some preallocated log files around,
 specifically so it can get consistent and halt cleanly if it runs out of
 disk space.

 In other words, the idea is not to check over and over again that there
 is
 enough already-reserved WAL space, but to make sure there always is by
 having a preallocated segment that is never used outside a disk space
 emergency.


 That's not a bad technique. I wonder how reliable it would be in
 postgres.


 That's no different from just having a bit more WAL space in the first
 place. We need a mechanism to stop backends from writing WAL, before you run
 out of it completely. It doesn't matter if the reservation is done by
 stashing away a WAL segment for emergency use, or by a variable in shared
 memory. Either way, backends need to stop using it up, by blocking or
 throwing an error before they enter the critical section.

 I guess you could use the stashed away segment to ensure that you can
 recover after PANIC. At recovery, there are no other backends that could use
 up the emergency segment. But that's not much better than what we have now.

Christian's idea seems good to me. It looks like you could be dismissing
this too early, especially since no better idea has emerged.

I doubt that we're facing a problem others haven't already faced.
It's pretty clear that most other DBMSs do this with their logs.

For your fast WAL insert patch to work well, we need a simple and fast
technique to detect running out of space.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Cost limited statements RFC

2013-06-08 Thread Jeff Janes
On Sat, Jun 8, 2013 at 1:57 PM, Greg Smith g...@2ndquadrant.com wrote:

 On 6/8/13 4:43 PM, Jeff Janes wrote:

  Also, in all the anecdotes I've been hearing about autovacuum causing
 problems from too much IO, in which people can identify the specific
 problem, it has always been the write pressure, not the read, that
 caused the problem.  Should the default be to have the read limit be
 inactive and rely on the dirty-limit to do the throttling?


 That would be bad; I have to carefully constrain both of them on systems
 that are short on I/O throughput.  There are all sorts of cases where cleanup
 of a large and badly cached relation will hit the read limit right now.


I wouldn't remove the ability, just change the default.  You can still tune
your exquisitely balanced systems :)

Of course if the default were to be changed, who knows what complaints we
would start getting, which we don't get now because the current default
prevents them.

But my gut feeling is that if autovacuum is trying to read faster than the
hardware will support, it will just automatically get throttled, by
inherent IO waits, at a level which can be comfortably supported.  And this
will cause minimal interference with other processes.  It is self-limiting.
 If it tries to write too much, however, the IO system is reduced to a
quivering heap, not just for that process, but for all others as well.




 I suspect the reason we don't see as many complaints is that a lot more
 systems can handle 7.8MB/s of random reads than there are ones that can do
 3.9MB/s of random writes.  If we removed that read limit, a lot more
 complaints would start rolling in about the read side.


Why is there so much random IO?  Do your systems have
autovacuum_vacuum_scale_factor set far below the default?  Unless they do,
most of the IO (both read and write) should be sequential.  Or at least, I
don't understand why they are not sequential.

Cheers,

Jeff


Re: [HACKERS] Cost limited statements RFC

2013-06-08 Thread Kevin Grittner
Greg Smith g...@2ndquadrant.com wrote:

 I suspect the reason we don't see as many complaints is that a
 lot more systems can handle 7.8MB/s of random reads than there
 are ones that can do 3.9MB/s of random writes.  If we removed
 that read limit, a lot more complaints would start rolling in
 about the read side.

I'll believe that all of that is true, but I think there's another
reason.  With multiple layers of write cache (PostgreSQL
shared_buffers, OS cache, controller or SSD cache) I think there's
a tendency for an avalanche effect.  Dirty pages stick to cache
at each level like snow on the side of a mountain, accumulating
over time.  When it finally breaks loose at the top, it causes more
from lower levels to break free as it passes.  The result at the
bottom can be devastating.

Before I leave the metaphor, I will admit that I've sometimes done
the equivalent of setting off an occasional stick of dynamite to
cause things to cascade down before they have built up to a more
dangerous level.

Setting aside the colorful imagery, with a write cache you often
see *very* fast writes for bursts or even sustained writes up to a
certain point, after which you suddenly have serious latency
spikes.  Reads tend to degrade more gracefully, giving you a sense
that you're starting to get into trouble while you still have time
to react to prevent extreme conditions.  At least that has been my
experience.  I think the sudden onset of problems from write
saturation contributes to the complaints.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Batch API for After Triggers

2013-06-08 Thread Kevin Grittner
Simon Riggs si...@2ndquadrant.com wrote:

 Comments please.

How much of this problem space do you think could be addressed by
providing OLD and NEW *relations* to AFTER EACH STATEMENT triggers?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Proposed patch: remove hard-coded limit MAX_ALLOCATED_DESCS

2013-06-08 Thread Tom Lane
Cédric Villemain ced...@2ndquadrant.com writes:
 I'm not sure of the expected value of max_safe_fds. Your patch now initializes
 with 5 slots instead of 10; if max_safe_fds is large, maybe it is better to
 double the size each time we need more instead of jumping directly to the largest
 size?

I don't see the point particularly.  At the default value of
max_files_per_process (1000), the patch can allocate at most 500 array
elements, which on a 64-bit machine would probably be 16 bytes each
or 8KB total.

regards, tom lane




Re: [HACKERS] Optimising Foreign Key checks

2013-06-08 Thread Noah Misch
On Sat, Jun 08, 2013 at 09:39:14PM +0100, Simon Riggs wrote:
 On 8 June 2013 15:30, Noah Misch n...@leadboat.com wrote:
  On Tue, Jun 04, 2013 at 02:45:17PM +0100, Simon Riggs wrote:

  2. Don't store FK events in the after trigger queue at all, but apply
  them as we go. That solves problems2 and 3. That could be considered
  to be in violation of the SQL standard, which requires us to apply the
  checks at the end of the statement. We already violate the standard
  with regard to uniqueness checks, so doing it here doesn't seem
  unacceptable.
 
  I wouldn't like to see that compliance bug propagate to other constraint
  types.  What clauses in the standard demand end-of-statement timing, anyway?
 
  What if we followed the example of deferred UNIQUE: attempt FK checks as we 
  go
  and enqueue an after-trigger recheck when such an initial test fails?
 
 The copy I have access to (2008 draft), 4.17.2 Checking of constraints
 
 The checking of a constraint depends on its constraint mode within
 the current SQL-transaction. Whenever an SQL-statement is executed,
 every constraint whose mode is immediate is checked, at a certain
 point after any changes to SQL-data and schemas resulting from that
 execution have been effected, to see if it is satisfied. A constraint
 is satisfied if and only if the applicable search condition included
 in its descriptor evaluates to True or Unknown. If any constraint is
 not satisfied, then any changes to SQL-data or schemas resulting from
 executing that statement are canceled. (See the General Rules of
 Subclause 13.5, “SQL procedure statement”.)
 
 NOTE 31 — This includes SQL-statements that are executed as a direct
 result or an indirect result of executing a different SQL-statement.
 It also includes statements whose effects explicitly or implicitly
 include setting the constraint mode to immediate. 

This does appear to specify FK timing semantics like PostgreSQL gives today.
Namely, it does not permit a FK-induced error when later actions of the query
that prompted the check could possibly remedy the violation.

 I can't see anything there that stops me applying locks as we go, but

Likewise; I don't see why we couldn't perform an optimistic check ASAP and
schedule a final after-statement check when an early check fails.  That
changes performance characteristics without changing semantics.

 I feel like someone will object...
 
 This point seems crucial to the particular approach we take, so I need
 wider input.

Agreed.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




Re: [HACKERS] Cost limited statements RFC

2013-06-08 Thread Greg Smith

On 6/8/13 5:17 PM, Jeff Janes wrote:


But my gut feeling is that if autovacuum is trying to read faster than
the hardware will support, it will just automatically get throttled, by
inherent IO waits, at a level which can be comfortably supported.  And
this will cause minimal interference with other processes.


If this were true all the time, autovacuum tuning would be a lot easier. 
 You can easily make a whole server unresponsive by letting loose one 
rogue process doing a lot of reads.  Right now this isn't a problem for 
autovacuum because any one process running at 7.8MB/s is usually not a 
big deal.  It doesn't take too much in the way of read-ahead logic and 
throughput to satisfy that.  But I've seen people try to push the read 
rate upwards, and they didn't get very far beyond that before it was way too 
obtrusive.


I could collect some data from troubled servers to see how high I can 
push the read rate before they suffer.  Maybe there's a case there for 
increasing the default read rate because the write one is a good enough 
secondary limiter.  I'd be surprised if we could get away with more than 
a 2 or 3X increase though, and the idea of going unlimited is really 
scary.  It took me a year of before/after data collection before I was 
confident that it's OK to run unrestricted in all cache hit situations.



Why is there so much random IO?  Do your systems have
autovacuum_vacuum_scale_factor set far below the default?  Unless they
do, most of the IO (both read and write) should be sequential.


Insert one process doing sequential reads into a stream of other 
activity and you can easily get random I/O against the disks out of the 
mix.  You don't necessarily need the other activity to be random to get 
that.  N sequential readers eventually act like N random readers for 
high enough values of N.  On busy servers, autovacuum is normally 
competing against multiple random I/O processes though.


Also, the database's theoretical model that block number correlates 
directly with location on disk can break down.  I haven't put a hard 
number to measuring it directly, but systems with vacuum problems seem 
more likely to have noticeable filesystem level fragmentation.  I've 
been thinking about collecting data from a few systems with filefrag to 
see if I'm right about that.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [HACKERS] Cost limited statements RFC

2013-06-08 Thread Greg Smith

On 6/8/13 5:20 PM, Kevin Grittner wrote:

I'll believe that all of that is true, but I think there's another
reason.  With multiple layers of write cache (PostgreSQL
shared_buffers, OS cache, controller or SSD cache) I think there's
a tendency for an avalanche effect.  Dirty pages stick to cache
at each level like snow on the side of a mountain, accumulating
over time.  When it finally breaks loose at the top, it causes more
from lower levels to break free as it passes.


I explained this once as being like a tower of leaky buckets where each 
one drips into the one below.  Buckets drain out of the bottom at one 
rate, and new water comes in at another.  You can add water much faster 
than it drains, for a while.  But once one of the buckets fills you've 
got a serious mess.



I think the sudden onset of problems from write
saturation contributes to the complaints.


It's also important to realize that vacuum itself doesn't even do the 
writes in many cases.  If you have a large shared_buffers value, it 
wanders off making things dirty without any concern for what's going to 
disk.  When the next checkpoint shows up is when pressure increases at 
the top.


The way this discussion has wandered off has nicely confirmed I was 
right to try and avoid going into this area again :(


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread Jeff Janes
On Sat, Jun 8, 2013 at 11:27 AM, Andres Freund and...@2ndquadrant.com wrote:


 You know, the PANIC isn't there just because we like to piss of
 users. There's actual technical reasons that don't just go away by
 judging the PANIC as stupid.
 At the points where the XLogInsert()s happens we're in critical sections
 out of which we *cannot* ERROR out because we already may have made
 modifications that cannot be allowed to be performed
 partially/unlogged. That's why we're throwing a PANIC which will force a
 cluster wide restart including *NOT* writing any further buffers from
 s_b out.


If archiving is on and failure is due to no space, could we just keep
trying XLogFileInit again for a couple of minutes to give archiving a chance
to do its thing?  Doing that while holding onto locks and a critical
section would be unfortunate, but if the alternative is a PANIC, it might
be acceptable.

The problem is that even if the file is only being kept so it can be
archived, once archiving succeeds I think the file is not removed
immediately, but only at the next checkpoint, which will never
happen while the locks are still held.
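
As a sketch of that retry idea (try_allocate_wal_segment() is a hypothetical
stand-in for the failing XLogFileInit() call, and plain sleep() stands in for
pg_usleep()):

#include <stdbool.h>
#include <unistd.h>

/* Hypothetical stand-in for the XLogFileInit() call that fails on ENOSPC. */
static bool
try_allocate_wal_segment(void)
{
    return false;               /* pretend the disk is still full */
}

/*
 * Retry for a bounded time so the archiver gets a chance to free space;
 * only if the timeout expires do we fall back to the existing PANIC.
 */
static bool
allocate_wal_segment_with_retry(int timeout_seconds)
{
    int         elapsed;

    for (elapsed = 0; elapsed < timeout_seconds; elapsed++)
    {
        if (try_allocate_wal_segment())
            return true;        /* space appeared, carry on normally */
        sleep(1);
    }

    return try_allocate_wal_segment();      /* one last try before giving up */
}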

Cheers,

Jeff


Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread MauMau

From: Joshua D. Drake j...@commandprompt.com

On 06/08/2013 07:36 AM, MauMau wrote:

3. You cannot know the reason of archive_command failure (e.g. archive
area full) if you don't use PostgreSQL's server logging.
This is because archive_command failure is not logged in syslog/eventlog.


Wait, what? Is this true (someone else?)


I'm sorry, please let me correct myself.  What I meant is:

This is because the exact reason for archive_command failure (e.g. cp's 
output) is not logged in syslog/eventlog.  The fact that archive_command 
failed is itself recorded.


Regards
MauMau





Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread MauMau

From: Joshua D. Drake j...@commandprompt.com

On 06/08/2013 11:27 AM, Andres Freund wrote:

You know, the PANIC isn't there just because we like to piss of
users. There's actual technical reasons that don't just go away by
judging the PANIC as stupid.


Yes, I know we aren't trying to piss off users. What I am saying is that, to 
the user, it looks stupid that it PANICs. I apologize if that came out wrong.


I've experienced PANIC shutdowns several times due to WAL disk full during 
development.  As a user, one problem with PANIC is that it dumps core.  I 
think core files should be dumped only when a "can't happen" event has 
occurred, such as a PostgreSQL bug.  I didn't expect postgres to dump core simply 
because the disk is full.  I want postgres to shut down with a FATAL message in 
that exact case.


Regards
MauMau





[HACKERS] Re: [COMMITTERS] pgsql: Don't downcase non-ascii identifier chars in multi-byte encoding

2013-06-08 Thread Robert Haas
On Sat, Jun 8, 2013 at 10:25 AM, Andrew Dunstan and...@dunslane.net wrote:
 Don't downcase non-ascii identifier chars in multi-byte encodings.

 Long-standing code has called tolower() on identifier character bytes
 with the high bit set. This is clearly an error and produces junk output
 when the encoding is multi-byte. This patch therefore restricts this
 activity to cases where there is a character with the high bit set AND
 the encoding is single-byte.

 There have been numerous gripes about this, most recently from Martin
 Schäfer.

 Backpatch to all live releases.

I'm all for changing this, but back-patching seems like a terrible
idea.  This could easily break queries that are working now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Optimising Foreign Key checks

2013-06-08 Thread Robert Haas
On Sat, Jun 8, 2013 at 5:41 PM, Noah Misch n...@leadboat.com wrote:
 This does appear to specify FK timing semantics like PostgreSQL gives today.
 Namely, it does not permit a FK-induced error when later actions of the query
 that prompted the check could possibly remedy the violation.

Yeah.  Standard or no standard, I think we'd have unhappy users if we
broke that.

 I can't see anything there that stops me applying locks as we go, but

Not sure I follow that bit but...

 Likewise; I don't see why we couldn't perform an optimistic check ASAP and
 schedule a final after-statement check when an early check fails.  That
 changes performance characteristics without changing semantics.

...this seems like it might have some promise; but what if the action
we're performing isn't idempotent?  And how do we know?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread MauMau

From: Josh Berkus j...@agliodbs.com

There's actually three potential failure cases here:

- One Volume: WAL is on the same volume as PGDATA, and that volume is
completely out of space.

- XLog Partition: WAL is on its own partition/volume, and fills it up.

- Archiving: archiving is failing or too slow, causing the disk to fill
up with waiting log segments.


I think there is one more case.  Is this correct?

- Failure of a disk containing the data directory or a tablespace
If a checkpoint can't write buffers to disk because of the disk failure, 
the checkpoint cannot complete, so WAL files accumulate in pg_xlog/.

This means that a single disk failure will lead to a postgres shutdown.



xLog Partition
--

As Heikki pointed out, a full dedicated WAL drive is hard to fix once
it gets full, since there's nothing you can safely delete to clear
space, even enough for a checkpoint record.


This sounds very scary.  Is it possible to complete recovery and start up 
the postmaster with either or both of the following modifications?


[Idea 1]
During recovery, force archiving of a WAL file and delete/recycle it in 
pg_xlog/ as soon as its contents have been applied.


[Idea 2]
During recovery, when a disk-full condition is encountered at the end-of-recovery 
checkpoint, force archiving of all unarchived WAL files and delete/recycle them 
at that time.



Regards
MauMau






Re: [HACKERS] Cost limited statements RFC

2013-06-08 Thread Robert Haas
On Sat, Jun 8, 2013 at 4:43 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 I don't know what two independent setting would look like.  Say you keep two
 independent counters, where each can trigger a sleep, and the triggering of
 that sleep clears only its own counter.  Now you still have a limit on the
 linear combination, it is just that summation has moved to a different
 location.  You have two independent streams of sleeps, but they add up to
 the same amount of sleeping as a single stream based on a summed counter.

 Or if one sleep clears both counters (the one that triggered it and the
 other one), I don't think that that is what I would call independent either.
 Or at least not if it has no memory.  The intuitive meaning of independent
 would require that it keep track of which of the two counters was
 controlling over the last few seconds.  Am I overthinking this?

Yep.  Suppose the user has a read limit of 64 MB/s and a dirty limit
of 4MB/s.  That means that, each second, we can read 8192 buffers and
dirty 512 buffers.  If we sleep for 20 ms (1/50th of a second), that
covers 163 buffer reads and 10 buffer writes, so we just reduce the
accumulated counters by those amounts (minimum zero).
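
To make the bookkeeping concrete, a minimal sketch of that scheme (the struct
and function names are invented; the backend would fold this into its existing
cost-delay machinery):

/* Two independent budgets, each credited back in proportion to the sleep. */
typedef struct IoBudget
{
    double      read_limit;     /* buffers readable per second: 8192 = 64 MB/s at 8 kB */
    double      dirty_limit;    /* buffers dirtiable per second: 512 = 4 MB/s at 8 kB */
    double      reads_owed;     /* reads accumulated since the last credit */
    double      dirties_owed;   /* dirties accumulated since the last credit */
} IoBudget;

/*
 * After sleeping for sleep_ms, subtract what that sleep "covers": a 20 ms
 * sleep covers 163 buffer reads and 10 buffer dirties at the limits above.
 */
static void
credit_sleep(IoBudget *budget, int sleep_ms)
{
    double      seconds = sleep_ms / 1000.0;

    budget->reads_owed -= budget->read_limit * seconds;
    budget->dirties_owed -= budget->dirty_limit * seconds;

    if (budget->reads_owed < 0)
        budget->reads_owed = 0;         /* minimum zero */
    if (budget->dirties_owed < 0)
        budget->dirties_owed = 0;
}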

 Also, in all the anecdotes I've been hearing about autovacuum causing
 problems from too much IO, in which people can identify the specific
 problem, it has always been the write pressure, not the read, that caused
 the problem.  Should the default be to have the read limit be inactive and
 rely on the dirty-limit to do the throttling?

The main time I think you're going to hit the read limit is during
anti-wraparound vacuums.  That problem may be gone in 9.4, if Heikki
writes that patch we were discussing just recently.  But at the
moment, we'll do periodic rescans of relations that are already
all-frozen, and that's potentially expensive.

So I'm not particularly skeptical about the need to throttle reads.  I
suspect many people don't need it, but there are probably some who do,
at least for anti-wraparound cases - especially on EC2, where the
limit on I/O is often the GigE card.  What I *am* skeptical about is
the notion that people need the precise value of the write limit to
depend on how many of the pages read are being found in shared_buffers
versus not.  That's essentially what the present system is
accomplishing - at a great cost in user-visible complexity.

Basically, I think that anti-wraparound vacuums may need either read
throttling or write throttling depending on whether the data is
already frozen; and regular vacuums probably only need
write-throttling.  But I have neither any firsthand experience nor any
empirical reason to presume that the write limit needs to be lower
when the read-rate is high.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Optimising Foreign Key checks

2013-06-08 Thread Noah Misch
On Sat, Jun 08, 2013 at 08:20:42PM -0400, Robert Haas wrote:
 On Sat, Jun 8, 2013 at 5:41 PM, Noah Misch n...@leadboat.com wrote:
  Likewise; I don't see why we couldn't perform an optimistic check ASAP and
  schedule a final after-statement check when an early check fails.  That
  changes performance characteristics without changing semantics.
 
 ...this seems like it might have some promise; but what if the action
 we're performing isn't idempotent?  And how do we know?

The action discussed so far is RI_FKey_check_ins().  It acquires a KEY SHARE
lock (idempotent by nature) on a row that it finds using B-tree equality
(presumed IMMUTABLE, thus idempotent).  RI_FKey_check_upd() is nearly the same
action, so the same argument holds.  Before treating any other operation in
the same way, one would need to conduct similar analysis.
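
For context, the lookup involved is roughly the following (a simplified,
hypothetical illustration: the real code in ri_triggers.c builds and caches a
plan per constraint, and the pktable/pkcol names here are placeholders):

#include "postgres.h"
#include "executor/spi.h"

/* Does the referenced key exist?  Locks the matching row with KEY SHARE. */
static bool
referenced_key_exists(Oid keytype, Datum keyval)
{
    Oid         argtypes[1];
    Datum       values[1];
    bool        found;

    argtypes[0] = keytype;
    values[0] = keyval;

    if (SPI_connect() != SPI_OK_CONNECT)
        elog(ERROR, "SPI_connect failed");

    if (SPI_execute_with_args(
            "SELECT 1 FROM ONLY pktable x WHERE pkcol = $1 FOR KEY SHARE OF x",
            1, argtypes, values, NULL,
            false,              /* not read-only: the row lock has side effects */
            1) != SPI_OK_SELECT)
        elog(ERROR, "referenced-key lookup failed");

    found = (SPI_processed > 0);

    SPI_finish();

    return found;
}

Repeating such a check later in the statement yields the same answer and the
same lock, which is what makes the optimistic-check-plus-recheck idea safe for
this particular trigger.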

Thanks,
nm

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




[HACKERS] Re: [COMMITTERS] pgsql: Don't downcase non-ascii identifier chars in multi-byte encoding

2013-06-08 Thread Noah Misch
On Sat, Jun 08, 2013 at 08:09:15PM -0400, Robert Haas wrote:
 On Sat, Jun 8, 2013 at 10:25 AM, Andrew Dunstan and...@dunslane.net wrote:
  Don't downcase non-ascii identifier chars in multi-byte encodings.
 
  Long-standing code has called tolower() on identifier character bytes
  with the high bit set. This is clearly an error and produces junk output
  when the encoding is multi-byte. This patch therefore restricts this
  activity to cases where there is a character with the high bit set AND
  the encoding is single-byte.
 
  There have been numerous gripes about this, most recently from Martin
  Schäfer.
 
  Backpatch to all live releases.
 
 I'm all for changing this, but back-patching seems like a terrible
 idea.  This could easily break queries that are working now.

If more than one encoding covers the characters used in a given application,
that application's semantics should be the same regardless of which of those
encodings is in use.  We certainly don't _guarantee_ that today; PostgreSQL
leaves much to libc, which may not implement the relevant locales compatibly.
However, this change bakes into PostgreSQL itself a departure from that
principle.  If a database contains tables ä and Ä, which of those SELECT
* FROM Ä finds will be encoding-dependent.  If we're going to improve the
current (granted, worse) downcase_truncate_identifier() behavior, we should
not adopt another specification bearing such surprises.

Let's return to the drawing board on this one.  I would be inclined to keep
the current bad behavior until we implement the i18n-aware case folding
required by SQL.  If I'm alone in thinking that, perhaps switch to downcasing
only ASCII characters regardless of the encoding.  That at least gives
consistent application behavior.

I apologize for not noticing this week's thread in time to comment on it.

Thanks,
nm

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




Re: [HACKERS] Proposed patch: remove hard-coded limit MAX_ALLOCATED_DESCS

2013-06-08 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Recently we had a gripe about how you can't read very many files
 concurrently with contrib/file_fdw:
 http://www.postgresql.org/message-id/of419b5767.8a3c9adb-on85257b79.005491e9-85257b79.0054f...@isn.rtss.qc.ca

Ouch, that's pretty bad.

 Barring objections or better ideas, I'd like to back-patch this as far
 as 9.1 where file_fdw was introduced.

+1.  This would bump the limit to something like 300 instead of 10,
right?  That seems pretty reasonable.

Thanks,

Stephen




Re: [HACKERS] small patch to crypt.c

2013-06-08 Thread Stephen Frost
JD,

* Joshua D. Drake (j...@commandprompt.com) wrote:
 In my quest to understand how all the logging etc works with
 authentication I came across the area of crypt.c that checks for
 valid_until but it seems like it has an extraneous check.
 
 If I am wrong I apologize for the noise but wouldn't mind an explanation.

Alright, there probably aren't too many people out there running with
their clock set to pre-2000, but wouldn't this end up giving the wrong
result in those cases, as GetCurrentTimestamp() would end up returning a
negative value, which would make it less than vuntil's default of zero?

Perhaps we could change what vuntil is set to by default, but I think
it's probably better to keep things as-is; we should really be checking
for null cases explicitly in general.

Thanks,

Stephen




[HACKERS] Re: [COMMITTERS] pgsql: Don't downcase non-ascii identifier chars in multi-byte encoding

2013-06-08 Thread Andrew Dunstan


On 06/08/2013 10:52 PM, Noah Misch wrote:

On Sat, Jun 08, 2013 at 08:09:15PM -0400, Robert Haas wrote:

On Sat, Jun 8, 2013 at 10:25 AM, Andrew Dunstan and...@dunslane.net wrote:

Don't downcase non-ascii identifier chars in multi-byte encodings.

Long-standing code has called tolower() on identifier character bytes
with the high bit set. This is clearly an error and produces junk output
when the encoding is multi-byte. This patch therefore restricts this
activity to cases where there is a character with the high bit set AND
the encoding is single-byte.

There have been numerous gripes about this, most recently from Martin
Schäfer.

Backpatch to all live releases.

I'm all for changing this, but back-patching seems like a terrible
idea.  This could easily break queries that are working now.

If more than one encoding covers the characters used in a given application,
that application's semantics should be the same regardless of which of those
encodings is in use.  We certainly don't _guarantee_ that today; PostgreSQL
leaves much to libc, which may not implement the relevant locales compatibly.
However, this change bakes into PostgreSQL itself a departure from that
principle.  If a database contains tables ä and Ä, which of those SELECT
* FROM Ä finds will be encoding-dependent.  If we're going to improve the
current (granted, worse) downcase_truncate_identifier() behavior, we should
not adopt another specification bearing such surprises.

Let's return to the drawing board on this one.  I would be inclined to keep
the current bad behavior until we implement the i18n-aware case folding
required by SQL.  If I'm alone in thinking that, perhaps switch to downcasing
only ASCII characters regardless of the encoding.  That at least gives
consistent application behavior.

I apologize for not noticing this week's thread in time to comment on it.



The behaviour which this fixes is an unambiguous bug. Calling tolower() 
on the individual bytes of a multi-byte character can't possibly produce 
any sort of correct result. A database that contains such corrupted 
names, probably not valid in any encoding at all, is almost certainly 
not restorable, and I'm not sure if it's dumpable either. It's already 
produced several complaints in recent months, so ISTM that returning to 
it for any period of time is unthinkable.


cheers

andrew








Re: [HACKERS] Batch API for After Triggers

2013-06-08 Thread Stephen Frost
* Simon Riggs (si...@2ndquadrant.com) wrote:
 While fiddling with FK tuning, Noah suggested batching trigger
 executions together to avoid execution overhead.

I like the general idea, but I'd prefer a way to avoid having to queue
up tons of trigger events to be executed in the first place.
Aggregates do this by providing a way to store up information to be
processed by an eventual 'final' function.  Another option, as Kevin
asked about, would be statement level triggers which are able to see
both the OLD and the NEW versions of the relation.

The per-row trigger option with a way to be called immediately and then
store what it cares about for a later final function strikes me as very
generalized and able to do things that the statement-level option
couldn't, but I'm not sure if there's a use-case that could solve which
the OLD/NEW statement trigger capability couldn't.

Thanks,

Stephen




[HACKERS] Re: [COMMITTERS] pgsql: Don't downcase non-ascii identifier chars in multi-byte encoding

2013-06-08 Thread Noah Misch
On Sat, Jun 08, 2013 at 11:50:53PM -0400, Andrew Dunstan wrote:
 On 06/08/2013 10:52 PM, Noah Misch wrote:

 Let's return to the drawing board on this one.  I would be inclined to keep
 the current bad behavior until we implement the i18n-aware case folding
 required by SQL.  If I'm alone in thinking that, perhaps switch to downcasing
 only ASCII characters regardless of the encoding.  That at least gives
 consistent application behavior.

 I apologize for not noticing to comment on this week's thread.


 The behaviour which this fixes is an unambiguous bug. Calling tolower()  
 on the individual bytes of a multi-byte character can't possibly produce  
 any sort of correct result. A database that contains such corrupted  
 names, probably not valid in any encoding at all, is almost certainly  
 not restorable, and I'm not sure if it's dumpable either.

I agree with each of those points.  However, since any change here breaks
compatibility, we should fix it right the first time.  A second compatibility
break would be all the more onerous once this intermediate step helps more
users to start using unquoted, non-ASCII object names.

 It's already  
 produced several complaints in recent months, so ISTM that returning to  
 it for any period of time is unthinkable.

PostgreSQL has lived with this wrong behavior since ... the beginning?  It's a
problem, certainly, but a bandage fix brings its own trouble.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




Re: [HACKERS] Redesigning checkpoint_segments

2013-06-08 Thread Craig Ringer
On 06/07/2013 01:00 AM, Josh Berkus wrote:
 Daniel,

 So your suggestion is that if archiving is falling behind, we should
 introduce delays on COMMIT in order to slow down the rate of WAL writing?
Delaying commit wouldn't be enough; consider a huge COPY, which can
produce a lot of WAL at a high rate without a convenient point to delay at.

I expect a delay after writing an xlog record would make a more suitable
write-rate throttle, though I'd want to be sure the extra branch didn't
hurt performance significantly. Branch prediction hints would help;
since we don't *care* if the delay branch is slow and causes pipeline
stalls, tagging the no-delay branch as likely would probably deal with
that concern for supported compilers/platforms.
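
A minimal sketch of such a check, assuming GCC/Clang's __builtin_expect as the
branch hint (the variable and function names are invented; in the backend the
sleep would be pg_usleep()):

#include <unistd.h>

#define wal_unlikely(x)  __builtin_expect((x) != 0, 0)

static volatile int wal_throttle_active = 0;    /* set when archiving falls behind */

static void
wal_throttle_delay(void)
{
    usleep(10000);              /* 10 ms; pg_usleep() in the backend */
}

/*
 * Called once per xlog record written: the hot path is a single
 * predicted-not-taken branch, so the cost when no throttling is needed
 * stays negligible.
 */
static inline void
maybe_throttle_wal_write(void)
{
    if (wal_unlikely(wal_throttle_active))
        wal_throttle_delay();
}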

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services





Re: [HACKERS] Redesigning checkpoint_segments

2013-06-08 Thread Craig Ringer
On 06/06/2013 03:21 PM, Joshua D. Drake wrote:

 Not to be unkind but the problems of the uninformed certainly are not
 the problems of the informed. Or perhaps they are certainly the
 problems of the informed :P.
I'm not convinced that's a particularly good argument not to improve
something. Sure, it might be a usability issue not a purely technical
issue, but that IMO doesn't make it much less worth fixing.

Bad usability puts people off early, before they can become productive
and helpful community members. It also puts others off trying the
software at all by reputation alone.

In any case, I don't think this is an issue of the informed vs
uninformed. It's also a matter of operational sanity at scale. The
sysadmin can't watch 100,000 individual servers and jump in to make
minute tweaks - nor should they have to when some auto-tuning could
obviate the need.

The same issue exists with vacuum - it's hard for basic users to
understand, so they misconfigure it and often achieve the opposite
results to what they need. It's been getting better, but some
feedback-based control would make a world of difference when running Pg.

In this I really have to agree with Heikki and Daniel - more usable and
preferably feedback-tuned defaults would be really, really nice to have,
though I'd want good visibility (logging, SHOW commands, etc) into what
they were doing and options to override for special cases.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services





Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread Craig Ringer
On 06/06/2013 10:00 PM, Heikki Linnakangas wrote:

 I've seen a case, where it was even worse than a PANIC and shutdown.
 pg_xlog was on a separate partition that had nothing else on it. The
 partition filled up, and the system shut down with a PANIC. Because
 there was no space left, it could not even write the checkpoint after
 recovery, and thus refused to start up again. There was nothing else
 on the partition that you could delete to make space. The only
 recourse would've been to add more disk space to the partition
 (impossible), or manually delete an old WAL file that was not needed
 to recover from the latest checkpoint (scary). Fortunately this was a
 test system, so we just deleted everything.

There were a couple of dba.stackexchange.com reports along the same
lines recently, too. Both involved an antivirus vendor's VM appliance
with a canned (stupid) configuration that set wal_keep_segments too high
for the disk space allocated and stored WAL on a separate partition.

People are having issues with WAL space management in the real world and
I think it and autovacuum are the two hardest things for most people to
configure and understand in Pg right now.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services





Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)

2013-06-08 Thread Craig Ringer
On 06/08/2013 10:57 AM, Daniel Farina wrote:

 At which point most sensible users say no thanks, I'll use something else.
 [snip]

 I have a clear bias in experience here, but I can't relate to someone
 who sets up archives but is totally okay losing a segment unceremoniously,
 because it only takes one of those once in a while to make a really,
 really bad day. 

It sounds like between you both you've come up with a pretty solid
argument by exclusion for throttling WAL writing as space grows
critical. Dropping archive segments seems pretty unsafe from the solid
arguments Daniel presents, and Josh makes a good case for why shutting
the DB down when archiving can't keep up isn't any better.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services





Re: [HACKERS] small patch to crypt.c

2013-06-08 Thread Joshua D. Drake


On 06/08/2013 08:47 PM, Stephen Frost wrote:

JD,

* Joshua D. Drake (j...@commandprompt.com) wrote:

In my quest to understand how all the logging etc works with
authentication I came across the area of crypt.c that checks for
valid_until but it seems like it has an extraneous check.

If I am wrong I apologize for the noise but wouldn't mind an explanation.


Alright, there probably aren't too many people out there running with
their clock set to pre-2000, but wouldn't this end up giving the wrong
result in those cases, as GetCurrentTimestamp() would end up returning a
negative value, which would make it less than vuntil's default of zero?

Perhaps we could change what vuntil is set to by default, but I think
it's probably better to keep things as-is; we should really be checking
for null cases explicitly in general.


Well, I was referring more to the fact that the default is:

check if null, if true return ok
check if valuntil < today, if true return error
else return ok

To me we don't need the null check. However, when I tested it, without 
the null check you can't log in. So now I am curious about what is going on.
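
For reference, a plain-C paraphrase of the check being discussed (not verbatim
crypt.c; in the backend the types would be TimestampTz/GetCurrentTimestamp()
and the results STATUS_OK/STATUS_ERROR):

typedef long long pg_time_us;   /* stand-in for TimestampTz: microseconds since 2000-01-01 */

#define CHECK_OK        0
#define CHECK_ERROR     (-1)

static int
check_valid_until(int isnull, pg_time_us vuntil, pg_time_us now)
{
    if (isnull)
        return CHECK_OK;        /* no rolvaliduntil set: the password never expires */
    if (vuntil < now)
        return CHECK_ERROR;     /* valid-until has passed: reject */
    return CHECK_OK;
}

If vuntil simply defaults to zero when the column is null, then dropping the
first branch turns the test into "0 < now", which is true on any post-2000
clock; that would match the "can't log in without the null check" behaviour
described above.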


JD



Thanks,

Stephen




