Re: [HACKERS] Long paths for tablespace leads to uninterruptible hang in Windows

2014-02-14 Thread Amit Kapila
On Fri, Feb 14, 2014 at 8:27 AM, Bruce Momjian  wrote:
> On Tue, Jan  7, 2014 at 12:33:33PM +0530, Amit Kapila wrote:
>> On Thu, Oct 31, 2013 at 8:58 AM, Amit Kapila  wrote:
>> > On Wed, Oct 16, 2013 at 1:44 AM, Amit Kapila  
>> > wrote:
>> >> On Tue, Oct 15, 2013 at 6:28 PM, Magnus Hagander  
>> >> wrote:
>> >>> I agree we'll probably want to work around it in the end, but I still
>> >>> think it should be put to Microsoft PSS if we can. The usual - have we
>> >>> actually produced a self-contained example that does just this (and
>> >>> doesn't include the full postgres support) and submitted it to
>> >>> *microsoft* for comments?
>>
>> Further update on this issue:
>>
>> Microsoft has suggested a workaround for the stat API: use
>> 'GetFileAttributesEx' instead of stat. When I tried their
>> suggestion, it gave me the same problem as stat.
>>
>> They have still not said anything about the other APIs
>> (rmdir, RemoveDirectory), which have the same problem.
>
> Where are we on this?

So far we have not received from Microsoft any workaround that can fix
this problem. From the discussion on the support ticket with them,
it seems the problem is in their kernel, and changing the code for
it might not be straightforward for them; nor do they have any clear
alternative.

> Is there a check we should add in our code?

We can possibly solve this problem in one of the following ways:

1. Resolve the symbolic link to the actual path in code whenever we
try to access it.

2. Check in code (initdb and create tablespace) and disallow paths
longer than ~120 characters on Windows.

Approach-1 has the benefit that it can support the actual MAX_PATH,
and even if MS doesn't resolve the problem, PostgreSQL will not face it.

Approach-2 is straightforward to implement. If we go with Approach-2,
we could then raise the path-length limit for Windows in the future
whenever there is a fix for it.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Small psql memory fix

2014-02-14 Thread Bruce Momjian
On Fri, Feb 14, 2014 at 11:52:42PM -0500, Tom Lane wrote:
> Bruce Momjian  writes:
> > On Fri, Feb 14, 2014 at 11:28:49PM -0500, Tom Lane wrote:
> >> The first and third hunks look to me like they introduce memory
> >> leaks, not fix them.  The second hunk is probably safe enough,
> 
> > The first and third just move the free into blocks where we have already
> > tested the value is not null.  It just more clearly matches the
> > surrounding code.  Do you see something different?
> 
> [ looks closer... ]  OK, they're not actually broken, but I question
> whether they're improvements.  The existing coding is clear enough
> that the result of psql_scan_slash_option will be freed.  What you're
> doing is conflating that requirement with some other processing.
> 
> >> although I'm not sure the case can actually occur --- gset should
> >> free the prefix before any new backslash command can be executed.
> 
> > Oh, interesting idea.  Updated patch attached.
> 
> I don't think that's an improvement at all.  My point was that I
> didn't think gset_prefix would ever be set at entry to this code,
> because the setting should never survive for more than one backslash
> command execution cycle.
> 
> If you want to add probably-useless code to free it, go ahead, but
> this isn't a clearer way to do that than your previous version.

If you think this code isn't helping, I will just discard it.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Small psql memory fix

2014-02-14 Thread Tom Lane
Bruce Momjian  writes:
> On Fri, Feb 14, 2014 at 11:28:49PM -0500, Tom Lane wrote:
>> The first and third hunks look to me like they introduce memory
>> leaks, not fix them.  The second hunk is probably safe enough,

> The first and third just move the free into blocks where we have already
> tested the value is not null.  It just more clearly matches the
> surrounding code.  Do you see something different?

[ looks closer... ]  OK, they're not actually broken, but I question
whether they're improvements.  The existing coding is clear enough
that the result of psql_scan_slash_option will be freed.  What you're
doing is conflating that requirement with some other processing.

>> although I'm not sure the case can actually occur --- gset should
>> free the prefix before any new backslash command can be executed.

> Oh, interesting idea.  Updated patch attached.

I don't think that's an improvement at all.  My point was that I
didn't think gset_prefix would ever be set at entry to this code,
because the setting should never survive for more than one backslash
command execution cycle.

If you want to add probably-useless code to free it, go ahead, but
this isn't a clearer way to do that than your previous version.

regards, tom lane




Re: [HACKERS] Small psql memory fix

2014-02-14 Thread Bruce Momjian
On Fri, Feb 14, 2014 at 11:28:49PM -0500, Tom Lane wrote:
> Bruce Momjian  writes:
> > The attached tiny patch fixes a small leak in psql's \gset command and
> > simplifies memory freeing in two places.
> 
> The first and third hunks look to me like they introduce memory
> leaks, not fix them.  The second hunk is probably safe enough,

The first and third just move the free into blocks where we have already
tested the value is not null.  It just more clearly matches the
surrounding code.  Do you see something different?

> although I'm not sure the case can actually occur --- gset should
> free the prefix before any new backslash command can be executed.

Oh, interesting idea.  Updated patch attached.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
new file mode 100644
index 764534a..8225159
*** a/src/bin/psql/command.c
--- b/src/bin/psql/command.c
*** exec_command(const char *cmd,
*** 746,762 
  		{
  			expand_tilde(&fname);
  			pset.gfname = pg_strdup(fname);
  		}
- 		free(fname);
  		status = PSQL_CMD_SEND;
  	}
  
  	/* \gset [prefix] -- send query and store result into variables */
  	else if (strcmp(cmd, "gset") == 0)
  	{
! 		char	   *prefix = psql_scan_slash_option(scan_state,
! 	OT_NORMAL, NULL, false);
  
  		if (prefix)
  			pset.gset_prefix = prefix;
  		else
--- 746,766 
  		{
  			expand_tilde(&fname);
  			pset.gfname = pg_strdup(fname);
+ 			free(fname);
  		}
  		status = PSQL_CMD_SEND;
  	}
  
  	/* \gset [prefix] -- send query and store result into variables */
  	else if (strcmp(cmd, "gset") == 0)
  	{
! 		char	   *prefix;
! 
! 		if (pset.gset_prefix)
! 			free(pset.gset_prefix);
  
+ 		prefix = psql_scan_slash_option(scan_state,
+ 		OT_NORMAL, NULL, false);
  		if (prefix)
  			pset.gset_prefix = prefix;
  		else
*** exec_command(const char *cmd,
*** 1152,1159 
  success = false;
  			}
  			free(newval);
  		}
- 		free(opt0);
  	}
  
  
--- 1156,1163 
  success = false;
  			}
  			free(newval);
+ 			free(opt0);
  		}
  	}
  
  



Re: [HACKERS] Small psql memory fix

2014-02-14 Thread Tom Lane
Bruce Momjian  writes:
> The attached tiny patch fixes a small leak in psql's \gset command and
> simplifies memory freeing in two places.

The first and third hunks look to me like they introduce memory
leaks, not fix them.  The second hunk is probably safe enough,
although I'm not sure the case can actually occur --- gset should
free the prefix before any new backslash command can be executed.

regards, tom lane




Re: [HACKERS] Patch: show xid and xmin in pg_stat_activity and pg_stat_replication

2014-02-14 Thread Robert Haas
On Wed, Feb 12, 2014 at 8:00 AM, Christian Kruse
 wrote:
> On Wednesday 12 February 2014 11:14:56 Andres Freund wrote:
>> But they do take up shared memory without really needing to. I
>> personally don't find that too bad, it's not much memory. If we want to
>> avoid it we could have a LocalPgBackendStatus that includes the normal
>> PgBackendStatus. Since pgstat_read_current_status() copies all the data
>> locally, that'd be a sensible point to fill it. While that will cause a
>> bit of churn, I'd guess we can use the infrastructure in the not too far
>> away future for other parts.
>
> That's a good idea. Attached you will find a patch implementing it
> that way; is this how you pictured it?
>
> Although I'm not sure if this shouldn't be done in two patches, one
> for the changes needed for LocalPgBackendStatus and one for the
> xid/xmin changes.

Well, this version of the patch reveals a mighty interesting point: a
lot of the people who are calling pgstat_fetch_stat_beentry() don't
need this additional information and might prefer not to pay the cost
of fetching it.  None of pg_stat_get_backend_pid,
pg_stat_get_backend_dbid, pg_stat_get_backend_userid,
pg_stat_get_backend_activity, pg_stat_get_backend_waiting,
pg_stat_get_backend_activity_start, pg_stat_get_backend_xact_start,
pg_stat_get_backend_start, pg_stat_get_backend_client_addr,
pg_stat_get_backend_client_port, and pg_stat_get_db_numbackends
actually need this new information; it's only ever used in one place.
So it seems like it might be wise to have pgstat_fetch_stat_beentry
continue to return the PgBackendStatus * and add a new function
pgstat_fetch_stat_local_beentry to fetch the LocalPgBackendStatus *;
then most of these call sites wouldn't need to change.

It would still be the case that pgstat_read_current_status() pays the
price of fetching this information even if pg_stat_get_activity is
never called.  But since that's probably by far the most commonly-used
API for this information, that's probably OK.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Small psql memory fix

2014-02-14 Thread Bruce Momjian
The attached tiny patch fixes a small leak in psql's \gset command and
simplifies memory freeing in two places.
 
-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
new file mode 100644
index 764534a..e80528d
*** a/src/bin/psql/command.c
--- b/src/bin/psql/command.c
*** exec_command(const char *cmd,
*** 746,753 
  		{
  			expand_tilde(&fname);
  			pset.gfname = pg_strdup(fname);
  		}
- 		free(fname);
  		status = PSQL_CMD_SEND;
  	}
  
--- 746,753 
  		{
  			expand_tilde(&fname);
  			pset.gfname = pg_strdup(fname);
+ 			free(fname);
  		}
  		status = PSQL_CMD_SEND;
  	}
  
*** exec_command(const char *cmd,
*** 757,762 
--- 757,764 
  		char	   *prefix = psql_scan_slash_option(scan_state,
  	OT_NORMAL, NULL, false);
  
+ 		if (pset.gset_prefix)
+ 			free(pset.gset_prefix);
  		if (prefix)
  			pset.gset_prefix = prefix;
  		else
*** exec_command(const char *cmd,
*** 1152,1159 
  success = false;
  			}
  			free(newval);
  		}
- 		free(opt0);
  	}
  
  
--- 1154,1161 
  success = false;
  			}
  			free(newval);
+ 			free(opt0);
  		}
  	}
  
  



Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-14 Thread Tom Lane
Andres Freund  writes:
> On 2014-02-14 20:46:01 +, Greg Stark wrote:
>> Going over this I think this is still a potential issue:
>> On 31 Jan 2014 15:56, "Andres Freund"  wrote:
>>> I am not sure that explains the issue, but I think the redo action for
>>> truncation is not safe across crashes.  A XLOG_SMGR_TRUNCATE will just
>>> do a smgrtruncate() (and then mdtruncate) which will iterate over the
>>> segments starting at 0 till mdnblocks()/segment_size and *truncate* but
>>> not delete individual segment files that are not needed anymore, right?
>>> If we crash in the midst of that a new mdtruncate() will be issued, but
>>> it will get a shorter value back from mdnblocks().

>> I'm not too familiar with md.c but my reading of the code is that we
>> truncate the files in reverse order?

> That's what I had assumed as well, but it doesn't look that way:

No, it's deleting forward.

We could probably fix things so it deleted backwards; it'd be a tad
tedious because the list structure isn't organized that way, but we
could do it.  Not sure if that's good enough though.  If you don't
want to assume the filesystem metadata is coherent after a crash,
we might have nonzero-size segments after zero-size ones, even if
the truncate calls had been issued in the right order.

Another possibility is to keep opening and truncating files until
we don't find the next segment in sequence, looking directly at the
filesystem not at the mdfd chain.  I don't think this would be
appropriate in normal operation, but we could do it if InRecovery
(and maybe even only if we don't think the database is consistent?)

regards, tom lane




Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-14 Thread Florian Pflug
On Feb14, 2014, at 19:21 , Andres Freund  wrote:
> On 2014-02-14 18:49:33 +0100, Florian Pflug wrote:
>> Well, the assumption isn't all that new. We already have the situation that
>> a PGPROC may be not on any wait queue, yet its lwWaitLink may be non-NULL.
>> Currently, the process who took it off the queue is responsible for 
>> rectifying
>> that eventually, with the changed proposed below it'll be the backend who
>> owns the PGPROC. From the point of view of other backends, nothing really
>> changes.
> 
> That window is absolutely tiny tho.

True, but it's there, so if anything counts on that never being the case, it's
still already broken.

>>>> b) resetting lwWaitLink introduces a race condition (however unlikely)
>>>>
>>>> we'll trade correctness for cleanliness if we continue to reset lwWaitLink
>>>> without protecting against the race. That's a bit of a weird trade-off to
>>>> make.
>>> 
>>> It's not just cleanliness, it's being able to actually debug crashes.
>> 
>> We could still arrange for the stray lwWaitLink from being visible only
>> momentarily. If a backend is woken up and detects that lwWaiting is false,
>> it knows that it cannot be on any wait queue - it was just removed, and
>> hasn't added itself again yet. At that point, it's safe to reset lwWaitLink
>> back to NULL. The stray value would thus only be visible while a backend 
>> executes
>> the PGSemaphoreLock() loop, and whether or not this is the case can be 
>> deduced
>> from a stack trace. So I'm not convinced that this makes debugging any 
>> harder.
> 
> That's not actually safe on an out of order architecture afaics. Without
> barriers the store to lwWaitLink in the woken up backend can preempt the
> read for the next element in the PGSemaphoreUnlock() loop.

Hm, true. I guess we could prevent that by introducing an artificial dependency
between the read and the write - something like

  proc = head;
  head = proc->lwWaitLink;
  /*
   * We don't ever expect to actually PANIC here, but the test forces the
   * load to execute before the store to lwWaiting. This prevents a race
   * between reading lwWaitLink here and resetting it back to zero in the
   * awoken backend. (Note that backends can wake up spuriously, so just
   * reading it before doing PGSemaphoreUnlock is insufficient.)
   */
  if (head != MyProc)
      proc->lwWaiting = false;
  else
      elog(PANIC, ...);
  PGSemaphoreUnlock(&proc->sem);

(This assumes that proc is a volatile pointer)

Another idea would be to do as you suggest and only mark the PGPROC pointers
volatile, but to additionally add a check for queue corruption somewhere. We 
should
be able to detect that - if we ever hit this issue, LWLockRelease should find a
PGPROC while traversing the queue whose lwWaitLink is NULL but which isn't 
equal to
lock->tail. If that ever happens, we'd simply PANIC.

best regards,
Florian Pflug







[HACKERS] Re: [BUGS] BUG #9210: PostgreSQL string store bug? not enforce check with correct characterSET/encoding

2014-02-14 Thread digoal
Hi,
   Thanks very much.
We now use dblink or a foreign table to migrate the data instead of
pg_dump, which resolves the problem of loading the bad data.

--
Public welfare is a lifelong commitment. I'm Digoal, Just Do It.




At 2014-02-14 04:49:08,"Tom Lane"  wrote:
>dig...@126.com writes:
>> select t, t::bytea from convert_from('\xeec1', 'sql_ascii') as g(t);
>> [ fails to check that string is valid in database encoding ]
>
>Hm, yeah.  Normal input to the database goes through pg_any_to_server(),
>which will apply a validation step if the source encoding is SQL_ASCII
>and the destination encoding is something else.  However, pg_convert and
>some other places call pg_do_encoding_conversion() directly, and that
>function will just quietly do nothing if either encoding is SQL_ASCII.
>
>The minimum-refactoring solution to this would be to tweak
>pg_do_encoding_conversion() so that if the src_encoding is SQL_ASCII but
>the dest_encoding isn't, it does pg_verify_mbstr() rather than nothing.
>
>I'm not sure if this would break anything we need to have work,
>though.  Thoughts?  Do we want to back-patch such a change?
>
>   regards, tom lane


Re: [HACKERS] narwhal and PGDLLIMPORT

2014-02-14 Thread Tom Lane
Hiroshi Inoue  writes:
> (2014/02/15 2:32), Tom Lane wrote:
>> (BTW, narwhal is evidently not trying to build plpython.  I wonder
>> why not?)

> Due to an *initializer element is not constant* error, which can also be
> seen on my old machine.

Hmm, isn't that one of the symptoms of inadequacies in
--enable-auto-import?  Wonder if the disable change fixed it.

regards, tom lane




Re: [HACKERS] should we add a XLogRecPtr/LSN SQL type?

2014-02-14 Thread Michael Paquier
On Thu, Feb 6, 2014 at 11:26 AM, Michael Paquier
 wrote:
> Here are updated patches to use pg_lsn instead of pglsn...
Should I register this patch somewhere to avoid having it lost in the void?
Regards,
-- 
Michael




Re: [HACKERS] Ctrl+C from sh can shut down daemonized PostgreSQL cluster

2014-02-14 Thread Greg Stark
On 14 Feb 2014 23:07, "Tom Lane"  wrote:
>
> If this is, as it sounds to be, a Solaris shell bug, doesn't it
> affect other daemons too?

This is something I never followed exactly, but I think if the shell doesn't
support job control it's expected behaviour, not a bug. Only shells that
support job control create new process groups for every backgrounded
command.

I would have expected that if I run postgres myself it would be attached to
the terminal and die when I Ctrl-C it, but if it's started by pg_ctl I would
have thought it ran independently of my terminal and shell.


Re: [HACKERS] narwhal and PGDLLIMPORT

2014-02-14 Thread Hiroshi Inoue
(2014/02/15 2:32), Tom Lane wrote:
> I wrote:
>> Hiroshi Inoue  writes:
>>> One thing I'm wondering about is that plperl is linking perlxx.lib
>>> not libperlxx.a. I made a patch following plpython and it also
>>> works here.
>>> Is it worth trying?
> 
>> I hadn't noticed that part of plpython's Makefile before.  Man,
>> that's an ugly technique :-(.  Still, there's little about this
>> platform that isn't ugly.  Let's try it and see what happens.
> 
> And what happens is this:
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=narwhal&dt=2014-02-14%2017%3A00%3A02
> namely, it gets through plperl now and then chokes with the same
> symptoms on pltcl.  So I guess we need the same hack in pltcl.
> The fun never stops ...

There seems to be some risk in linking an MSVC import library.
Linking perlxx.lib or tcl.lib works here, but linking pythonxx.lib
doesn't work even on a newer machine.

> (BTW, narwhal is evidently not trying to build plpython.  I wonder
> why not?)

Due to an *initializer element is not constant* error, which can also be
seen on my old machine.
http://www.postgresql.org/message-id/CA+OCxozr=0wkQDF7kfd2n-bJQOwdSUs0Myohg29pA_U5=2p...@mail.gmail.com


regards,
Hiroshi Inoue






Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-14 Thread Andres Freund
On 2014-02-14 20:46:01 +, Greg Stark wrote:
> Going over this I think this is still a potential issue:
> 
> On 31 Jan 2014 15:56, "Andres Freund"  wrote:
> 
> >
> > I am not sure that explains the issue, but I think the redo action for
> > truncation is not safe across crashes.  A XLOG_SMGR_TRUNCATE will just
> > do a smgrtruncate() (and then mdtruncate) which will iterate over the
> > segments starting at 0 till mdnblocks()/segment_size and *truncate* but
> > not delete individual segment files that are not needed anymore, right?
> > If we crash in the midst of that a new mdtruncate() will be issued, but
> > it will get a shorter value back from mdnblocks().
> >
> > Am I missing something?
> >
> 
> I'm not too familiar with md.c but my reading of the code is that we
> truncate the files in reverse order?

That's what I had assumed as well, but it doesn't look that way:

v = mdopen(reln, forknum, EXTENSION_FAIL);

priorblocks = 0; /* <- initial value */
while (v != NULL)
{
MdfdVec*ov = v;

if (priorblocks > nblocks)
{
/* truncate entire segment */
}
else if (priorblocks + ((BlockNumber) RELSEG_SIZE) > nblocks)
{
/* truncate this segment to nblocks - priorblocks blocks */
}
else
{
/* nothing to do, all needed */
}
priorblocks += RELSEG_SIZE;
}

So, according to that we start at segment 0, right?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] New hook after raw parsing, before analyze

2014-02-14 Thread Greg Stark
On Fri, Feb 14, 2014 at 9:16 PM, David Beck  wrote:
> What inspired me is the scriptable query rewrite in 
> http://dev.mysql.com/downloads/mysql-proxy/
> The hook I proposed would be a lot nicer in Postgres because the raw parsing 
> is already done at this point while in mysql-proxy that has to be done 
> manually.

There are a few similar things in the Postgres world such as pgbouncer
and plproxy but as you note they are limited in how powerful they can
be by the complexity of parsing and analyzing SQL.

I think Postgres tends to take a different tack regarding
extensibility than MySQL. Rather than having one system plus hooks that
allow external code to modify or replace it, in Postgres modules
are usually treated similarly to the internal code. This is a pretty
broad generalization and there are certainly exceptions. But for
example new data types, functions, even whole new index types are all
treated almost identically to the internal data types, functions, and
index types. The planner then considers them all more or less on equal
basis.

Obviously there are limits. Btree indexes are kind of special because
they represent Postgres's basic concept of ordering. And we don't have
a pluggable recovery system which makes any externally provided index
types non-crash-safe. And we don't have a generalized concept of table
storage -- the closest thing we have is FDWs which is more like
MySQL's style of extensibility where the extension is a special case.

But this is why our instinct is that if you want to be able to push
down joins the way to do it is to extend the FDW api so that the
planner can make those decisions just like it makes the decisions
about internal joins rather than have an external module that takes
over part of the planner's work.

-- 
greg




Re: [HACKERS] New hook after raw parsing, before analyze

2014-02-14 Thread Tom Lane
Greg Stark  writes:
> On Fri, Feb 14, 2014 at 9:16 PM, David Beck  wrote:
>> Another point I liked in mysql is the possibility to write info schema 
>> plugins: 
>> http://dev.mysql.com/doc/refman/5.1/en/writing-information-schema-plugins.html
>> Like a virtual catalog. Is there anything similar in Postgres?

> The documentation you linked to describes how to provide
> information_schema plugins but not why you would want to do such a
> thing. I'm not seeing why this would be useful. The information_schema
> schema is described by the standard so creating new views in it isn't
> needed very often and the schema for the existing views doesn't change
> very often.

IIRC, mysql has a liberal view about what they can do to the
information_schema, so for them this isn't as insane as it sounds to us.

regards, tom lane




Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange

2014-02-14 Thread Haribabu Kommi
On Feb 15, 2014 9:19 AM, "Alvaro Herrera"  wrote:
>
> Haribabu Kommi wrote:
>
> > I changed the balance cost calculations a little bit to give priority to
> > the user provided per table autovacuum parameters.
> > If any user specified per table vacuum parameters exists and those are
> > different with guc vacuum parameters then the
> > balance cost calculations will not include that worker in calculation.
> > Only the cost is distributed between other workers
> > with specified guc vacuum cost parameter.
> >
> > The problem in this calculation is if the user provides same guc values
> > to the per table values also then it doesn't consider them in calculation.
>
> I think this is a strange approach to the problem, because if you
> configure the backends just so, they are completely ignored instead of
> being adjusted.  And this would have action-at-a-distance consequences
> because if you change the defaults in postgresql.conf you end up with
> completely different behavior on the tables for which you have carefully
> tuned the delay so that they are ignored in rebalance calculations.
>
> I think that rather than ignoring some backends completely, we should be
> looking at how to "weight" the balancing calculations among all the
> backends in some smart way that doesn't mean they end up with the
> default values of limit, which AFAIU is what happens now -- which is
> stupid.  Not real sure how to do that, perhaps base it on the
> globally-configured ratio of delay/limit vs. the table-specific ratio.
>
> What I mean is that perhaps the current approach is all wrong and we
> need to find a better algorithm to suit this case and more generally.
> Of course, I don't mean to say that it should behave completely
> differently than now in normal cases, only that it shouldn't give
> completely stupid results in corner cases such as this one.
>
> As an example, suppose that global limit=200 and global delay=20 (the
> defaults).  Then we have a global ratio of 5.  If all three tables being
> vacuumed currently are using the default values, then they all have
> ratio=5 and therefore all should have the same limit and delay settings
> applied after rebalance.  Now, if two tables have ratio=5 and one table
> has been configured to have a very fast vacuum, that is limit=10000,
> then the ratio for that table is 10000/20=500.  Therefore that table should
> be configured, after rebalance, to have a limit and delay that are 100
> times faster than the settings for the other two tables.  (And there is
> a further constraint that the total delay per "limit unit" should be
> so-and-so to accomodate getting the correct total delay per limit unit.)
>
> I haven't thought about how to code that, but I don't think it should be
> too difficult.  Want to give it a try?  I think it makes sense to modify
> both the running delay and the running limit to achieve whatever ratio
> we come up with, except that delay should probably not go below 10ms
> because, apparently, some platforms have that much sleep granularity
> and it wouldn't really work to have a smaller delay.
>
> Am I making sense?

Yes, that makes sense, and not leaving the delay parameter as-is is also a
good approach. Thanks, I will give it a try.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] New hook after raw parsing, before analyze

2014-02-14 Thread Greg Stark
On Fri, Feb 14, 2014 at 9:16 PM, David Beck  wrote:
> Another point I liked in mysql is the possibility to write info schema 
> plugins: 
> http://dev.mysql.com/doc/refman/5.1/en/writing-information-schema-plugins.html
> Like a virtual catalog. Is there anything similar in Postgres?

The documentation you linked to describes how to provide
information_schema plugins but not why you would want to do such a
thing. I'm not seeing why this would be useful. The information_schema
schema is described by the standard so creating new views in it isn't
needed very often and the schema for the existing views doesn't change
very often. I can see why a plugin might want to add rows to the views
but that doesn't seem to be what this feature is about.


-- 
greg




Re: [HACKERS] Ctrl+C from sh can shut down daemonized PostgreSQL cluster

2014-02-14 Thread Tom Lane
Kevin Grittner  writes:
> What is surprising is that the postmaster doesn't set up its own
> process group when it is running as a daemon.  We probably don't
> want to change that when postgres is run directly from a command
> line for development or diagnostic purposes, but Noah suggested
> perhaps we should add a --daemonize option which pg_ctl should use
> when launching the postmaster, which would cause it to create its
> own session group.

We intentionally removed the daemonization support that used to
be there; see commit f7ea6beaf4ca02b8e6dc576255e35a5b86035cb9.
One of the things it did was exactly this.  I'm a bit disinclined
to put that back.

If this is, as it sounds to be, a Solaris shell bug, doesn't it
affect other daemons too?

regards, tom lane




[HACKERS] Ctrl+C from sh can shut down daemonized PostgreSQL cluster

2014-02-14 Thread Kevin Grittner
We have had a case where a production cluster was accidentally shut
down by a customer who used Ctrl+C in the same sh session in which
they had (long before) run pg_ctl start.  We have only seen this in
sh on Solaris.  Other shells on Solaris don't behave this way, nor
does sh on tested versions of Linux.  Nevertheless, the problem is
seen on the default shell for a supported OS.

Analysis suggests that this is because the postmaster retains the
process group ID of the original parent (in this case pg_ctl).  If
pg_ctl is run through the setpgrp command a subsequent Ctrl+C in
the sh session does not shut down the PostgreSQL cluster.

On my development Linux machine:

$ ps axfo pid,ppid,pgid,command
  PID  PPID  PGID COMMAND
[ ... ]
 8416 1  8412 /home/kgrittn/pg/master/Debug/bin/postgres -D Debug/data
 8418  8416  8418  \_ postgres: checkpointer process
 8419  8416  8419  \_ postgres: writer process
 8420  8416  8420  \_ postgres: wal writer process
 8421  8416  8421  \_ postgres: autovacuum launcher process
 8422  8416  8422  \_ postgres: stats collector process
 8427  8416  8427  \_ postgres: kgrittn test [local] idle

All of the PPID values seem correct, and while the PGID values for
backends might initially seem surprising, the commit notes and C
comments here explain why each backend sets up its own process
group:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=3ad0728

What is surprising is that the postmaster doesn't set up its own
process group when it is running as a daemon.  We probably don't
want to change that when postgres is run directly from a command
line for development or diagnostic purposes, but Noah suggested
perhaps we should add a --daemonize option which pg_ctl should use
when launching the postmaster, which would cause it to create its
own session group.
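The effect such a --daemonize option would have can be sketched outside of PostgreSQL. The following is an illustrative Python snippet (not PostgreSQL code, and the function name is invented): a forked child that calls setsid() becomes the leader of a new session and process group, so terminal-generated signals like SIGINT (Ctrl+C) delivered to the shell's foreground process group no longer reach it.

```python
# Illustrative sketch only: a child that calls os.setsid() detaches into its
# own session and process group, which is why Ctrl+C in the launching shell
# would no longer be delivered to it.
import os

def spawn_detached():
    """Fork a child that starts its own session; return (child_pid, child_pgid)."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                      # child
        os.close(r)
        os.setsid()                   # new session => new process group
        os.write(w, str(os.getpgrp()).encode())
        os.close(w)
        os._exit(0)
    os.close(w)
    child_pgid = int(os.read(r, 64))  # read the child's process group id
    os.close(r)
    os.waitpid(pid, 0)
    return pid, child_pgid

if __name__ == "__main__":
    pid, child_pgid = spawn_detached()
    # A session leader's pgid equals its pid, and differs from the parent's.
    print(child_pgid == pid, child_pgid != os.getpgrp())
```

This mirrors what pg_ctl could arrange for the postmaster without changing the behavior of postgres run directly from a command line.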

Although this is arguably a bug, it seems like it is very rarely
hit and has several workarounds, and any fix would either change
things in a way which might break existing user scripts or would
require a new command-line option; so back-patching a fix to stable
branches doesn't seem appropriate.  I would argue for including a
fix in 9.4 on the basis of it being a bug fix and there being time
to mention it in the release change notes; but I understand the
counter-arguments and realize this is a judgment call.

Thoughts?

If the new option seems reasonable, I can draft a patch to
implement that.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange

2014-02-14 Thread Alvaro Herrera
Haribabu Kommi escribió:

> I changed the balance cost calculations a little bit to give priority to
> the user-provided per-table autovacuum parameters.
> If any user-specified per-table vacuum parameters exist and they differ
> from the guc vacuum parameters, then the balance cost calculations will
> not include that worker.  The cost is only distributed among the other
> workers with guc-specified vacuum cost parameters.
> 
> The problem with this calculation is that if the user provides the same
> values for the per-table parameters as the guc values, they are not
> considered in the calculation.

I think this is a strange approach to the problem, because if you
configure the backends just so, they are completely ignored instead of
being adjusted.  And this would have action-at-a-distance consequences
because if you change the defaults in postgresql.conf you end up with
completely different behavior on the tables for which you have carefully
tuned the delay so that they are ignored in rebalance calculations.

I think that rather than ignoring some backends completely, we should be
looking at how to "weight" the balancing calculations among all the
backends in some smart way that doesn't mean they end up with the
default values of limit, which AFAIU is what happens now -- which is
stupid.  Not real sure how to do that, perhaps base it on the
globally-configured ratio of delay/limit vs. the table-specific ratio.

What I mean is that perhaps the current approach is all wrong and we
need to find a better algorithm to suit this case and more generally.
Of course, I don't mean to say that it should behave completely
differently than now in normal cases, only that it shouldn't give
completely stupid results in corner cases such as this one.

As an example, suppose that global limit=200 and global delay=20 (the
defaults).  Then we have a global ratio of 200/20=10.  If all three tables
being vacuumed currently are using the default values, then they all have
ratio=10 and therefore all should have the same limit and delay settings
applied after rebalance.  Now, if two tables have ratio=10 and one table
has been configured to have a very fast vacuum, that is limit=10000,
then the ratio for that table is 10000/20=500.  Therefore that table should
be configured, after rebalance, to have a limit and delay that are 50
times faster than the settings for the other two tables.  (And there is
a further constraint that the total delay per "limit unit" should be
so-and-so to accommodate getting the correct total delay per limit unit.)

I haven't thought about how to code that, but I don't think it should be
too difficult.  Want to give it a try?  I think it makes sense to modify
both the running delay and the running limit to achieve whatever ratio
we come up with, except that delay should probably not go below 10ms
because, apparently, some platforms have that much of sleep granularity
and it wouldn't really work to have a smaller delay.
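The weighting arithmetic described above can be sketched as follows. This is an illustrative Python sketch with invented names, not PostgreSQL's actual rebalance code: each worker's share of the total cost limit is weighted by its configured limit/delay ratio instead of everyone collapsing to the default limit.

```python
# Hypothetical sketch of ratio-weighted rebalancing (names are invented).
MIN_DELAY_MS = 10   # some platforms cannot sleep for less than ~10ms

def rebalance(workers, total_limit):
    """workers: list of dicts with 'limit' and 'delay' (ms).
    Returns per-worker cost limits summing to total_limit, weighted by ratio."""
    ratios = [w["limit"] / w["delay"] for w in workers]
    total_ratio = sum(ratios)
    return [total_limit * r / total_ratio for r in ratios]

# Two tables at the defaults (ratio 200/20 = 10) and one tuned for a very
# fast vacuum (ratio 10000/20 = 500): the fast table gets a 50x larger share.
workers = [{"limit": 200, "delay": 20},
           {"limit": 200, "delay": 20},
           {"limit": 10000, "delay": 20}]
shares = rebalance(workers, total_limit=200)
print([round(s, 2) for s in shares])
```

In a real implementation one would then derive both the running limit and running delay from each share, clamping delay at MIN_DELAY_MS as noted above.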

Am I making sense?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [SQL] Comparison semantics of CHAR data type

2014-02-14 Thread Bruce Momjian
On Thu, Feb 13, 2014 at 09:47:01PM -0500, Bruce Momjian wrote:
> On Wed, Oct 16, 2013 at 02:17:11PM -0400, Bruce Momjian wrote:
> > > > You can see the UTF8 case is fine because \n is considered greater
> > > > than space, but in the C locale, where \n is less than space, the
> > > > false return value shows the problem with
> > > > internal_bpchar_pattern_compare() trimming the string and first
> > > > comparing on lengths.  This is exactly the problem you outline, where
> > > > space trimming assumes everything is less than a space.
> > > 
> > > For collations other than C some of those issues that have to do with
> > > string comparisons might simply be hidden, depending on how strcoll()
> > > handles inputs of different lengths: If strcoll() applies implicit
> > > space padding to the shorter value, there won't be any visible
> > > difference in ordering between bpchar and varchar values.  If strcoll()
> > > does not apply such space padding, the right-trimming of bpchar values
> > > causes very similar issues even in a en_US collation.
> 
> I have added the attached C comment to explain the problem, and added a
> TODO item to fix it if we ever break binary upgrading.
> 
> Does anyone think this warrants a doc mention?

I have done some more thinking on this and I found a way to document
this, which reduces our need to actually fix it some day.  I am afraid
the behavioral change needed to fix this might break so many
applications that the fix will never be done, though I will keep the
TODO item until I get more feedback on that.  Patch attached.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
new file mode 100644
index 30fd9bb..9635004
*** a/doc/src/sgml/datatype.sgml
--- b/doc/src/sgml/datatype.sgml
*** SELECT '52093.89'::money::numeric::float
*** 1072,1081 
 
  Values of type character are physically padded
  with spaces to the specified width n, and are
! stored and displayed that way.  However, the padding spaces are
! treated as semantically insignificant.  Trailing spaces are
! disregarded when comparing two values of type character,
! and they will be removed when converting a character value
  to one of the other string types.  Note that trailing spaces
  are semantically significant in
  character varying and text values, and
--- 1072,1084 
 
  Values of type character are physically padded
  with spaces to the specified width n, and are
! stored and displayed that way.  However, trailing spaces are treated as
! semantically insignificant and disregarded when comparing two values
! of type character.  In collations where whitespace
! is significant, this behavior can produce unexpected results,
! e.g. SELECT 'a '::CHAR(2) collate "C" < E'a\n'::CHAR(2)
! returns true.
! Trailing spaces are removed when converting a character value
  to one of the other string types.  Note that trailing spaces
  are semantically significant in
  character varying and text values, and
diff --git a/src/backend/utils/adt/varchar.c b/src/backend/utils/adt/varchar.c
new file mode 100644
index 284b5d1..502ca44
*** a/src/backend/utils/adt/varchar.c
--- b/src/backend/utils/adt/varchar.c
*** bpcharcmp(PG_FUNCTION_ARGS)
*** 846,863 
  len2;
  	int			cmp;
  
- 	/*
- 	 * Trimming trailing spaces off of both strings can cause a string
- 	 * with a character less than a space to compare greater than a
- 	 * space-extended string, e.g. this returns false:
- 	 *		SELECT E'ab\n'::CHAR(10) < E'ab '::CHAR(10);
- 	 * even though '\n' is less than the space if CHAR(10) was
- 	 * space-extended.  The correct solution would be to trim only
- 	 * the longer string to be the same length of the shorter, if
- 	 * possible, then do the comparison.  However, changing this
- 	 * might break existing indexes, breaking binary upgrades.
- 	 * For details, see http://www.postgresql.org/message-id/CAK+WP1xdmyswEehMuetNztM4H199Z1w9KWRHVMKzyyFM+hV=z...@mail.gmail.com
- 	 */
  	len1 = bcTruelen(arg1);
  	len2 = bcTruelen(arg2);
  
--- 846,851 
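The comment removed above describes the suggested fix: rather than trimming trailing spaces from both values, compare them at a common, space-extended length (equivalently, trim only the longer down to the shorter where possible). A Python sketch of the two behaviors under C-locale byte comparison — illustrative only, not the actual C implementation:

```python
# Illustrative sketch: why trimming trailing spaces from *both* CHAR values
# before comparing gives a surprising answer in the C locale, and how
# comparing at a common, space-extended length fixes it.

def cmp_bytes(a: bytes, b: bytes) -> int:
    return (a > b) - (a < b)

def bpchar_cmp_current(a: bytes, b: bytes) -> int:
    """Current behavior: strip trailing spaces from both, then compare."""
    return cmp_bytes(a.rstrip(b" "), b.rstrip(b" "))

def bpchar_cmp_padded(a: bytes, b: bytes) -> int:
    """Suggested fix: compare at a common length, space-extending the shorter."""
    n = max(len(a.rstrip(b" ")), len(b.rstrip(b" ")))
    return cmp_bytes(a.ljust(n, b" ")[:n], b.ljust(n, b" ")[:n])

# E'ab\n'::CHAR(10) vs 'ab '::CHAR(10) in the C locale: '\n' (0x0a) < ' ' (0x20)
a = b"ab\n".ljust(10, b" ")
b = b"ab ".ljust(10, b" ")
print(bpchar_cmp_current(a, b))   # 1: trim-both makes 'ab\n' compare greater
print(bpchar_cmp_padded(a, b))    # -1: space-extended comparison orders it lower
```

As the removed comment notes, switching to the padded comparison would change existing index ordering, which is why it is only a TODO.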



Re: [HACKERS] New hook after raw parsing, before analyze

2014-02-14 Thread David Beck
I think I’m gonna need to dig into the planner to fully understand your points. 
Thank you for the insights. I was more focused on putting the knowledge of the 
legacy system into an extension and my codebase. Now I see that making better 
use of the planner would help. Thank you.

What inspired me is the scriptable query rewrite in 
http://dev.mysql.com/downloads/mysql-proxy/ 
The hook I proposed would be a lot nicer in Postgres because the raw parsing is 
already done at this point while in mysql-proxy that has to be done manually.

Another point I liked in mysql is the possibility to write info schema plugins: 
http://dev.mysql.com/doc/refman/5.1/en/writing-information-schema-plugins.html
Like a virtual catalog. Is there anything similar in Postgres?

Thank you, David


On 14 Feb 2014, at 18:06, Greg Stark  wrote:

> On Fri, Feb 14, 2014 at 2:28 PM, David Beck  wrote:
>> Why is that a bad idea of rewriting the query before it reaches 
>> transform/analyze (without ever accessing the catalog)?
>> 
>> If that flexibility is acceptable to you, where would be the best place to 
>> put it in?
> 
> Well if there are two foreign tables and the planner could push the
> join work down to the fdw then the planner should be able to
> accurately represent that plan and cost it without having the user
> have to create any catalog structures. That's what the planner does
> for every other type of plan node.
> 
> What you're describing would still be useful for materialized views.
> In that case the user is creating the materialized view and it is a
> real thing in the catalogs that won't disappear on the planner. Even
> then it would be ideal if the planner could decide to use the
> materialized view late enough that it can actually determine if it's
> superior rather than rewriting the query before it gets to that point.
> That would be much more flexible for users too who might not write the
> query in a way that exactly matches the materialized view.
> 
> -- 
> greg





Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-14 Thread Greg Stark
Going over this I think this is still a potential issue:

On 31 Jan 2014 15:56, "Andres Freund"  wrote:

>
> I am not sure that explains the issue, but I think the redo action for
> truncation is not safe across crashes.  A XLOG_SMGR_TRUNCATE will just
> do a smgrtruncate() (and then mdtruncate) which will iterate over the
> segments starting at 0 till mdnblocks()/segment_size and *truncate* but
> not delete individual segment files that are not needed anymore, right?
> If we crash in the midst of that a new mdtruncate() will be issued, but
> it will get a shorter value back from mdnblocks().
>
> Am I missing something?
>

I'm not too familiar with md.c, but my reading of the code is that we
truncate the files in reverse order? In which case I think the code is safe
*iff* the filesystem guarantees ordered metadata writes, which I think ext3
does (I think in all the journal modes). On most filesystems metadata writes
are synchronous, so the truncates are safe for them too.

But we don't generally rely on metadata writes being ordered. I think the
"correct" thing to do is to record the nblocks prior to the truncate and
then have md.c expose a new function that takes that parameter and pokes
around looking for any segments it might need to clean up. But that would
involve lots of abstraction violations in md.c. I think using nblocks would
keep the violations within md.c but that still seems like a pain.
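The "record the nblocks prior to the truncate" idea can be sketched in isolation. This is a hypothetical Python sketch with invented names, not md.c's actual interfaces: given the relation size in blocks before the truncate and the target size, compute which whole segment files become stale and remove them highest-numbered first, so a crash mid-cleanup never leaves a gap in the segment sequence.

```python
# Hypothetical sketch of post-truncate segment cleanup (names invented).
SEG_BLOCKS = 131072   # blocks per 1GB segment with 8kB blocks

def stale_segments(old_nblocks: int, new_nblocks: int) -> list:
    """Segment numbers that become entirely unneeded, highest first,
    given the pre-truncate size and the target size in blocks."""
    first_kept = new_nblocks // SEG_BLOCKS          # last segment still in use
    last_old = max(old_nblocks - 1, 0) // SEG_BLOCKS
    return list(range(last_old, first_kept, -1))

# Truncating a ~3.5-segment relation down to half a segment: segments 3, 2, 1
# can be unlinked (in that order); segment 0 is truncated in place.
print(stale_segments(old_nblocks=3 * SEG_BLOCKS + 1000, new_nblocks=65536))
```

The point of the reverse order is the same one made above for mdtruncate(): if the process crashes after unlinking some segments, the remaining files are still a contiguous prefix, so a redone cleanup finds a consistent state.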


Re: [HACKERS] Release schedule for 9.3.3?

2014-02-14 Thread Alvaro Herrera
Andres Freund wrote:
> On 2014-02-14 13:33:46 -0500, Josh Berkus wrote:
> > On 02/14/2014 01:11 PM, Andres Freund wrote:
> > > Hi,
> > > 
> > > On 2014-02-14 13:08:34 -0500, Josh Berkus wrote:
> > >> Do the 9.3.3 replication fixes mean that users should reclone their
> > >> replicas, like 9.3.2 did?  Or not?
> > > 
> > > Which replication fixes are you referring to?
> > > http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ebde6c40148c9f811f7c6d35f67e7ea3ce2d9b34
> > > ?
> > > If so, no, that doesn't require a reclone.
> > 
> > Hmmm.  I thought there were also 9.3-only replication fixes in this
> > release?  No?
> 
> I don't know of any. There are further multixact fixes, but AFAIK there's
> nothing replication specific, and they shouldn't cause problems beyond
> lost row locks in some edge cases.

There is one issue that might cause foreign keys to go unchecked.  In
cases where applications update referenced tuples and then delete them
in the same transaction, it might be wise to recheck foreign keys.  This
is the relevant commit:

commit db1014bc46de21a6de1751b807ea084e607104ad
Author: Alvaro Herrera 
Date:   Wed Dec 18 13:31:27 2013 -0300

Don't ignore tuple locks propagated by our updates

If a tuple was locked by transaction A, and transaction B updated it,
the new version of the tuple created by B would be locked by A, yet
visible only to B; due to an oversight in HeapTupleSatisfiesUpdate, the
lock held by A wouldn't get checked if transaction B later deleted (or
key-updated) the new version of the tuple.  This might cause referential
integrity checks to give false positives (that is, allow deletes that
should have been rejected).

This is an easy oversight to have made, because prior to improved tuple
locks in commit 0ac5ad5134f it wasn't possible to have tuples created by
our own transaction that were also locked by remote transactions, and so
locks weren't even considered in that code path.

It is recommended that foreign keys be rechecked manually in bulk after
installing this update, in case some referenced rows are missing with
some referencing row remaining.

Per bug reported by Daniel Wood in
CAPweHKe5QQ1747X2c0tA=5zf4yns2xcvgf13opd-1mq24rf...@mail.gmail.com

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] HBA files w/include support?

2014-02-14 Thread Jeff Janes
On Fri, Feb 14, 2014 at 6:33 AM, Bruce Momjian  wrote:

> On Fri, Feb 14, 2014 at 03:28:23AM -0500, Stephen Frost wrote:
> > Bruce,
>
> > Having @include and directory.d-style capabilities for pg_hba.conf *and*
> > pg_ident.conf would make managing larger environments much better.
> > There has been some talk about providing those capabilities via tables
> > in the catalog, but I'm not aware of anyone working on it and it'd
> > certainly be quite a bit more work than adding include/dir.d options.
>
> Do we want a TODO for this?
>

If we are assembling a wish-list, I've often wanted the opposite of an
include.  I want the ability to encapsulate the contents of pg_hba.conf
directly into postgresql.conf.  So instead of giving a filename to
hba_file, optionally give a multi-lined string with some kind of here-doc
like mechanism, or something like that.
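To make the wish concrete, such an inlined pg_hba.conf might look something like the fragment below. This syntax is entirely invented for illustration — no here-doc mechanism exists in postgresql.conf today:

```
# Hypothetical syntax only -- not supported by any PostgreSQL release.
# Instead of:  hba_file = 'ConfigDir/pg_hba.conf'
hba_content = <<'EOF'
  local   all   all                     peer
  host    all   all   127.0.0.1/32      md5
EOF
```

With something like this, a diff of postgresql.conf between environments would cover the authentication rules too.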

When I set up a forked dev environment and then eventually want to compare
the diverged dev setup back to production, I often forget to compare the
pg_hba.conf file.

Cheers,

Jeff


Re: [HACKERS] Release schedule for 9.3.3?

2014-02-14 Thread Andres Freund
On 2014-02-14 13:33:46 -0500, Josh Berkus wrote:
> On 02/14/2014 01:11 PM, Andres Freund wrote:
> > Hi,
> > 
> > On 2014-02-14 13:08:34 -0500, Josh Berkus wrote:
> >> Do the 9.3.3 replication fixes mean that users should reclone their
> >> replicas, like 9.3.2 did?  Or not?
> > 
> > Which replication fixes are you referring to?
> > http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ebde6c40148c9f811f7c6d35f67e7ea3ce2d9b34
> > ?
> > If so, no, that doesn't require a reclone.
> 
> Hmmm.  I thought there were also 9.3-only replication fixes in this
> release?  No?

I don't know of any. There are further multixact fixes, but AFAIK there's
nothing replication specific, and they shouldn't cause problems beyond
lost row locks in some edge cases.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Release schedule for 9.3.3?

2014-02-14 Thread Josh Berkus
On 02/14/2014 01:11 PM, Andres Freund wrote:
> Hi,
> 
> On 2014-02-14 13:08:34 -0500, Josh Berkus wrote:
>> Do the 9.3.3 replication fixes mean that users should reclone their
>> replicas, like 9.3.2 did?  Or not?
> 
> Which replication fixes are you referring to?
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ebde6c40148c9f811f7c6d35f67e7ea3ce2d9b34
> ?
> If so, no, that doesn't require a reclone.

Hmmm.  I thought there were also 9.3-only replication fixes in this
release?  No?


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-14 Thread Andres Freund
On 2014-02-14 18:49:33 +0100, Florian Pflug wrote:
> > I wouldn't consider it a major architecture... And I am not sure how
> > much out of order a CPU has to be to be affected by this, the read side
> > uses spinlocks, which in most of the spinlock implementations implies a
> > full memory barrier which in many cache coherency designs will cause
> > other CPUs to flush writes. And I think the control dependency in the
> > PGSemaphoreUnlock() loop will actually cause a flush on ppc...
> 
> I guess it's sufficient for the store to lwWaitLink to be delayed.
> As long as the CPU on which that store is executing hasn't managed to gain
> exclusive access to the relevant cache line, memory barriers on the read
> side won't help because the store won't be visible to other CPUs.

The whole lwlock actually should be on the same cacheline on anything
with cachelines >= 32. As the woken up side is doing an atomic op on it
*before* modifying lwWaitLink I think we are actually guaranteed that
any incomplete store on the writer will have completed unless the compiler
reordered. So we are safe against out of order stores, but not out of
order execution. That might have been what prevented the issue from
being noticeable on some platforms.

> Well, the assumption isn't all that new. We already have the situation that
> a PGPROC may be not on any wait queue, yet its lwWaitLink may be non-NULL.
> Currently, the process who took it off the queue is responsible for rectifying
> that eventually, with the changed proposed below it'll be the backend who
> owns the PGPROC. From the point of view of other backends, nothing really
> changes.

That window is absolutely tiny tho.

> >> b) resetting lwWaitLink introduces a race condition (however unlikely)
> >> 
> >> we'll trade correctness for cleanliness if we continue to reset lwWaitLink
> >> without protecting against the race. That's a bit of a weird trade-off to 
> >> make.
> > 
> > It's not just cleanliness, it's being able to actually debug crashes.
> 
> We could still arrange for the stray lwWaitLink to be visible only
> momentarily. If a backend is woken up and detects that lwWaiting is false,
> it knows that it cannot be on any wait queue - it was just removed, and
> hasn't added itself again yet. At that point, it's safe to reset lwWaitLink
> back to NULL. The stray value would thus only be visible while a backend
> executes the PGSemaphoreLock() loop, and whether or not this is the case
> can be deduced from a stack trace. So I'm not convinced that this makes
> debugging any harder.

That's not actually safe on an out of order architecture afaics. Without
barriers the store to lwWaitLink in the woken up backend can preempt the
read for the next element in the PGSemaphoreUnlock() loop.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Release schedule for 9.3.3?

2014-02-14 Thread Andres Freund
Hi,

On 2014-02-14 13:08:34 -0500, Josh Berkus wrote:
> Do the 9.3.3 replication fixes mean that users should reclone their
> replicas, like 9.3.2 did?  Or not?

Which replication fixes are you referring to?
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ebde6c40148c9f811f7c6d35f67e7ea3ce2d9b34
?
If so, no, that doesn't require a reclone.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Release schedule for 9.3.3?

2014-02-14 Thread Josh Berkus
All,

Do the 9.3.3 replication fixes mean that users should reclone their
replicas, like 9.3.2 did?  Or not?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] git-review: linking commits to review discussion in git

2014-02-14 Thread Peter C Rigby
On 2014-01-28 07:10:45 Heikki Linnakangas  wrote:
> On 01/27/2014 11:36 PM, Murtuza Mukadam wrote:
>>
>> Hello All,
>>We have linked peer review discussions on
>> 'pgsql-hackers' to their respective commits within the main
>> postgresql.git repository. You can view the linked reviews from 2012
>> until present in the GitHub repo at
>> https://github.com/mmukadam/postgres/tree/review
[snip]
> I don't understand what this does. The repository at
> https://github.com/mmukadam/postgres looks like just a clone of the main
> PostgreSQL repository, with no extra links anywhere. And the repository at
> https://github.com/mmukadam/postgres/tree/review looks like a mailing list
> archive turned into a git repository, but I don't see any links to the
> commits in the main repository there.
>
> Am I missing something?

If you want to search for the reviews related to a commit, you can
type in the full hash code here:
http://users.encs.concordia.ca/~m_mukada/git-review-tracker.html

or append the commit hash to the following url:

https://github.com/mmukadam/postgres/tree/review/


It's a bit hard to view the reviews on GitHub because the 'review' branch was
designed to be viewed on the command line using a set of simple scripts that
sit on top of the standard git commands.
For example,

to view all reviews Heikki is involved in you can do: git review
--log-reviewer heikki

to view the reviews of particular commit: git review --log
1a1832eb085e5bca198735e5d0e766a3cb61b8fc

You can also create new reviews on a commit and mail them as well as
import reviews from email.

The tool is beta, but it just creates a detached review branch, so it
won't affect master. To remove it all you do is delete the scripts and
the 'review' branch. More info available here:

Online man page: http://users.encs.concordia.ca/~m_mukada/git-review.html

Tutorial: http://users.encs.concordia.ca/~m_mukada/git-review-tutorial.html

Installation: 
http://users.encs.concordia.ca/~m_mukada/git-review-tutorial.html#Install

Cheers,
Peter

PS: this is part of Murtuza's thesis, so any feedback is appreciated.
If you'd like us to extract reviews from another mailing list or repo,
please let us know.


Example output:

git review --log-reviewer heikki

Review: adfe85d4dd1844c1f09401d2d970f564d0dac61d
Commit Reviewed: 593a9631a7947ab95903e87e24786d7e469cc988
Author: Tom Lane 
AuthorDate: Tue Feb 21 15:03:36 2012 -0500
Reviewer:   Heikki Linnakangas hlinnakan...@vmware.com 
ReviewDate: Mon Jan 27 16:34:44 2014 +0200


Re: Race condition in b-tree page deletion

Review: e7bfe4323d2b83605df83174f11785c117ad3a53
Commit Reviewed: 56a57473a999b0497e63bde3e303beda5a3c0ff3
Author: Tom Lane 
AuthorDate: Sat Jan 8 14:47:13 2011 -0500
Reviewer:   Heikki Linnakangas hlinnakan...@vmware.com 
ReviewDate: Sat Jan 25 23:21:20 2014 +0200


Re: GIN improvements part2: fast scan



-- 
http://users.encs.concordia.ca/~pcr/




Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-14 Thread Florian Pflug
On Feb14, 2014, at 16:51 , Andres Freund  wrote:
> On 2014-02-14 15:03:16 +0100, Florian Pflug wrote:
>> On Feb14, 2014, at 14:07 , Andres Freund  wrote:
>>> On 2014-02-14 13:52:45 +0100, Florian Pflug wrote:
> I agree we should do that, but imo not in the backbranches. Anything
> more than than the minimal fix in that code should be avoided in the
> stable branches, this stuff is friggin performance sensitive, and the
> spinlock already is a *major* contention point in many workloads.
 
 No argument there. But unless I missed something, there actually is a bug
 there, and using volatile won't fix it. A barrier would, but what about the
 back branches that don't have barrier.h?
>>> 
>>> Yea, but I don't see a better alternative. Realistically the likelihood
>>> of a problem without the compiler reordering stuff is minuscule on any
>>> relevant platform that's supported by the !barrier.h branches. Newer
>>> ARMs are the only realistic suspect, and the support for them in older
>>> releases wasn't so good...
>> 
>> Isn't POWER/PowerPC potentially affected by this? 
> 
> I wouldn't consider it a major architecture... And I am not sure how
> much out of order a CPU has to be to be affected by this, the read side
> uses spinlocks, which in most of the spinlock implementations implies a
> full memory barrier which in many cache coherency designs will cause
> other CPUs to flush writes. And I think the control dependency in the
> PGSemaphoreUnlock() loop will actually cause a flush on ppc...

I guess it's sufficient for the store to lwWaitLink to be delayed.
As long as the CPU on which that store is executing hasn't managed to gain
exclusive access to the relevant cache line, memory barriers on the read
side won't help because the store won't be visible to other CPUs.

>> Also, there is still the alternative of not resetting lwWaitLink in 
>> LWLockRelease.
>> I can understand why you oppose that - it's certainly nicer to not have stray
>> pointer lying around. But since (as least as far as we know)
>> 
>> a) resetting lwWaitLink is not strictly necessary
> 
> I don't want to rely on that in the backbranches, that's an entirely new
> assumption. I think anything more than minimal changes are hard to
> justify for a theoretical issue that hasn't been observed in the field.
> 
> I think the biggest danger here is that writes are reordered by the
> compiler, that we definitely need to protect against. Making a variable
> volatile or introducing a memory barrier is pretty simple to analyze.

Well, the assumption isn't all that new. We already have the situation that
a PGPROC may be not on any wait queue, yet its lwWaitLink may be non-NULL.
Currently, the process who took it off the queue is responsible for rectifying
that eventually, with the changed proposed below it'll be the backend who
owns the PGPROC. From the point of view of other backends, nothing really
changes.

> 
>> b) resetting lwWaitLink introduces a race condition (however unlikely)
>> 
>> we'll trade correctness for cleanliness if we continue to reset lwWaitLink
>> without protecting against the race. That's a bit of a weird trade-off to 
>> make.
> 
> It's not just cleanliness, it's being able to actually debug crashes.

We could still arrange for the stray lwWaitLink to be visible only
momentarily. If a backend is woken up and detects that lwWaiting is false,
it knows that it cannot be on any wait queue - it was just removed, and
hasn't added itself again yet. At that point, it's safe to reset lwWaitLink
back to NULL. The stray value would thus only be visible while a backend
executes the PGSemaphoreLock() loop, and whether or not this is the case
can be deduced from a stack trace. So I'm not convinced that this makes
debugging any harder.

(knizhnik has just suggested the same)

It's interesting, BTW, that updating lwWaitLink after lwWaiting is OK here -
the crucial property is not that it's updated before lwWaiting, but rather that
it is reset before enqueuing the backend again. Which is something that backend
itself can guarantee far more easily than whoever happens to de-queue it due to
spurious wakeups.

best regards,
Florian Pflug





Re: [HACKERS] HBA files w/include support?

2014-02-14 Thread Jerry Sievers
Tom Lane  writes:

> Stephen Frost  writes:
>
>> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>>> In short: I suspect this approach may be fixing the wrong thing.
>
>> I'm curious what you're thinking would be the right thing to fix here?
>
> I was asking for use-cases so we could figure out what's the right thing ;-)

No doubt this environment I took over managing, of 90 clusters ranging
from tiny to ~3.5TB, some with hundreds of user accounts...  could stand
some massive rejiggering in regards to users/roles and such to move towards
doing it "the right way".

That said, trying to roll up the hba files, some with over 300 entries
and lots of high duplication among clusters belonging to some subset,
would be daunting and perhaps invasive.

I realize that gathering up those duplicates into a file common to
whatever group, and then having that pulled in as an include, is going to
result in some reordering of the rules, since they are not in a totally
predictable order presently.

And my Jr. DBAs are still handling requests daily to add more hba rules,
quite often to more than a single machine, but still along the same
groupings mentioned.

Even without retrofitting a "good" cleanup here, being able to include a
global section at top of all files and at least one group specific file
next then followed by legacy entries and/or items specific to a single
cluster, might save a lot of work.

Thus my idea of using a make file for this and then inspiring the
question about includes.

I'm glad just to get the ball rolling, and I'll respect whatever consensus
opinion emerges.

Thx

>
> The argument about wanting to assemble a pg_hba file from separately
> managed configuration pieces seems to have some merit, but the weak
> spot there is how do you define the search order?  Or are you planning
> to just cross your fingers and hope it doesn't matter too much?
>
>   regards, tom lane
>

-- 
Jerry Sievers
e: jerry.siev...@comcast.net
p: 312.241.7800




Re: [HACKERS] narwhal and PGDLLIMPORT

2014-02-14 Thread Tom Lane
I wrote:
> Hiroshi Inoue  writes:
>> One thing I'm wondering about is that plperl is linking perlxx.lib
>> not libperlxx.a. I made a patch following plpython and it also
>> works here.
>> Is it worth trying?

> I hadn't noticed that part of plpython's Makefile before.  Man,
> that's an ugly technique :-(.  Still, there's little about this
> platform that isn't ugly.  Let's try it and see what happens.

And what happens is this:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=narwhal&dt=2014-02-14%2017%3A00%3A02
namely, it gets through plperl now and then chokes with the same
symptoms on pltcl.  So I guess we need the same hack in pltcl.
The fun never stops ...

(BTW, narwhal is evidently not trying to build plpython.  I wonder
why not?)

regards, tom lane




Re: [HACKERS] HBA files w/include support?

2014-02-14 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
> On 2014-02-14 11:10:48 -0500, Tom Lane wrote:
> > Stephen Frost  writes:
> > > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > >> In short: I suspect this approach may be fixing the wrong thing.
> > 
> > > I'm curious what you're thinking would be the right thing to fix here?
> > 
> > I was asking for use-cases so we could figure out what's the right thing ;-)

The specific use-cases that I've seen are primarily around managing the
sets of users who are allowed to connect from specific subnets.  As for
how the files might be independently managed, as Andres pointed out, all
systems would get a 'default' set which includes access for the local
'postgres' unix user, and then certain sets of systems would get
appropriate combinations of:

system-specific admin users (from anywhere)
normal users for that environment (from the desktop network)
application users for app servers (from the app servers)
ETL users for batch systems (from the ETL servers)
sales users for the *replica* (whose CONNECT privs can't be different..)

etc..  Would it be possible to have those all in one pg_hba file?  Sure,
in most cases, and you'd trust that the accounts simply wouldn't exist
or that they wouldn't have CONNECT privs on the database, but this isn't
how I run my systems (entries in pg_hba for users that don't exist?) and
I doubt I'm alone in that.  Being able to enforce based on source-IP is
*very* nice and is only available in pg_hba today.  Also, as noted
above, if you're running a replica that should have different access
rights (in particular, more 'open') then you can't depend on CONNECT
privs or account authentication (username/password differences) to
control access since that's all in the catalog and therefore must be
identical to the primary.

> > The argument about wanting to assemble a pg_hba file from separately
> > managed configuration pieces seems to have some merit, but the weak
> > spot there is how do you define the search order?  Or are you planning
> > to just cross your fingers and hope it doesn't matter too much?
> 
> The usual solution is to prepend a numeric prefix guaranteeing the
> search order. 00 is sysadmin stuff, 10 replication, 20 database specific
> or somesuch. I think most admins using automated tools to manage bigger
> configuration files by using some .d config directory already know how
> to deal with that problem.

Indeed, and run-parts codified it as 'lexical sort order (according to
the C/POSIX locale)', under at least Debian and Ubuntu systems.  I'm
pretty confident that most admins are going to be familiar with handling
this issue as it's exactly how /etc/cron.daily and anything else called
through run-parts works.

Thanks,

Stephen




Re: [HACKERS] New hook after raw parsing, before analyze

2014-02-14 Thread Greg Stark
On Fri, Feb 14, 2014 at 2:28 PM, David Beck  wrote:
> Why is that a bad idea of rewriting the query before it reaches 
> transform/analyze (without ever accessing the catalog)?
>
> If that flexibility is acceptable to you, where would be the best place to 
> put it in?

Well, if there are two foreign tables and the planner could push the
join work down to the fdw, then the planner should be able to
accurately represent that plan and cost it without having the user
create any catalog structures. That's what the planner does
for every other type of plan node.

What you're describing would still be useful for materialized views.
In that case the user is creating the materialized view, and it is a
real thing in the catalogs that won't disappear out from under the
planner. Even then, it would be ideal if the planner could decide to use
the materialized view late enough that it can actually determine whether
it's superior, rather than rewriting the query before it gets to that point.
That would also be much more flexible for users, who might not write the
query in a way that exactly matches the materialized view.

-- 
greg




Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-14 Thread knizhnik

On 02/14/2014 08:28 PM, Andres Freund wrote:

On 2014-02-14 20:23:32 +0400, knizhnik wrote:

we'll trade correctness for cleanliness if we continue to reset lwWaitLink
without protecting against the race. That's a bit of a weird trade-off to make.


It's not just cleanliness, it's being able to actually debug crashes.



Frankly speaking, I do not understand why eliminating the resetting of
lwWaitLink was considered a bad idea.
Resetting this pointer to NULL will not help much in analyzing crash dumps,
because right now it is possible that lwWaitLink==NULL while the process is
waiting for a lock and linked into the list
(if it is the last element of the list). So the fact that lwWaitLink==NULL
actually gives little useful information.


At the moment if you connect to a live pg or a crash dump, investigating
the wait links allows you to somewhat sensibly determine which backends
are waiting for a lock and whether they are part of a queue. If they
aren't reset anymore that will tell you nothing, so you'll need to
connect to all pg instances to debug issues.


Why is it not enough to inspect the lwWaiting flag?
In any case, resetting lwWaitLink can safely be done in the awakened process:

if (!proc->lwWaiting)
{
    proc->lwWaitLink = NULL;
    break;
}





Greetings,

Andres Freund







Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-14 Thread Andres Freund
On 2014-02-14 20:23:32 +0400, knizhnik wrote:
> >>we'll trade correctness for cleanliness if we continue to reset lwWaitLink
> >>without protecting against the race. That's a bit of a weird trade-off to 
> >>make.
> >
> >It's not just cleanliness, it's being able to actually debug crashes.
> 
> 
> Frankly speaking I do not understand why elimination of resetting of 
> lwWaitLink was considered to be bad idea.
> Resetting this pointer to NULL will not help much to analyze crash dumps, 
> because right now it is possible that lwWaitLink==NULL but process in waiting 
> for a lock and linked in the list
> (if it is the last element of the list). So the fact that  lwWaitLink==NULL 
> actually gives not so much useful information.

At the moment if you connect to a live pg or a crash dump, investigating
the wait links allows you to somewhat sensibly determine which backends
are waiting for a lock and whether they are part of a queue. If they
aren't reset anymore that will tell you nothing, so you'll need to
connect to all pg instances to debug issues.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-14 Thread knizhnik

On 02/14/2014 07:51 PM, Andres Freund wrote:

On 2014-02-14 15:03:16 +0100, Florian Pflug wrote:

On Feb14, 2014, at 14:07 , Andres Freund  wrote:

On 2014-02-14 13:52:45 +0100, Florian Pflug wrote:

I agree we should do that, but imo not in the backbranches. Anything
more than than the minimal fix in that code should be avoided in the
stable branches, this stuff is friggin performance sensitive, and the
spinlock already is a *major* contention point in many workloads.


No argument there. But unless I missed something, there actually is a bug
there, and using volatile won't fix it. A barrier would, but what about the
back branches that don't have barrier.h?


Yea, but I don't see a better alternative. Realistically the likelihood
of a problem without the compiler reordering stuff is miniscule on any
relevant platform that's supported by the !barrier.h branches. Newer
ARMs are the only realistic suspect, and the support for in older
releases wasn't so good...


Isn't POWER/PowerPC potentially affected by this?


I wouldn't consider it a major architecture... And I am not sure how
much out of order a CPU has to be to be affected by this, the read side
uses spinlocks, which in most of the spinlock implementations implies a
full memory barrier which in many cache coherency designs will cause
other CPUs to flush writes. And I think the control dependency in the
PGSemaphoreUnlock() loop will actually cause a flush on ppc...


PGSemaphoreUnlock() should really introduce a memory barrier, but the problem
can happen before PGSemaphoreUnlock() is called.
So the presence of PGSemaphoreUnlock() just reduces the probability of the
race condition on non-TSO architectures (PPC, ARM, IA64, ...) but doesn't
completely eliminate it.




Also, there is still the alternative of not resetting lwWaitLink in 
LWLockRelease.
I can understand why you oppose that - it's certainly nicer to not have stray
pointer lying around. But since (as least as far as we know)

a) resetting lwWaitLink is not strictly necessary


I don't want to rely on that in the backbranches, that's an entirely new
assumption. I think anything more than minimal changes are hard to
justify for a theoretical issue that hasn't been observed in the field.

I think the biggest danger here is that writes are reordered by the
compiler, that we definitely need to protect against. Making a variable
volatile or introducing a memory barrier is pretty simple to analyze.


b) resetting lwWaitLink introduces a race condition (however unlikely)

we'll trade correctness for cleanliness if we continue to reset lwWaitLink
without protecting against the race. That's a bit of a weird trade-off to make.


It's not just cleanliness, it's being able to actually debug crashes.



Frankly speaking, I do not understand why eliminating the resetting of
lwWaitLink was considered a bad idea.
Resetting this pointer to NULL will not help much in analyzing crash dumps,
because right now it is possible that lwWaitLink==NULL while the process is
waiting for a lock and linked into the list
(if it is the last element of the list). So the fact that lwWaitLink==NULL
actually gives little useful information.





Greetings,

Andres Freund







Re: [HACKERS] HBA files w/include support?

2014-02-14 Thread Andres Freund
On 2014-02-14 11:10:48 -0500, Tom Lane wrote:
> Stephen Frost  writes:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> In short: I suspect this approach may be fixing the wrong thing.
> 
> > I'm curious what you're thinking would be the right thing to fix here?
> 
> I was asking for use-cases so we could figure out what's the right thing ;-)
> 
> The argument about wanting to assemble a pg_hba file from separately
> managed configuration pieces seems to have some merit, but the weak
> spot there is how do you define the search order?  Or are you planning
> to just cross your fingers and hope it doesn't matter too much?

The usual solution is to prepend a numeric prefix guaranteeing the
search order: 00 is sysadmin stuff, 10 replication, 20 database-specific,
or somesuch. I think most admins who use automated tools to manage bigger
configuration files via a .d config directory already know how
to deal with that problem.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] HBA files w/include support?

2014-02-14 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> In short: I suspect this approach may be fixing the wrong thing.

> I'm curious what you're thinking would be the right thing to fix here?

I was asking for use-cases so we could figure out what's the right thing ;-)

The argument about wanting to assemble a pg_hba file from separately
managed configuration pieces seems to have some merit, but the weak
spot there is how do you define the search order?  Or are you planning
to just cross your fingers and hope it doesn't matter too much?

regards, tom lane




Re: [HACKERS] HBA files w/include support?

2014-02-14 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
> On 2014-02-14 11:03:19 -0500, Stephen Frost wrote:
> > Also, all of the above ignores the pg_ident side of the house, which is
> > even worse as you need an entry for every user, period, if you're using
> > client-side SSL certificates or Kerberos/GSSAPI-based authentication
> > with full princ names.
> 
> Well, there's regexes for mappings, that can often enough take care of
> most of that?

In some cases, yes, but certainly not all.  Apologies for over-stating
the case, but I came from an environment where the Kerberos princs were
'm##', while the database users were all first-initial || last-name.
Also, the CN in an SSL certificate isn't likely to be what you want for
a username either, and a regexp isn't likely to help there either.

Thanks,

Stephen




Re: [HACKERS] HBA files w/include support?

2014-02-14 Thread Andres Freund
On 2014-02-14 11:03:19 -0500, Stephen Frost wrote:
> Also, all of the above ignores the pg_ident side of the house, which is
> even worse as you need an entry for every user, period, if you're using
> client-side SSL certificates or Kerberos/GSSAPI-based authentication
> with full princ names.

Well, there's regexes for mappings, that can often enough take care of
most of that?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] HBA files w/include support?

2014-02-14 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > Having @include and directory.d-style capabilities for pg_hba.conf *and*
> > pg_ident.conf would make managing larger environments much better.
> 
> I'm a little suspicious of this, mainly because pg_hba searching is
> necessarily linear (and none too cheap per-entry).  I think anyone
> who tries to use a set of entries large enough to really need multiple
> files is going to have pain.

It would depend on the connection rate- some of these environments have
low connection rate but quite a few users (eg: corporate database
server).  As expressed up-thread, and in my own experience, people are
already doing this by building up the files themselves, so I don't agree
that performance would be so bad that this isn't worth it.  Not everyone
is using PG to back their website.  Improving performance would always
be good too, or perhaps providing another mechanism which performs
better when there are more entries.

> We already have various methods for making one pg_hba entry do the
> work of many; for instance, IP-subnet entries, wildcards, and role
> references.  And you can use database CONNECT privilege grants as
> another substitute for fine-grained pg_hba entries.

Yes, but those entries all have to agree on the exact authentication
method, including any mappings or other options which are passed through
pg_hba.conf.  Also, CONNECT is only true/false, it doesn't have any
ability to consider source-IP, SSL, etc.

> I'd be interested to see a real use-case where those things aren't
> an adequate substitute for a pg_hba rule set that's too large to
> fit conveniently in one file.  Maybe we could identify another
> pg_hba abstraction technique we need to support.

Being able to 'fit' inside of one file is wholly in the eye of the
beholder and isn't terribly relevant either.  Larger installations will
be using puppet, chef, or one of the other configuration management
systems and will want to use even small/independent files rather than
having to write code which fools around with the one file approach.  The
files I have in /etc/cron.d/ aren't particularly large, but it's really
nice to be able to independently manage and deploy them without
templating them.

Also, all of the above ignores the pg_ident side of the house, which is
even worse as you need an entry for every user, period, if you're using
client-side SSL certificates or Kerberos/GSSAPI-based authentication
with full princ names.

> In short: I suspect this approach may be fixing the wrong thing.

I'm curious what you're thinking would be the right thing to fix here?

Thanks,

Stephen




Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-14 Thread Andres Freund
On 2014-02-14 15:03:16 +0100, Florian Pflug wrote:
> On Feb14, 2014, at 14:07 , Andres Freund  wrote:
> > On 2014-02-14 13:52:45 +0100, Florian Pflug wrote:
> >>> I agree we should do that, but imo not in the backbranches. Anything
> >>> more than than the minimal fix in that code should be avoided in the
> >>> stable branches, this stuff is friggin performance sensitive, and the
> >>> spinlock already is a *major* contention point in many workloads.
> >> 
> >> No argument there. But unless I missed something, there actually is a bug
> >> there, and using volatile won't fix it. A barrier would, but what about the
> >> back branches that don't have barrier.h?
> > 
> > Yea, but I don't see a better alternative. Realistically the likelihood
> > of a problem without the compiler reordering stuff is miniscule on any
> > relevant platform that's supported by the !barrier.h branches. Newer
> > ARMs are the only realistic suspect, and the support for in older
> > releases wasn't so good...
> 
> Isn't POWER/PowerPC potentially affected by this? 

I wouldn't consider it a major architecture... And I am not sure how
far out of order a CPU has to be to be affected by this; the read side
uses spinlocks, which in most of the spinlock implementations imply a
full memory barrier, which in many cache coherency designs will cause
other CPUs to flush writes. And I think the control dependency in the
PGSemaphoreUnlock() loop will actually cause a flush on ppc...

> Also, there is still the alternative of not resetting lwWaitLink in 
> LWLockRelease.
> I can understand why you oppose that - it's certainly nicer to not have stray
> pointer lying around. But since (as least as far as we know)
> 
> a) resetting lwWaitLink is not strictly necessary

I don't want to rely on that in the backbranches; that's an entirely new
assumption. I think anything more than minimal changes is hard to
justify for a theoretical issue that hasn't been observed in the field.

I think the biggest danger here is that writes are reordered by the
compiler, that we definitely need to protect against. Making a variable
volatile or introducing a memory barrier is pretty simple to analyze.

> b) resetting lwWaitLink introduces a race condition (however unlikely)
> 
> we'll trade correctness for cleanliness if we continue to reset lwWaitLink
> without protecting against the race. That's a bit of a weird trade-off to 
> make.

It's not just cleanliness, it's being able to actually debug crashes.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] narwhal and PGDLLIMPORT

2014-02-14 Thread Tom Lane
Hiroshi Inoue  writes:
> (2014/02/12 8:30), Tom Lane wrote:
>> Not very clear what's going on there; could this be a problem in
>> narwhal's admittedly-ancient toolchain?

> The error doesn't occur here (un)fortunately.
> One thing I'm wondering about is that plperl is linking perlxx.lib
> not libperlxx.a. I made a patch following plpython and it also
> works here.
> Is it worth trying?

I hadn't noticed that part of plpython's Makefile before.  Man,
that's an ugly technique :-(.  Still, there's little about this
platform that isn't ugly.  Let's try it and see what happens.

regards, tom lane




Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-14 Thread Florian Pflug
On Feb14, 2014, at 16:32 , Andres Freund  wrote:
> On 2014-02-14 10:26:07 -0500, Tom Lane wrote:
>> Florian Pflug  writes:
>>> Another idea for a fix would be to conflate lwWaiting and lwWaitLink into 
>>> one
>>> field. We could replace "lwWaiting" by "lwWaitLink != NULL" everywhere it's
>>> tested, and set lwWaitLink to some special non-NULL value (say 0x1) when we
>>> enqueue a PGPROC, instead of setting it to NULL and setting lwWaiting to 
>>> true.
>> 
>>> We'd then depend on pointer-sized stores being atomic, which I think we 
>>> depend
>>> on in other places already.
>> 
>> I don't believe that's true; neither that we depend on it now, nor that
>> it would be safe to do so.
> 
> Yea, we currently rely on 4 byte stores being atomic (most notably for
> xids), but we don't rely on anything bigger. I am not sure if there are
> architectures with 64bit pointers that aren't accessed atomically when
> aligned? Even if, that's certainly nothing that should be introduced
> when backpatching.

Hm, we could use 4-byte stores instead of 8-byte stores if we turned lwWaitLink
into an index into the proc array instead of a pointer to the PGPROC struct.

We could then use 0xFFFFFFFF in place of NULL to indicate "not waiting",
and PROCARRAY_MAXPROCS to indicate "waiting, but no successor in the queue".

best regards,
Florian Pflug





Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-14 Thread Andres Freund
On 2014-02-14 10:26:07 -0500, Tom Lane wrote:
> Florian Pflug  writes:
> > Another idea for a fix would be to conflate lwWaiting and lwWaitLink into 
> > one
> > field. We could replace "lwWaiting" by "lwWaitLink != NULL" everywhere it's
> > tested, and set lwWaitLink to some special non-NULL value (say 0x1) when we
> > enqueue a PGPROC, instead of setting it to NULL and setting lwWaiting to 
> > true.
> 
> > We'd then depend on pointer-sized stores being atomic, which I think we 
> > depend
> > on in other places already.
> 
> I don't believe that's true; neither that we depend on it now, nor that
> it would be safe to do so.

Yea, we currently rely on 4-byte stores being atomic (most notably for
xids), but we don't rely on anything bigger. I am not sure if there are
architectures where aligned 64-bit pointers aren't accessed atomically?
Even if there are, that's certainly nothing that should be introduced
when backpatching.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] HBA files w/include support?

2014-02-14 Thread Andres Freund
On 2014-02-14 10:19:30 -0500, Tom Lane wrote:
> Stephen Frost  writes:
> > Having @include and directory.d-style capabilities for pg_hba.conf *and*
> > pg_ident.conf would make managing larger environments much better.
> 
> I'm a little suspicious of this, mainly because pg_hba searching is
> necessarily linear (and none too cheap per-entry).  I think anyone
> who tries to use a set of entries large enough to really need multiple
> files is going to have pain.

I don't think big hba files should be the primary motivator for a
feature like this. What I have seen is that people want a couple of
entries that grant local access to maintenance scripts and
root/postgres. Additionally, they want a *few* for every
application that runs on the same cluster, plus another couple
of entries allowing access to a limited number of replicas.

It's not too hard to script this: simply have a pg_hba.d directory whose
contents are concatenated in alphabetic order, resulting in a combined
file. But that's clearly inferior to built-in support, because
the error reporting obviously won't be as good and it's not possible to
nicely hook this into reloads and such.

I wonder if the linear search could be improved a bit by separating
database-specific entries from database-independent ones, as the setups
with large numbers of entries mostly had lots of database-specific
rules. It would probably be nontrivial to guarantee the sequential
lookup order, but it should be possible.

> I'd be interested to see a real use-case where those things aren't
> an adequate substitute for a pg_hba rule set that's too large to
> fit conveniently in one file.  Maybe we could identify another
> pg_hba abstraction technique we need to support.

That may additionally very well be. The biggest things I have seen are
a) the inability to restrict access based on outside group membership
when using ldap, b) the inability to grant membership in groups from
outside of pg.

Greetings,

Andres Freund




Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-14 Thread Tom Lane
Florian Pflug  writes:
> Another idea for a fix would be to conflate lwWaiting and lwWaitLink into one
> field. We could replace "lwWaiting" by "lwWaitLink != NULL" everywhere it's
> tested, and set lwWaitLink to some special non-NULL value (say 0x1) when we
> enqueue a PGPROC, instead of setting it to NULL and setting lwWaiting to true.

> We'd then depend on pointer-sized stores being atomic, which I think we depend
> on in other places already.

I don't believe that's true; neither that we depend on it now, nor that
it would be safe to do so.

regards, tom lane




Re: [HACKERS] HBA files w/include support?

2014-02-14 Thread Tom Lane
Stephen Frost  writes:
> Having @include and directory.d-style capabilities for pg_hba.conf *and*
> pg_ident.conf would make managing larger environments much better.

I'm a little suspicious of this, mainly because pg_hba searching is
necessarily linear (and none too cheap per-entry).  I think anyone
who tries to use a set of entries large enough to really need multiple
files is going to have pain.

We already have various methods for making one pg_hba entry do the
work of many; for instance, IP-subnet entries, wildcards, and role
references.  And you can use database CONNECT privilege grants as
another substitute for fine-grained pg_hba entries.

I'd be interested to see a real use-case where those things aren't
an adequate substitute for a pg_hba rule set that's too large to
fit conveniently in one file.  Maybe we could identify another
pg_hba abstraction technique we need to support.

In short: I suspect this approach may be fixing the wrong thing.

regards, tom lane




Re: [HACKERS] HBA files w/include support?

2014-02-14 Thread Fabrízio de Royes Mello
On Fri, Feb 14, 2014 at 12:36 PM, Stephen Frost  wrote:
>
> * Bruce Momjian (br...@momjian.us) wrote:
> > In an ideal world we would have a tool where you could plug in a
> > username, database, IP address, and test pg_hba.conf file and it would
> > report what line is matched.
>
> That's not a bad idea, but we don't expose the logic that figures that
> out today..  It would, perhaps, not be horrible to duplicate it, but
> then we'd need to make sure that we update both places if it ever
> changes (not that it's changed much in oh-so-many-years).  Perhaps
> another candidate to be a GSoC project.
>

I would like to participate in GSoC this year as a student.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] HBA files w/include support?

2014-02-14 Thread Bruce Momjian
On Fri, Feb 14, 2014 at 09:34:59AM -0500, Stephen Frost wrote:
> * Bruce Momjian (br...@momjian.us) wrote:
> > On Fri, Feb 14, 2014 at 03:28:23AM -0500, Stephen Frost wrote:
> > > Having @include and directory.d-style capabilities for pg_hba.conf *and*
> > > pg_ident.conf would make managing larger environments much better.
> > > There has been some talk about providing those capabilities via tables
> > > in the catalog, but I'm not aware of anyone working on it and it'd
> > > certainly be quite a bit more work than adding include/dir.d options.
> > 
> > Do we want a TODO for this?
> 
> Sure, and it should be a candidate for GSoC, imv...

OK, added.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] HBA files w/include support?

2014-02-14 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
> In an ideal world we would have a tool where you could plug in a
> username, database, IP address, and test pg_hba.conf file and it would
> report what line is matched.

That's not a bad idea, but we don't expose the logic that figures that
out today.  It would, perhaps, not be horrible to duplicate it, but
then we'd need to make sure that we update both places if it ever
changes (not that it's changed much in oh-so-many-years).  Perhaps
another candidate to be a GSoC project.

Thanks,

Stephen




Re: [HACKERS] HBA files w/include support?

2014-02-14 Thread Magnus Hagander
On Fri, Feb 14, 2014 at 3:32 PM, Bruce Momjian  wrote:

> On Thu, Feb 13, 2014 at 11:28:45PM -0600, Jerry Sievers wrote:
> > > One issue with this is that pg_hba.conf is order sensitive, which could
> > > become a trap for the unwary if includes are used carelessly.
> >
> > Indeed.
> >
> > The other thing that comes to mind, is that as opposed to
> > postgresql.conf and the include scenario there... one can do show all or
> > query from pg_stat_activity just to see what setting they ended up
> > with.
> >
> > I'm not aware of any way to probe what hba rules are loaded at runtime
> > and thus, debugging hba config changes not really possible.
>
> In an ideal world we would have a tool where you could plug in a
> username, database, IP address, and test pg_hba.conf file and it would
> report what line is matched.
>

I almost wrote a function you could call to do that a while back. I never
finished it though :)

It's not all that hard to do, but requires some minor refactoring of how
the HBA code works.

What would also be useful would be to be able to use such a function/tool
against a different file than the current HBA one, to verify *before* you
reload...
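
A toy version of such a checker is easy to sketch outside the server. This simplified parser (my own sketch, not the real HBA code: it handles only `host` lines, the `all` keyword, and plain CIDR matching — no `+roles`, `@files`, or special address forms) reports the first matching line, which is the pg_hba semantic:

```python
import ipaddress

def first_match(hba_text, database, user, address):
    """Return (lineno, line) of the first rule matching the connection
    attempt, or None.  First match wins, as in pg_hba.conf."""
    for lineno, raw in enumerate(hba_text.splitlines(), start=1):
        line = raw.split('#', 1)[0].strip()   # strip comments and blanks
        if not line:
            continue
        fields = line.split()
        if len(fields) < 5 or fields[0] != 'host':
            continue
        _, db, usr, cidr, _method = fields[:5]
        if db not in ('all', database):
            continue
        if usr not in ('all', user):
            continue
        if ipaddress.ip_address(address) not in ipaddress.ip_network(cidr):
            continue
        return lineno, raw
    return None

conf = """\
# comment
host  all    monitoring  192.168.1.0/24  md5
host  sales  all         10.0.0.0/8      md5
"""
print(first_match(conf, 'sales', 'bob', '10.1.2.3'))  # first match wins
```

Pointing such a tool at an arbitrary file (rather than the live one) gives exactly the "verify before you reload" workflow described above.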

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] HBA files w/include support?

2014-02-14 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
> On Fri, Feb 14, 2014 at 03:28:23AM -0500, Stephen Frost wrote:
> > Having @include and directory.d-style capabilities for pg_hba.conf *and*
> > pg_ident.conf would make managing larger environments much better.
> > There has been some talk about providing those capabilities via tables
> > in the catalog, but I'm not aware of anyone working on it and it'd
> > certainly be quite a bit more work than adding include/dir.d options.
> 
> Do we want a TODO for this?

Sure, and it should be a candidate for GSoC, imv...

Thanks,

Stephen




Re: [HACKERS] HBA files w/include support?

2014-02-14 Thread Bruce Momjian
On Fri, Feb 14, 2014 at 03:28:23AM -0500, Stephen Frost wrote:
> Bruce,
> 
> * Bruce Momjian (br...@momjian.us) wrote:
> > On Thu, Feb 13, 2014 at 08:24:27PM -0600, Jerry Sievers wrote:
> > > I'm aware of how a pg_hba.conf file can refer to other files for
> > > including @lists of users, etc.
> > > 
> > > But there is currently no support for being able to pull in entire file
> > > segments as can be done for postgresql.conf via the include directive.
> > > 
> > > In the environment that I'm managing, we are using a makefile to stick
> > > together a common header with a custom section for any of several
> > > clusters and may extend this further to permit additional includes for
> > > hba rules common to groupings of clusters.
> > > 
> > > Anyway, please advise.  I don't recall hearing anything like this
> > > discussed.
> > > 
> > > Has been proposed, discussed and voted down?  Or never mentioned?
> > 
> > I have never heard of anyone request this.
> 
> I've brought it up on various threads before, including both the ALTER
> SYSTEM thread and the postgresql.conf 'includes' thread, though I don't
> feel like going back and hunting down the specific emails right now.
> 
> Having @include and directory.d-style capabilities for pg_hba.conf *and*
> pg_ident.conf would make managing larger environments much better.
> There has been some talk about providing those capabilities via tables
> in the catalog, but I'm not aware of anyone working on it and it'd
> certainly be quite a bit more work than adding include/dir.d options.

Do we want a TODO for this?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] HBA files w/include support?

2014-02-14 Thread Bruce Momjian
On Thu, Feb 13, 2014 at 11:28:45PM -0600, Jerry Sievers wrote:
> > One issue with this is that pg_hba.conf is order sensitive, which could
> > become a trap for the unwary if includes are used carelessly.
> 
> Indeed.
> 
> The other thing that comes to mind, is that as opposed to
> postgresql.conf and the include scenario there... one can do show all or
> query from pg_stat_activity just to see what setting they ended up
> with. 
> 
> I'm not aware of any way to probe what hba rules are loaded at runtime
> and thus, debugging hba config changes not really possible.

In an ideal world we would have a tool where you could plug in a
username, database, IP address, and test pg_hba.conf file and it would
report what line is matched.

> I presume that a simple scenario involving just 1 level of includes is not
> too difficult to grok, but nested includes sure might be a foot gun
> unless there was a way to dump the resulting configs somehow.
> 
> Thus pasting hba files together externally is a more reliable approach.

You certainly would not have a visual idea of what line is matched
_first_.  We have the same problem with postgresql.conf includes, though
the last match wins there --- not sure if that makes it any easier.

I think one concern is that pg_hba.conf is more security-oriented than
postgresql.conf.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] New hook after raw parsing, before analyze

2014-02-14 Thread David Beck
Let me rephrase this:

Let’s remove my motivations and use cases from this conversation….

Why is it a bad idea to rewrite the query before it reaches
transform/analyze (without ever accessing the catalog)?

If that flexibility is acceptable to you, where would be the best place to
put it?

Thanks, David


2014.02.14. dátummal, 10:30 időpontban David Beck  írta:

> Thanks for the reply. There are two points on which I think I’ve been misunderstood:
> 
> 1, the point is to do the rewrite without and before catalog access
> 2, I do want to push the join to the source and equally important pushing the 
> where conditions there
> 
> Best regards, David
> 
> 
> 2014.02.13. dátummal, 21:22 időpontban Tom Lane  írta:
> 
>> David Beck  writes:
>>> I have table like data structures in the source system for the FDW I work 
>>> on.
>>> These tables are sometimes too big and the source system is able to filter 
>>> and join them with limitations, thus it is not optimal to transfer the data 
>>> to Postgres.
>>> At the same time I want the users to think in terms of the original tables.
>> 
>>> The idea is to rewrite the SQL queries like this:
>> 
>>> “SELECT * FROM tableA a, tableB b WHERE a.id=b.id AND a.col1=1234 AND 
>>> b.col2=987”
>> 
>>> to:
>> 
>>> “SELECT * FROM fdw_tableA_tableB ab WHERE ab.col1=1234 AND ab.col2=987”
>> 
>> TBH this sounds like a spectacularly bad idea, especially in the place and
>> way you propose to do it.  You can't even do catalog access safely where
>> you've put that hook, not to mention that there are many other places
>> where queries can be submitted.  But more generally, an FDW should not
>> operate in the way you're describing.
>> 
>> We do lack support for pushing joins to the foreign server, and that needs
>> to be addressed; but we need to do it in the planner, not by kluging the
>> query somewhere upstream of that.
>> 
>>  regards, tom lane
> 





Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-14 Thread Florian Pflug
On Feb14, 2014, at 14:07 , Andres Freund  wrote:
> On 2014-02-14 13:52:45 +0100, Florian Pflug wrote:
>>> I agree we should do that, but imo not in the backbranches. Anything
>>> more than the minimal fix in that code should be avoided in the
>>> stable branches, this stuff is friggin performance sensitive, and the
>>> spinlock already is a *major* contention point in many workloads.
>> 
>> No argument there. But unless I missed something, there actually is a bug
>> there, and using volatile won't fix it. A barrier would, but what about the
>> back branches that don't have barrier.h?
> 
> Yea, but I don't see a better alternative. Realistically the likelihood
> of a problem without the compiler reordering stuff is minuscule on any
> relevant platform that's supported by the !barrier.h branches. Newer
> ARMs are the only realistic suspect, and the support for them in older
> releases wasn't so good...

Isn't POWER/PowerPC potentially affected by this? 

Also, there is still the alternative of not resetting lwWaitLink in 
LWLockRelease.
I can understand why you oppose that - it's certainly nicer not to have stray
pointers lying around. But since (at least as far as we know)

a) resetting lwWaitLink is not strictly necessary
b) resetting lwWaitLink introduces a race condition (however unlikely)

we'll trade correctness for cleanliness if we continue to reset lwWaitLink
without protecting against the race. That's a bit of a weird trade-off to make.

Another idea for a fix would be to conflate lwWaiting and lwWaitLink into one
field. We could replace "lwWaiting" by "lwWaitLink != NULL" everywhere it's
tested, and set lwWaitLink to some special non-NULL value (say 0x1) when we
enqueue a PGPROC, instead of setting it to NULL and setting lwWaiting to true.

We'd then depend on pointer-sized stores being atomic, which I think we
already assume in other places.

best regards,
Florian Pflug





Re: [HACKERS] walsender doesn't send keepalives when writes are pending

2014-02-14 Thread Andres Freund
On 2014-02-14 13:58:59 +0100, Andres Freund wrote:
> On 2014-02-14 12:55:06 +, Greg Stark wrote:
> > On Fri, Feb 14, 2014 at 12:05 PM, Andres Freund  
> > wrote:
> > > There's no reason not
> > > to ask for a ping when we're writing.
> 
> > Is there a reason to ask for a ping? The point of keepalives is to
> > ensure there's some traffic on idle connections so that if the
> > connection is dead it doesn't linger forever and so that any on-demand
> > links (or more recently NAT routers or stateful firewalls) don't time
> > out and disconnect and have to reconnect (or more recently just fail
> > outright).
> 
> This ain't TCP keepalives. The reason is that we want to kill walsenders
> if they haven't responded to a ping inside wal_sender_timeout. That's
> rather important e.g. for synchronous replication, so we can quickly fall
> over to the next standby. In such scenarios you'll usually want a
> timeout *far* below anything TCP provides.

walreceiver sends pings every time it receives a 'w' message, so it's
probably not an issue there, but pg_receivexlog/basebackup don't; they
use their own configured interval. So this might explain the latter
two being disconnected too early. I've seen reports of that...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [BUG] Archive recovery failure on 9.3+.

2014-02-14 Thread Heikki Linnakangas

On 02/14/2014 10:38 AM, Kyotaro HORIGUCHI wrote:

Finally, the attached patch changes only the styling mentioned above
relative to your last patch. It applies to current HEAD, and I confirmed
that it fixes this issue, but I have not checked the lastSourceFailed
section; simple file removal could not lead there.


Ok, applied. Thanks!

- Heikki




Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-14 Thread Andres Freund
On 2014-02-14 13:52:45 +0100, Florian Pflug wrote:
> > I agree we should do that, but imo not in the backbranches. Anything
> > more than the minimal fix in that code should be avoided in the
> > stable branches, this stuff is friggin performance sensitive, and the
> > spinlock already is a *major* contention point in many workloads.
> 
> No argument there. But unless I missed something, there actually is a bug
> there, and using volatile won't fix it. A barrier would, but what about the
> back branches that don't have barrier.h?

Yea, but I don't see a better alternative. Realistically the likelihood
of a problem without the compiler reordering stuff is minuscule on any
relevant platform that's supported by the !barrier.h branches. Newer
ARMs are the only realistic suspect, and the support for them in older
releases wasn't so good...

> The former
> leaves me with a bit of an uneasy feeling, and the latter quite certainly has
> a larger performance impact than moving the PGPROC updates within the section
> protected by the spinlock and using an array to remember the backends to 
> wakeup.

I am not so sure, it adds a host of new cacheline references in a piece
of code that's already heavily affected by pipeline stalls, that can
influence performance. I am not saying it's super likely, just more than
I want to do for a theoretical problem in the back branches.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] issue with gininsert under very high load

2014-02-14 Thread Alvaro Herrera
Andres Freund wrote:
> On 2014-02-14 08:06:40 +0100, Jesper Krogh wrote:

> > The built-in mechanism, that cleanup is a cost paid by the process who
> > happened to fill the pending list, is really hard to deal with in
> > production. More control is appreciated, perhaps even an explicit
> > flush-mechanism..  I'd like to batch up inserts during one transaction
> > only and flush on commit.
> 
> That doesn't seem likely to work with a reasonable amount of effort. The
> fastupdate list is shared across all processes, so one backend will
> always pay the price for several others.

Unless some other process does it, such as autovacuum.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] walsender doesn't send keepalives when writes are pending

2014-02-14 Thread Andres Freund
On 2014-02-14 12:55:06 +, Greg Stark wrote:
> On Fri, Feb 14, 2014 at 12:05 PM, Andres Freund  
> wrote:
> > There's no reason not
> > to ask for a ping when we're writing.

> Is there a reason to ask for a ping? The point of keepalives is to
> ensure there's some traffic on idle connections so that if the
> connection is dead it doesn't linger forever and so that any on-demand
> links (or more recently NAT routers or stateful firewalls) don't time
> out and disconnect and have to reconnect (or more recently just fail
> outright).

This ain't TCP keepalives. The reason is that we want to kill walsenders
if they haven't responded to a ping inside wal_sender_timeout. That's
rather important e.g. for synchronous replication, so we can quickly fall
over to the next standby. In such scenarios you'll usually want a
timeout *far* below anything TCP provides.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] HBA files w/include support?

2014-02-14 Thread Alvaro Herrera
Jerry Sievers wrote:

> The other thing that comes to mind, is that as opposed to
> postgresql.conf and the include scenario there... one can do show all or
> query from pg_stat_activity just to see what setting they ended up
> with. 
> 
> I'm not aware of any way to probe what hba rules are loaded at runtime
> and thus, debugging hba config changes not really possible.

Well, getting includes in postgresql.conf in an acceptable shape
required lots more work than just implementing the "include" directive;
one of the very first requirements was that the file path and line
number that each setting got its value from was available in
pg_settings.  I'm pretty sure that a patch that only added includes
without equivalent functionality would not get very far.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] walsender doesn't send keepalives when writes are pending

2014-02-14 Thread Greg Stark
On Fri, Feb 14, 2014 at 12:05 PM, Andres Freund  wrote:
> There's no reason not
> to ask for a ping when we're writing.


Is there a reason to ask for a ping? The point of keepalives is to
ensure there's some traffic on idle connections so that if the
connection is dead it doesn't linger forever and so that any on-demand
links (or more recently NAT routers or stateful firewalls) don't time
out and disconnect and have to reconnect (or more recently just fail
outright).

By analogy TCP doesn't send any keepalives if there is any traffic on
the link. On the other hand I happen to know (the hard way) that
typical PPPoE routers *do* send LCP pings even when the link is busy
so there's precedent for either behaviour. I guess it comes down to
why you want the keepalives.


-- 
greg




Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-14 Thread Florian Pflug
On Feb14, 2014, at 13:36 , Andres Freund  wrote:
> On 2014-02-14 13:28:47 +0100, Florian Pflug wrote:
>>> I don't think that can actually happen because the head of the wait list
>>> isn't the lock holder's lwWaitLink, but LWLock->head. I thought the same
>>> for a while...
>> 
>> Hm, true, but does that protect us under all circumstances? If another
>> backend grabs the lock before B gets a chance to do so, B will become the
>> wait queue's head, and anyone who enqueues behind B will do so by updating
>> B's lwWaitLink. That is then undone by the delayed reset of B's lwWaitLink
>> by the original lock holder.
>> 
>> The specific sequence I have in mind is:
>> 
>> A: Take lock
>> B: Enqueue
>> A: Release lock, set B's lwWaiting to false
>> D: Acquire lock
>> B: Enqueue after spurious wakeup
>>   (lock->head := B)
>> C: Enqueue behind B
>>   (B->lwWaitLink := C, lock->tail := C)
>> A: Set B's lwWaitLink back to NULL, thereby corrupting the queue
>>   (B->lwWaitLink := NULL)
>> D: Release lock, dequeue and wakeup B
>>   (lock->head := B->lwWaitLink, i.e. lock->head := NULL)
>> B: Take and release lock, queue appears empty!
>>   C blocks indefinitely.
> 
> That's assuming either reordering by the compiler or an out-of-order
> store architecture, right?

Yes, it requires that a backend exits out of the PGSemaphoreLock loop in
LWLockAcquire before lwWaitLink has been reset to NULL by the previous lock
holder's LWLockRelease. Only if that happens there is a risk of the PGPROC
being on a wait queue by the time lwWaitLink is finally reset to NULL.

>>>> I wonder whether LWLockRelease really needs to update lwWaitLink at all.
>>>> We take the backends we awake off the queue by updating the queue's head and
>>>> tail, so the contents of lwWaitLink should only matter once the backend is
>>>> re-inserted into some wait queue. But when doing that, we reset lwWaitLink
>>>> back to NULL anyway.
>>> 
>>> I don't like that, this stuff is hard to debug already, having stale
>>> pointers around doesn't help. I think we should just add the barrier in
>>> the releases with barrier.h and locally use a volatile var in the
>>> branches before that.
>> 
>> I like the idea of updating lwWaiting and lwWaitLink while still holding the
>> spinlock better. The costs are probably negligible, and it'd make the code much
>> easier to reason about.
> 
> I agree we should do that, but imo not in the backbranches. Anything
> more than the minimal fix in that code should be avoided in the
> stable branches, this stuff is friggin performance sensitive, and the
> spinlock already is a *major* contention point in many workloads.

No argument there. But unless I missed something, there actually is a bug
there, and using volatile won't fix it. A barrier would, but what about the
back branches that don't have barrier.h? AFAICS the only other choices we have
on these branches are to either ignore the bug - it's probably *extremely* unlikely
- or to use a spinlock acquire/release cycle to create a barrier. The former
leaves me with a bit of an uneasy feeling, and the latter quite certainly has
a larger performance impact than moving the PGPROC updates within the section
protected by the spinlock and using an array to remember the backends to wakeup.

best regards,
Florian Pflug





Re: [HACKERS] issue with gininsert under very high load

2014-02-14 Thread Andres Freund
On 2014-02-14 08:06:40 +0100, Jesper Krogh wrote:
> On 14/02/14 00:49, Tom Lane wrote:
> >Andres Freund  writes:
> >>On 2014-02-13 16:15:42 -0500, Tom Lane wrote:
> >>>Something like the attached?  Can somebody who's seen this problem confirm
> >>>this improves matters?
> >>Hm. Won't that possibly lead to the fast tuple list growing unboundedly?
> >>I think we would need to at least need to stop using the fast tuple
> >>mechanism during gininsert() if it's already too big and do plain
> >>inserts.
> >No, because we've already got a process working on cleaning it out.
> >
> >In any case, this needs some testing to see if it's an improvement
> >or not.
> 
> Having some real-world experience with the fastupdate mechanism: under
> concurrent load it behaves really badly. Random processes waiting for
> cleanup (or competing with cleanup) are going to see latency spikes,
> because they magically hit that corner. Thus reverting to plain inserts
> if it cannot add to the pending list will not remove this problem, but
> will make it hit only the process actually doing the cleanup.

Yea, this is only a part of fixing fastupdate. Limiting the size of the
fastupdate list to something more reasonable is pretty important as
well. Not competing around cleanup will make cleanup much faster though,
so I am not that concerned about the latency spikes it causes once it's
limited in size and done non-concurrently.

> The built-in mechanism, that cleanup is a cost paid by the process who
> happened to fill the pending list, is really hard to deal with in
> production. More control is appreciated, perhaps even an explicit
> flush-mechanism..  I'd like to batch up inserts during one transaction
> only and flush on commit.

That doesn't seem likely to work with a reasonable amount of effort. The
fastupdate list is shared across all processes, so one backend will
always pay the price for several others.
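
For what it's worth, the existing per-index knob already gives some of that control: with fastupdate disabled, each insert pays its own smaller, more predictable cost instead of occasionally drawing the whole cleanup, and VACUUM flushes whatever is still pending. A sketch, with a hypothetical index and table:

```sql
-- disable the pending list for a hypothetical GIN index, so each insert
-- updates the index directly instead of batching into the shared list
ALTER INDEX docs_fts_idx SET (fastupdate = off);

-- flush anything still sitting in the pending list
VACUUM docs;

-- or create the index that way from the start
CREATE INDEX docs_fts_idx2 ON docs USING gin (tsv) WITH (fastupdate = off);
```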

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-14 Thread Andres Freund
On 2014-02-14 13:28:47 +0100, Florian Pflug wrote:
> > I don't think that can actually happen because the head of the wait list
> > isn't the lock holder's lwWaitLink, but LWLock->head. I thought the same
> > for a while...
> 
> Hm, true, but does that protect us under all circumstances? If another
> backend grabs the lock before B gets a chance to do so, B will become the
> wait queue's head, and anyone who enqueues behind B will do so by updating
> B's lwWaitLink. That is then undone by the delayed reset of B's lwWaitLink
> by the original lock holder.
> 
> The specific sequence I have in mind is:
> 
> A: Take lock
> B: Enqueue
> A: Release lock, set B's lwWaiting to false
> D: Acquire lock
> B: Enqueue after spurious wakeup
>(lock->head := B)
> C: Enqueue behind B
>(B->lwWaitLink := C, lock->tail := C)
> A: Set B's lwWaitLink back to NULL, thereby corrupting the queue
>(B->lwWaitLink := NULL)
> D: Release lock, dequeue and wakeup B
>(lock->head := B->lwWaitLink, i.e. lock->head := NULL)
> B: Take and release lock, queue appears empty!
>C blocks indefinitely.

That's assuming either reordering by the compiler or an out-of-order
store architecture, right?

> >> I wonder whether LWLockRelease really needs to update lwWaitLink at all.
> >> We take the backends we awake off the queue by updating the queue's head and
> >> tail, so the contents of lwWaitLink should only matter once the backend is
> >> re-inserted into some wait queue. But when doing that, we reset lwWaitLink
> >> back to NULL anyway.
> > 
> > I don't like that, this stuff is hard to debug already, having stale
> > pointers around doesn't help. I think we should just add the barrier in
> > the releases with barrier.h and locally use a volatile var in the
> > branches before that.
> 
> I like the idea of updating lwWaiting and lwWaitLink while still holding the
> spinlock better. The costs are probably negligible, and it'd make the code much
> easier to reason about.

I agree we should do that, but imo not in the backbranches. Anything
more than the minimal fix in that code should be avoided in the
stable branches, this stuff is friggin performance sensitive, and the
spinlock already is a *major* contention point in many workloads.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-14 Thread Florian Pflug
On Feb14, 2014, at 11:45 , Andres Freund  wrote:
> On 2014-02-13 15:34:09 +0100, Florian Pflug wrote:
>> On Feb10, 2014, at 17:38 , Andres Freund  wrote:
>>> On 2014-02-10 11:11:28 -0500, Tom Lane wrote:
 Andres Freund  writes:
> So what we need to do is to acquire a write barrier between the
> assignments to lwWaitLink and lwWaiting, i.e.
>   proc->lwWaitLink = NULL;
>   pg_write_barrier();
>   proc->lwWaiting = false;
 
 You didn't really explain why you think that ordering is necessary?
 Each proc being awoken will surely see both fields updated, and other
 procs won't be examining these fields at all, since we already delinked
 all these procs from the LWLock's queue.
>>> 
>>> The problem is that one the released backends could wake up concurrently
>>> because of a unrelated, or previous PGSemaphoreUnlock(). It could see
>>> lwWaiting = false, and thus wakeup and acquire the lock, even if the
>>> store for lwWaitLink hasn't arrived (or performed, there's no guaranteed
>>> ordering here) yet.
>>> Now, it may well be that there's no practical consequence of that, but I
>>> am not prepared to bet on it.
>> 
>> AFAICS there is a potential problem if three backends are involved, since
>> by the time the waiting backend's lwWaitLink is set to NULL after the
>> original lock holder released the lock, the waiting backend might already
>> have acquired the lock (due to a spurious wakeup) *and* a third backend
>> might have already enqueued behind it.
>> 
>> The exact sequence for backends A,B,C that corrupts the wait queue is:
>> 
>> A: Release lock, set B's lwWaiting to false
>> B: Wakes up spuriously, takes the lock
>> C: Enqueues behind B
>> A: Sets B's lwWaitLink back to NULL, thereby truncating the queue and
>>   causing C and anyone behind it to block indefinitely.
> 
> I don't think that can actually happen because the head of the wait list
> isn't the lock holder's lwWaitLink, but LWLock->head. I thought the same
> for a while...

Hm, true, but does that protect us under all circumstances? If another
backend grabs the lock before B gets a chance to do so, B will become the
wait queue's head, and anyone who enqueues behind B will do so by updating
B's lwWaitLink. That is then undone by the delayed reset of B's lwWaitLink
by the original lock holder.

The specific sequence I have in mind is:

A: Take lock
B: Enqueue
A: Release lock, set B's lwWaiting to false
D: Acquire lock
B: Enqueue after spurious wakeup
   (lock->head := B)
C: Enqueue behind B
   (B->lwWaitLink := C, lock->tail := C)
A: Set B's lwWaitLink back to NULL, thereby corrupting the queue
   (B->lwWaitLink := NULL)
D: Release lock, dequeue and wakeup B
   (lock->head := B->lwWaitLink, i.e. lock->head := NULL)
B: Take and release lock, queue appears empty!
   C blocks indefinitely.
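To make the corruption concrete, here is a small stand-alone sketch that replays that interleaving sequentially on a toy wait queue. The struct and field names merely mimic PGPROC/LWLock; this is not PostgreSQL code, just an illustration of the sequence above:

```c
#include <stddef.h>

/* Toy stand-ins for PGPROC and an LWLock wait queue (sketch only). */
typedef struct Proc
{
	struct Proc *lwWaitLink;
	int			lwWaiting;
} Proc;

typedef struct Queue
{
	Proc	   *head;
	Proc	   *tail;
} Queue;

static void
enqueue(Queue *q, Proc *p)
{
	p->lwWaitLink = NULL;
	p->lwWaiting = 1;
	if (q->head == NULL)
		q->head = p;
	else
		q->tail->lwWaitLink = p;
	q->tail = p;
}

/*
 * Replay the A/B/C/D interleaving above; returns 1 if C ends up
 * waiting on a queue whose head claims to be empty (the corruption).
 */
static int
replay_lost_waiter(void)
{
	Proc		B = {NULL, 0};
	Proc		C = {NULL, 0};
	Queue		q = {NULL, NULL};

	/* A: releases; clears B's lwWaiting (its lwWaitLink store is delayed) */
	B.lwWaiting = 0;
	/* D: acquires the lock */
	/* B: wakes spuriously, fails to get the lock, re-enqueues */
	enqueue(&q, &B);			/* q.head = B */
	/* C: enqueues behind B */
	enqueue(&q, &C);			/* B.lwWaitLink = C, q.tail = C */
	/* A: delayed store finally lands, truncating the queue */
	B.lwWaitLink = NULL;
	/* D: releases; dequeues B: head := B->lwWaitLink, i.e. NULL */
	q.head = B.lwWaitLink;
	B.lwWaiting = 0;

	/* Queue now appears empty although C still waits forever. */
	return (q.head == NULL && C.lwWaiting == 1);
}
```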

>> I wonder whether LWLockRelease really needs to update lwWaitLink at all.
>> We take the backends we awake off the queue by updating the queue's head and
>> tail, so the contents of lwWaitLink should only matter once the backend is
>> re-inserted into some wait queue. But when doing that, we reset lwWaitLink
>> back to NULL anyway.
> 
> I don't like that, this stuff is hard to debug already, having stale
> pointers around doesn't help. I think we should just add the barrier in
> the releases with barrier.h and locally use a volatile var in the
> branches before that.

I like the idea of updating lwWaiting and lwWaitLink while still holding the
spinlock better. The costs are probably negligible, and it'd make the code much
easier to reason about.
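A minimal sketch of that alternative, with both stores made under the spinlock. The structs and the __sync-based spinlock are toy stand-ins assumed for illustration (PostgreSQL's real spinlock is an slock_t handled via S_LOCK/S_UNLOCK), not actual PostgreSQL code:

```c
#include <stddef.h>

typedef struct Proc
{
	struct Proc *lwWaitLink;
	int			lwWaiting;
} Proc;

typedef struct Lock
{
	int			mutex;
	Proc	   *head;
	Proc	   *tail;
} Lock;

static void
spin_acquire(int *m)
{
	while (__sync_lock_test_and_set(m, 1))
		;						/* spin */
}

static void
spin_release(int *m)
{
	__sync_lock_release(m);
}

/*
 * Dequeue one waiter and mark it runnable without ever exposing
 * lwWaiting = false while its lwWaitLink is still stale: both stores
 * happen while the spinlock is held.
 */
static Proc *
dequeue_one(Lock *lock)
{
	Proc	   *p;

	spin_acquire(&lock->mutex);
	p = lock->head;
	if (p != NULL)
	{
		lock->head = p->lwWaitLink;
		p->lwWaitLink = NULL;	/* reset the link first... */
		p->lwWaiting = 0;		/* ...then allow the waiter to run */
	}
	spin_release(&lock->mutex);
	return p;
}

/* Tiny self-check: two queued waiters; dequeue the first. */
static int
dequeue_selftest(void)
{
	Proc		b = {NULL, 1};
	Proc		a = {&b, 1};
	Lock		lock = {0, &a, &b};
	Proc	   *got = dequeue_one(&lock);

	return (got == &a && a.lwWaiting == 0 && a.lwWaitLink == NULL &&
			lock.head == &b && b.lwWaiting == 1);
}
```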

best regards,
Florian Pflug




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] walsender doesn't send keepalives when writes are pending

2014-02-14 Thread Andres Freund
Hi,

In WalSndLoop() we do:

wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH | WL_TIMEOUT |
	WL_SOCKET_READABLE;

if (pq_is_send_pending())
	wakeEvents |= WL_SOCKET_WRITEABLE;
else if (wal_sender_timeout > 0 && !ping_sent)
{
	...
	if (GetCurrentTimestamp() >= timeout)
		WalSndKeepalive(true);
	...

I think those two ifs should simply be separate. There's no reason not
to ask for a ping when we're writing; on a busy server that might be the
case most of the time.
The if (pq_is_send_pending()) check should also come *after* sending the
keepalive; after all, it might not have been flushed immediately,
unlikely as that is.

The attached patch is unsurprisingly simple.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 06b22e2..41db03d 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1270,9 +1270,7 @@ WalSndLoop(void)
 			wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH | WL_TIMEOUT |
 				WL_SOCKET_READABLE;
 
-			if (pq_is_send_pending())
-				wakeEvents |= WL_SOCKET_WRITEABLE;
-			else if (wal_sender_timeout > 0 && !ping_sent)
+			if (wal_sender_timeout > 0 && !ping_sent)
 			{
 				/*
 				 * If half of wal_sender_timeout has lapsed without receiving
@@ -1291,6 +1289,9 @@ WalSndLoop(void)
 				}
 			}
 
+			if (pq_is_send_pending())
+				wakeEvents |= WL_SOCKET_WRITEABLE;
+
 			/* Determine time until replication timeout */
 			if (wal_sender_timeout > 0)
 			{



Re: [HACKERS] gaussian distribution pgbench

2014-02-14 Thread Mitsumasa KONDO
I added an exponential distribution random generator (and a little bit
of refactoring :) ).
I use the inverse transform method to create the distribution. It's a
very simple method, based on -log(rand()). We can control the slope of
the distribution using a threshold parameter, the same as the gaussian
threshold.

usage example
  pgbench --exponential=NUM -S

The attached graph was created with exponential threshold = 5; the
exponential distribution is visible in the graphs. It supports the -S
and -N options and custom scripts, so if we put
"\setexponential [var] [min] [max] [threshold]" in a transaction pattern
file, we get the distribution we want.

We haven't had time to polish it very much, but I think most of the
patch is complete.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


gaussian_and_exponential_pgbench_v6.patch
Description: Binary data

gnuplot.sh
Description: Bourne shell script



[HACKERS] walsender can ignore send failures in WalSndLoop

2014-02-14 Thread Andres Freund
Hi,

There's a small issue in abfd192b: one of the error cases wasn't
changed when WalSndLoop was made able to return.

I don't think this is likely to have any grave consequences; we'll
likely error out again soon afterwards.

Patch attached.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 06b22e2..8750dd2 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1287,7 +1287,7 @@ WalSndLoop(void)
 					ping_sent = true;
 					/* Try to flush pending output to the client */
 					if (pq_flush_if_writable() != 0)
-						break;
+						goto send_failure;
 				}
 			}
 



Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease

2014-02-14 Thread Andres Freund
On 2014-02-13 15:34:09 +0100, Florian Pflug wrote:
> On Feb10, 2014, at 17:38 , Andres Freund  wrote:
> > On 2014-02-10 11:11:28 -0500, Tom Lane wrote:
> >> Andres Freund  writes:
> >>> So what we need to do is to acquire a write barrier between the
> >>> assignments to lwWaitLink and lwWaiting, i.e.
> >>>proc->lwWaitLink = NULL;
> >>>pg_write_barrier();
> >>>proc->lwWaiting = false;
> >> 
> >> You didn't really explain why you think that ordering is necessary?
> >> Each proc being awoken will surely see both fields updated, and other
> >> procs won't be examining these fields at all, since we already delinked
> >> all these procs from the LWLock's queue.
> > 
> > The problem is that one of the released backends could wake up concurrently
> > because of an unrelated, or a previous, PGSemaphoreUnlock(). It could see
> > lwWaiting = false, and thus wake up and acquire the lock, even if the
> > store for lwWaitLink hasn't arrived (or been performed; there's no
> > guaranteed ordering here) yet.
> > Now, it may well be that there's no practical consequence of that, but I
> > am not prepared to bet on it.
> 
> AFAICS there is a potential problem if three backends are involved, since
> by the time the waiting backend's lwWaitLink is set to NULL after the
> original lock holder released the lock, the waiting backend might already
> have acquired the lock (due to a spurious wakeup) *and* a third backend
> might have already enqueued behind it.
> 
> The exact sequence for backends A,B,C that corrupts the wait queue is:
> 
> A: Release lock, set B's lwWaiting to false
> B: Wakes up spuriously, takes the lock
> C: Enqueues behind B
> A: Sets B's lwWaitLink back to NULL, thereby truncating the queue and
>    causing C and anyone behind it to block indefinitely.

I don't think that can actually happen because the head of the wait list
isn't the lock holder's lwWaitLink, but LWLock->head. I thought the same
for a while...

So, right now, I don't see problems unless either the compiler reorders
the stores or heavily out-of-order machines with speculative execution
are involved.

> I wonder whether LWLockRelease really needs to update lwWaitLink at all.
> We take the backends we awake off the queue by updating the queue's head and
> tail, so the contents of lwWaitLink should only matter once the backend is
> re-inserted into some wait queue. But when doing that, we reset lwWaitLink
> back to NULL anyway.

I don't like that, this stuff is hard to debug already, having stale
pointers around doesn't help. I think we should just add the barrier in
the releases with barrier.h and locally use a volatile var in the
branches before that.
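The release-path ordering under discussion can be sketched as follows. The pg_write_barrier() stub below is only a GCC compiler barrier (an assumption made for this self-contained sketch); PostgreSQL's real barrier.h primitive also emits a store fence where the hardware needs one:

```c
#include <stddef.h>

#define pg_write_barrier()	__asm__ __volatile__("" ::: "memory")

typedef struct Proc
{
	struct Proc *lwWaitLink;
	int			lwWaiting;
} Proc;

static void
wakeup_waiter(Proc *proc)
{
	proc->lwWaitLink = NULL;
	pg_write_barrier();			/* make the link reset visible first... */
	proc->lwWaiting = 0;		/* ...before the waiter is allowed to run */
}

/* Self-check: after wakeup, both fields must be reset. */
static int
wakeup_selftest(void)
{
	Proc		p = {&p, 1};

	wakeup_waiter(&p);
	return (p.lwWaitLink == NULL && p.lwWaiting == 0);
}
```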

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Changeset Extraction v7.6

2014-02-14 Thread Erik Rijkers
On Fri, February 14, 2014 10:13, Andres Freund wrote:
> Hi,
>
> On 2014-02-14 09:23:45 +0100, Erik Rijkers wrote:
>> >0001-wal_decoding-Introduce-logical-changeset-extraction.patch.gz   159 k
>> >0002-wal_decoding-logical-changeset-extraction-walsender-.patch.gz  16 k
>> >0003-wal_decoding-pg_recvlogical-Introduce-pg_receivexlog.patch.gz  15 k
>> >0004-wal_decoding-Documentation-for-replication-slots-and.patch.gz  14 k
>> >0005-wal_decoding-Temporarily-add-logical-decoding-regres.patch.gz  1.8 k
>>
>> These don't apply...
>
> Works here, could you give a bit more details about the problem you have
> applying them? Note that they are compressed, so need to be gunzipped first...
>

yeah, unzipping -- I thought of that :)  (no offense taken :))

I just fed it to my standard patch-applying-and-compiling shell script,
which does something like this, as always:

patch --dry-run -b -l -F 25 -p 1 <
/home/aardvark/download/pgpatches/0094/lcsg/20140213/0001-wal_decoding-Introduce-logical-changeset-extraction.patch
>patch.1.dry-run.1.out

(it loops to try several -p values; none were acceptable)

etc






Re: [HACKERS] New hook after raw parsing, before analyze

2014-02-14 Thread David Beck
Thanks for the reply. There are two points on which I think I've been misunderstood:

1. The point is to do the rewrite without, and before, catalog access.
2. I do want to push the join down to the source and, equally importantly,
push the WHERE conditions there as well.

Best regards, David


On 2014.02.13, at 21:22, Tom Lane wrote:

> David Beck  writes:
>> I have table like data structures in the source system for the FDW I work on.
>> These tables are sometimes too big and the source system is able to filter 
>> and join them with limitations, thus it is not optimal to transfer the data 
>> to Postgres.
>> At the same time I want the users to think in terms of the original tables.
> 
>> The idea is to rewrite the SQL queries like this:
> 
>>  “SELECT * FROM tableA a, tableB b WHERE a.id=b.id AND a.col1=1234 AND 
>> b.col2=987”
> 
>> to:
> 
>>  “SELECT * FROM fdw_tableA_tableB ab WHERE ab.col1=1234 AND ab.col2=987”
> 
> TBH this sounds like a spectacularly bad idea, especially in the place and
> way you propose to do it.  You can't even do catalog access safely where
> you've put that hook, not to mention that there are many other places
> where queries can be submitted.  But more generally, an FDW should not
> operate in the way you're describing.
> 
> We do lack support for pushing joins to the foreign server, and that needs
> to be addressed; but we need to do it in the planner, not by kluging the
> query somewhere upstream of that.
> 
>   regards, tom lane





Re: [HACKERS] Changeset Extraction v7.6

2014-02-14 Thread Andres Freund
Hi,

On 2014-02-14 09:23:45 +0100, Erik Rijkers wrote:
> >0001-wal_decoding-Introduce-logical-changeset-extraction.patch.gz159 k
> >0002-wal_decoding-logical-changeset-extraction-walsender-.patch.gz   16 k
> >0003-wal_decoding-pg_recvlogical-Introduce-pg_receivexlog.patch.gz   15 k
> >0004-wal_decoding-Documentation-for-replication-slots-and.patch.gz   14 k
> >0005-wal_decoding-Temporarily-add-logical-decoding-regres.patch.gz   1.8 k
> 
> These don't apply...

Works here, could you give a bit more details about the problem you have
applying them? Note that they are compressed, so need to be gunzipped first...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] narwhal and PGDLLIMPORT

2014-02-14 Thread Hiroshi Inoue
(2014/02/12 8:30), Tom Lane wrote:
> I wrote:
>> Hiroshi Inoue  writes:
>>> I tried MINGW port with the attached change and successfully built
>>> src and contrib and all pararell regression tests were OK.
> 
>> I cleaned this up a bit (the if-nesting in Makefile.shlib was making
>> my head hurt, not to mention that it left a bunch of dead code) and
>> committed it.
> 
> Hm ... according to buildfarm member narwhal, this doesn't work so well
> for plperl:
> 
> gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith 
> -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute 
> -Wformat-security -fno-strict-aliasing -fwrapv -g -Wno-comment   -shared -o 
> plperl.dll  plperl.o SPI.o Util.o -L../../../src/port -L../../../src/common 
> -Wl,--allow-multiple-definition -L/mingw/lib  -Wl,--as-needed   
> -LC:/Perl/lib/CORE -lperl58 -L../../../src/backend -lpostgres -lpgcommon 
> -lpgport -lintl -lxslt -lxml2 -lssleay32 -leay32 -lz -lm  -lws2_32 -lshfolder 
> -Wl,--export-all-symbols -Wl,--out-implib=libplperl.a
> Cannot export .idata$4: symbol not found
> Cannot export .idata$5: symbol not found
> Cannot export .idata$6: symbol not found
> Cannot export .text: symbol not found
> Cannot export perl58_NULL_THUNK_DATA: symbol not found
> Creating library file: libplperl.a
> collect2: ld returned 1 exit status
> make[3]: *** [plperl.dll] Error 1
> 
> Not very clear what's going on there; could this be a problem in
> narwhal's admittedly-ancient toolchain?

The error doesn't occur here, (un)fortunately.
One thing I'm wondering about is that plperl links against perlxx.lib,
not libperlxx.a. I made a patch following plpython's approach, and it
also works here.
Is it worth trying?

regards,
Hiroshi Inoue

diff --git a/src/pl/plperl/GNUmakefile b/src/pl/plperl/GNUmakefile
index e0e31ec..065c5d3 100644
--- a/src/pl/plperl/GNUmakefile
+++ b/src/pl/plperl/GNUmakefile
@@ -16,6 +16,8 @@ endif
 ifeq ($(shared_libperl),yes)
 
 ifeq ($(PORTNAME), win32)
+generate_perl_implib = yes
+
 override CPPFLAGS += -DPLPERL_HAVE_UID_GID
 # Perl on win32 contains /* within comment all over the header file,
 # so disable this warning.
@@ -36,7 +38,21 @@ DATA = plperl.control plperl--1.0.sql plperl--unpackaged--1.0.sql \
 
 PERLCHUNKS = plc_perlboot.pl plc_trusted.pl
 
+# Perl on win32 ships with import libraries only for Microsoft Visual C++,
+# which may not be handled with mingw gcc. Therefore we choose to build a
+# new import library to link with.
+ifeq ($(generate_perl_implib), yes)
+	perlwithver = $(subst -l,,$(filter -l%, $(perl_embed_ldflags)))
+	OBJS += lib$(perlwithver).a
+lib$(perlwithver).a: $(perlwithver).def
+	dlltool --dllname $(perlwithver).dll --def $(perlwithver).def --output-lib  lib$(perlwithver).a
+$(perlwithver).def:
+	pexports $(dir $(subst ',,$(PERL)))$(perlwithver).dll > $(perlwithver).def
+
+SHLIB_LINK = 
+else
 SHLIB_LINK = $(perl_embed_ldflags)
+endif
 
 REGRESS_OPTS = --dbname=$(PL_TESTDB) --load-extension=plperl  --load-extension=plperlu
 REGRESS = plperl plperl_lc plperl_trigger plperl_shared plperl_elog plperl_util plperl_init plperlu plperl_array
@@ -105,6 +121,9 @@ submake:
 clean distclean maintainer-clean: clean-lib
 	rm -f SPI.c Util.c $(OBJS) perlchunks.h plperl_opmask.h
 	rm -rf $(pg_regress_clean_files)
+ifeq (yes, $(generate_perl_implib))
+	rm -rf	$(perlwithver).def
+endif
 
 else # can't build
 



Re: [HACKERS] [BUG] Archive recovery failure on 9.3+.

2014-02-14 Thread Kyotaro HORIGUCHI
Hello,

Before taking up the topic..

At Thu, 13 Feb 2014 19:45:38 +0200, Heikki Linnakangas wrote
> On 02/13/2014 06:47 PM, Heikki Linnakangas wrote:
> > On 02/13/2014 02:42 PM, Heikki Linnakangas wrote:
> >> The behavior where we prefer a segment from archive with lower TLI
> >> over
> >> a file with higher TLI in pg_xlog actually changed in commit
> >> a068c391ab0. Arguably changing it wasn't a good idea, but the problem
> >> your test script demonstrates can be fixed by not archiving the
> >> partial
> >> segment, with no change to the preference of archive/pg_xlog. As
> >> discussed, archiving a partial segment seems like a bad idea anyway,
> >> so
> >> let's just stop doing that.

It surely makes things simpler, and I rather like the idea. But as long
as the final, possibly partial segment of the lower TLI is actually
created, and the recovery mechanism lets users request recovery
operations that require such segments (recovery_target_timeline does
this), a "perfect archive" - one that can cover all sorts of restore
operations - may necessarily contain such duplicate segments, I believe.
Besides, I suppose that policy makes operations around archive/restore
much harder: DBAs would be stuck with the intensive work of picking out
of the haystack only those segments actually needed for the recovery
being undertaken. It sounds somewhat gloomy..

# However, I also doubt the appropriateness of stockpiling archived
# segments spanning so many timelines; two generations are enough to
# cause this issue.

Anyway, returning to the topic,

> > After some further thought, while not archiving the partial segment
> > fixes your test script, it's not enough to fix all variants of the
> > problem. Even if archive recovery doesn't archive the last, partial,
> > segment, if the original master server is still running, it's entirely
> > possible that it fills the segment and archives it. In that case,
> > archive recovery will again prefer the archived segment with lower TLI
> > over the segment with newer TLI in pg_xlog.

Yes, that is the generalized description of the case I mentioned.
(Though I hadn't reached that thought :)

> > So I agree we should commit the patch you posted (or something to that
> > effect). The change to not archive the last segment still seems like a
> > good idea, but perhaps we should only do that in master.

My opinion on duplicate segments on older timelines is as
decribed above.

> To draw this to conclusion, barring any further insights to this, I'm
> going to commit the attached patch to master and REL9_3_STABLE. Please
> have a look at the patch, to see if I'm missing something. I modified
> the state machine to skip over XLOG_FROM_XLOG state, if reading in
> XLOG_FROM_ARCHIVE failed; otherwise you first scan the archive and
> pg_xlog together, and then pg_xlog alone, which is pointless.
> 
> In master, I'm also going to remove the "archive last segment on old
> timeline" code.

Thank you for finishing the patch. I hadn't thought about the behavior
after an XLOG_FROM_ARCHIVE failure. With your change, the state machine
seems to avoid the extra round. The recovery process becomes able to
grab the segment on the highest (expected) TLI among those with the
same segment id, regardless of their location. I think the recovery
process will now cope with the "perfect" archives described above for
all types of recovery operation. The state-machine loop that handles
falling back from archive to pg_xlog now seems somewhat more
complicated than needed, but it does no harm either.

Though, this bit, which was in my original patch,

>  readFile = XLogFileReadAnyTLI(readSegNo, DEBUG2,
>currentSource == XLOG_FROM_ARCHIVE ? 
> XLOG_FROM_ANY : currentSource);

sticks far out past the line-wrapping boundary and looks somewhat
dirty :(

Also, the conditional operator seems to blur the distinction between
XLOG_FROM_ARCHIVE and XLOG_FROM_ANY a bit, but I failed to unify them
to either side, so it is left as is..

Finally, the attached patch differs from your last patch only in the
styling mentioned above. It applies to current HEAD, and I confirmed
that it fixes this issue, but I have not exercised the lastSourceFailed
section; simple file removal could not lead there.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 508970a..85a0ce9 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11006,17 +11006,15 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 	/*---
 	 * Standby mode is implemented by a state machine:
 	 *
-	 * 1. Read from archive (XLOG_FROM_ARCHIVE)
-	 * 2. Read from pg_xlog (XLOG_FROM_PG_XLOG)
-	 * 3. Check trigger file
-	 * 4. Read from primary server via walreceiver (XLOG_FROM_STREAM)
-	 * 5. Rescan timelines
-	 * 6. Sleep 5 seconds, and loop back to 1.
+	 * 1. Read from either archive or pg_xl

Re: [HACKERS] HBA files w/include support?

2014-02-14 Thread Stephen Frost
Bruce,

* Bruce Momjian (br...@momjian.us) wrote:
> On Thu, Feb 13, 2014 at 08:24:27PM -0600, Jerry Sievers wrote:
> > I'm aware of how a pg_hba.conf file can refer to other files for
> > including @lists of users, etc.
> > 
> > But there is currently no support for being able to pull in entire file
> > segments as can be done for postgresql.conf via the include directive.
> > 
> > In the environment that I'm managing, we are using a makefile to stick
> > together a common header with a custom section for any of several
> > clusters and may extend this further to permit additional includes for
> > hba rules common to groupings of clusters.
> > 
> > Anyway, please advise.  I don't recall hearing anything like this
> > discussed.
> > 
> > Has been proposed, discussed and voted down?  Or never mentioned?
> 
> I have never heard of anyone request this.

I've brought it up on various threads before, including both the ALTER
SYSTEM thread and the postgresql.conf 'includes' thread, though I don't
feel like going back and hunting down the specific emails right now.

Having @include and directory.d-style capabilities for pg_hba.conf *and*
pg_ident.conf would make managing larger environments much better.
There has been some talk about providing those capabilities via tables
in the catalog, but I'm not aware of anyone working on it and it'd
certainly be quite a bit more work than adding include/dir.d options.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Changeset Extraction v7.6

2014-02-14 Thread Erik Rijkers
On Thu, February 13, 2014 17:12, Andres Freund wrote:
> Hi,
>
> On 2014-02-11 11:22:24 -0500, Robert Haas wrote:
>> [loads of comments]
>
> I tried to address all the points you mentioned.
>

>0001-wal_decoding-Introduce-logical-changeset-extraction.patch.gz  159 k
>0002-wal_decoding-logical-changeset-extraction-walsender-.patch.gz 16 k
>0003-wal_decoding-pg_recvlogical-Introduce-pg_receivexlog.patch.gz 15 k
>0004-wal_decoding-Documentation-for-replication-slots-and.patch.gz 14 k
>0005-wal_decoding-Temporarily-add-logical-decoding-regres.patch.gz 1.8 k

These don't apply...







