date:20130828

Re: [HACKERS] Improving avg performance for numeric

2013-08-28 Thread Pavel Stehule

2013/8/29 Hadi Moshayedi 

> Hello,
>
> > int, float, double 26829 ms (26675 ms) -- 0.5% slower .. statistic error
> ..
> > cleaner code
> > numeric sum 6490 ms (7224 ms) --  10% faster
> > numeric avg 6487 ms (12023 ms) -- 46% faster
>
> I also got very similar results.
>
> On the other hand, initially I was receiving sigsegv's whenever I
> wanted to try to run an aggregate function. gdb was telling that this
> was happening somewhere in nodeAgg.c at ExecInitAgg. While trying to
> find the reason, I had to reboot my computer at some point, after the
> reboot the sigsegv's went away. I want to look into this and find the
> reason, I think I have missed something here. Any thoughts about why
> this would happen?
>

I found a few bugs, that I fixed. There was a issue with empty sets. Other
issues I didn't find.

Regards

Pavel


>
> --Hadi
>

Re: [HACKERS] Improving avg performance for numeric

2013-08-28 Thread Hadi Moshayedi

Hello,

> int, float, double 26829 ms (26675 ms) -- 0.5% slower .. statistic error ..
> cleaner code
> numeric sum 6490 ms (7224 ms) --  10% faster
> numeric avg 6487 ms (12023 ms) -- 46% faster

I also got very similar results.

On the other hand, initially I was receiving sigsegv's whenever I
wanted to try to run an aggregate function. gdb was telling that this
was happening somewhere in nodeAgg.c at ExecInitAgg. While trying to
find the reason, I had to reboot my computer at some point, after the
reboot the sigsegv's went away. I want to look into this and find the
reason, I think I have missed something here. Any thoughts about why
this would happen?

--Hadi


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Valgrind Memcheck support

2013-08-28 Thread Noah Misch

On Wed, Aug 28, 2013 at 03:16:14PM +0200, Andres Freund wrote:
> On 2013-08-27 23:46:23 -0400, Noah Misch wrote:
> > On Tue, Aug 27, 2013 at 04:14:27PM +0200, Andres Freund wrote:
> > > On 2013-06-09 17:25:59 -0400, Noah Misch wrote:
> > > > ***
> > > > *** 846,851  exec_simple_query(const char *query_string)
> > > > --- 847,856 
> > > >   
> > > > TRACE_POSTGRESQL_QUERY_START(query_string);
> > > >   
> > > > + #ifdef USE_VALGRIND
> > > > +   VALGRIND_PRINTF("statement: %s\n", query_string);
> > > > + #endif
> > > > + 
> > > 
> > > Is there a special reason for adding more logging here? I find this
> > > makes the instrumentation much less useful since reports easily get
> > > burried in those traces. What's the advantage of doing this instead of
> > > log_statement=...? Especially as that location afaics won't help for the
> > > extended protocol?

> > That being said, could you tell me more about your workflow where the extra
> > messages cause a problem?  Do you just typically diagnose each Valgrind 
> > error
> > without first isolating the pertinent SQL statement?
> 
> There's basically two scenarios where I found it annoying:
> a) I manually reproduce some issue, potentially involving several psql
> shells. All those fire up autocompletion and such queries which bloat
> the log while I actually only want to analyze something specific.
> b) I don't actually look for anything triggered by an SQL statement
> itself, but am looking for issues in a walsender, background worker,
> whatever. I still need some foreground activity to trigger the behaviour
> I want to see, but once more, valgrind's logs hide the error.
> 
> Also, I sometimes use valgrind's --db-command/--vgdb which gives you
> enough context from the backtrace itself since you can just get the
> statement from there.

Gotcha.  My largest use case was running "make installcheck" in search of
longstanding bugs.  The runs were unattended, and associating each Valgrind
error with its proximate command was a basic need.

> I vote for just removing that VALGRIND_PRINTF - it doesn't give you
> anything you cannot easily achieve otherwise...

I'd like to see a buildfarm member running "make installcheck" under Valgrind,
so I'd like the code to fit the needs thereof without patching beyond
pg_config_manual.h.  Perhaps having the buildfarm member do "valgrind postgres
--log-statement=all 2>combined-logfile" is good enough.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Clarification on materialized view restriction needed

2013-08-28 Thread Robert Haas

On Wed, Aug 28, 2013 at 1:40 AM, Ashutosh Bapat
 wrote:
> I would be good, if this set gets documented, lest users will be confused.
> Can you point me to relevant sections of document? I can add this
> documentation.

I think it's your job to look at the documentation and determine where
this would best fit, not Noah's to go decide that for you.

...Robert


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PL/pgSQL PERFORM with CTE

2013-08-28 Thread Hannu Krosing

On 08/28/2013 09:59 PM, Robert Haas wrote:
> On Tue, Aug 27, 2013 at 6:10 PM, Pavel Stehule  
> wrote:
>> what is magical?
>>
>> Stored procedures - we talk about this technology was a originally simple
>> script moved from client side to server side.
>>
>> so if I write on client side
>>
>> BEGIN;
>>   SELECT 1,2;
>>   SELECT 2;
>>   SELECT 3,4;
>> END;
>>
>> then I expect results
>>
>> 1,2
>> 2
>> 3,4
> The biggest problem with this idea is that people will do it by
> accident with unacceptable frequency.  During the decade or so I
> worked as a web programmer, I made this mistake a number of times, and
> judging by the comments on this thread, Josh Berkus has made it with
> some regularity as well.  If experienced PostgreSQL hackers who know
> the system inside and out make such mistakes with some regularity, I
> think we can anticipate that novices will make them even more often.
Usually yo test your queries fom psql prompt and then copy/paste
into your function.
As ignoring the results need no conscious effort at psql prompt, it
will always be a mild surprise that you have to jump through hoops
to do it in pl/pgsql.
And I can easily do this for example in pl/python - just do not assign
the result from plpy.execute() and they get ignored with no extra
effort whatsoever.


> ...
> Finally, I'd like to note that it's been longstanding frustration of
> mine that the PERFORM->SELECT transformation is leaky.  For example,
> consider:
>
> rhaas=# do $$begin perform amazingly_well(); end;$$;
> ERROR:  function amazingly_well() does not exist
> LINE 1: SELECT amazingly_well()
>^
> HINT:  No function matches the given name and argument types. You
> might need to add explicit type casts.
> QUERY:  SELECT amazingly_well()
> CONTEXT:  PL/pgSQL function inline_code_block line 1 at PERFORM
>
> Hmm, the user might say.  I didn't type the word SELECT anywhere, yet
> it shows up in the error message.  How confusing!  With a big enough
> hammer we could perhaps paper over this problem a bit more thoroughly,
> but since I've never liked the syntax to begin with, I advance this as
> another argument for killing it.
>
Totally agree.

Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PL/pgSQL PERFORM with CTE

2013-08-28 Thread Merlin Moncure

On Wed, Aug 28, 2013 at 2:59 PM, Robert Haas  wrote:
> On Tue, Aug 27, 2013 at 6:10 PM, Pavel Stehule  
> wrote:
>> what is magical?
>>
>> Stored procedures - we talk about this technology was a originally simple
>> script moved from client side to server side.
>>
>> so if I write on client side
>>
>> BEGIN;
>>   SELECT 1,2;
>>   SELECT 2;
>>   SELECT 3,4;
>> END;
>>
>> then I expect results
>>
>> 1,2
>> 2
>> 3,4
>
> The biggest problem with this idea is that people will do it by
> accident with unacceptable frequency.  During the decade or so I
> worked as a web programmer, I made this mistake a number of times, and
> judging by the comments on this thread, Josh Berkus has made it with
> some regularity as well.  If experienced PostgreSQL hackers who know
> the system inside and out make such mistakes with some regularity, I
> think we can anticipate that novices will make them even more often.
>
> And, TBH, as others have said here, I find the requirement to use
> PERFORM rather than SELECT rather ridiculous.  The clash with CTEs has
> been there since we added CTEs, and I've hit it more than once.  Yeah,
> you can work around it, but it's annoying.  And why annoy people?  So
> +1 from me for de-requiring the use of PERFORM (though I think we
> should definitely continue to accept that syntax, for backward
> compatibility).
>
> At the end of the day, procedural languages in PostgreSQL are
> pluggable.  So if we someday have the ability to return extra result
> sets on the fly, and if Pavel doesn't like the syntax we choose to use
> in PL/pgsql, he can (and, given previous history, very possibly will!)
> publish his own PL with different syntax.  But I'm with the crowd that
> says that's not the right decision for PL/pgsql.
>
> Also, even if we did adopt Pavel's proposed meaning for "SELECT 1,2",
> we still have a problem to solve, which is what the user should write
> when they want to run a query and ignore the results.  The PERFORM
> solution was adequate at a time when all select queries started with
> SELECT, but now they can start with WITH or VALUES or TABLE as well,
> and while VALUES and TABLE may be ignorable, WITH certainly isn't.
> Requiring people to use silly workarounds like selecting into an
> otherwise-pointless dummy variable is not cool.  If we reserve the
> undecorated-SELECT syntax to mean something else, then we've got to
> come up with some other way of solving David's original problem, and I
> don't think there are going to be many elegant options.
>
> Finally, I'd like to note that it's been longstanding frustration of
> mine that the PERFORM->SELECT transformation is leaky.  For example,
> consider:
>
> rhaas=# do $$begin perform amazingly_well(); end;$$;
> ERROR:  function amazingly_well() does not exist
> LINE 1: SELECT amazingly_well()
>^
> HINT:  No function matches the given name and argument types. You
> might need to add explicit type casts.
> QUERY:  SELECT amazingly_well()
> CONTEXT:  PL/pgSQL function inline_code_block line 1 at PERFORM
>
> Hmm, the user might say.  I didn't type the word SELECT anywhere, yet
> it shows up in the error message.  How confusing!  With a big enough
> hammer we could perhaps paper over this problem a bit more thoroughly,
> but since I've never liked the syntax to begin with, I advance this as
> another argument for killing it.

Right.  Another pain point for me is that I frequently have to
'up-convert' functions from sql to pgsql (and sometimes the other way
too). The perform requirement turns that into a headache.  It looks
like we are mostly ok on Oracle compatibility too.

I'm a fan of David's 'YIELD' syntax concept as a line of analysis for
'mid procedure set returning' when we get there.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PL/pgSQL PERFORM with CTE

2013-08-28 Thread Robert Haas

On Tue, Aug 27, 2013 at 6:10 PM, Pavel Stehule  wrote:
> what is magical?
>
> Stored procedures - we talk about this technology was a originally simple
> script moved from client side to server side.
>
> so if I write on client side
>
> BEGIN;
>   SELECT 1,2;
>   SELECT 2;
>   SELECT 3,4;
> END;
>
> then I expect results
>
> 1,2
> 2
> 3,4

The biggest problem with this idea is that people will do it by
accident with unacceptable frequency.  During the decade or so I
worked as a web programmer, I made this mistake a number of times, and
judging by the comments on this thread, Josh Berkus has made it with
some regularity as well.  If experienced PostgreSQL hackers who know
the system inside and out make such mistakes with some regularity, I
think we can anticipate that novices will make them even more often.

And, TBH, as others have said here, I find the requirement to use
PERFORM rather than SELECT rather ridiculous.  The clash with CTEs has
been there since we added CTEs, and I've hit it more than once.  Yeah,
you can work around it, but it's annoying.  And why annoy people?  So
+1 from me for de-requiring the use of PERFORM (though I think we
should definitely continue to accept that syntax, for backward
compatibility).

At the end of the day, procedural languages in PostgreSQL are
pluggable.  So if we someday have the ability to return extra result
sets on the fly, and if Pavel doesn't like the syntax we choose to use
in PL/pgsql, he can (and, given previous history, very possibly will!)
publish his own PL with different syntax.  But I'm with the crowd that
says that's not the right decision for PL/pgsql.

Also, even if we did adopt Pavel's proposed meaning for "SELECT 1,2",
we still have a problem to solve, which is what the user should write
when they want to run a query and ignore the results.  The PERFORM
solution was adequate at a time when all select queries started with
SELECT, but now they can start with WITH or VALUES or TABLE as well,
and while VALUES and TABLE may be ignorable, WITH certainly isn't.
Requiring people to use silly workarounds like selecting into an
otherwise-pointless dummy variable is not cool.  If we reserve the
undecorated-SELECT syntax to mean something else, then we've got to
come up with some other way of solving David's original problem, and I
don't think there are going to be many elegant options.

Finally, I'd like to note that it's been longstanding frustration of
mine that the PERFORM->SELECT transformation is leaky.  For example,
consider:

rhaas=# do $$begin perform amazingly_well(); end;$$;
ERROR:  function amazingly_well() does not exist
LINE 1: SELECT amazingly_well()
   ^
HINT:  No function matches the given name and argument types. You
might need to add explicit type casts.
QUERY:  SELECT amazingly_well()
CONTEXT:  PL/pgSQL function inline_code_block line 1 at PERFORM

Hmm, the user might say.  I didn't type the word SELECT anywhere, yet
it shows up in the error message.  How confusing!  With a big enough
hammer we could perhaps paper over this problem a bit more thoroughly,
but since I've never liked the syntax to begin with, I advance this as
another argument for killing it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-28 Thread Alvaro Herrera

Stephen Frost escribió:

> There are counter-examples also; sysctl.d will replace what's already
> been set, so perhaps it simply depends on an individual's experience.
> For my part, I'd much prefer an error or warning saying "you've got
> duplicate definitions of X" than a last-wins approach, though perhaps
> that's just because I like things to be very explicit and clear.
> Allowing multiple definitions of the same parameter to be valid isn't
> 'clear' to me.

One of the elements that had to be in place for this whole thing of
multiple configuration files to be allowed was for pg_settings to
include precise information about, for each setting, where is its value
coming from.  I doubt any of the systems you mentioned have comparable
functionality.  If the admin has trouble figuring it out, he needs only
look into that view.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-28 Thread Bruce Momjian

On Wed, Aug 28, 2013 at 03:15:14PM -0400, Stephen Frost wrote:
> * Bruce Momjian (br...@momjian.us) wrote:
> > Agreed, but I think this is a much larger issue than ALTER SYSTEM SET.
> 
> Yeah, true.
> 
> > I think changing behavior to first-seen would only add to confusion. 
> > What we really need is a WARNING when a later postgresql.conf setting
> > overrides an earlier one, and ALTER SYSTEM SET's config file could
> > behave the same, so you would know right away when you were overriding
> > something in postgresql.conf that appeared earlier.
> 
> A warning would be nice.  If only everyone read the log files. :/

I would expect ALTER SYSTEM SET to return a WARNING in such cases too.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory

2013-08-28 Thread Robert Haas

On Tue, Aug 27, 2013 at 10:07 AM, Andres Freund  wrote:
> [just sending an email which sat in my outbox for two weeks]

Thanks for taking a look.

> Nice to see this coming. I think it will actually be interesting for
> quite some things outside parallel query, but we'll see.

Yeah, I hope so.  The applications may be somewhat limited by the fact
that there are apparently fairly small limits to how many shared
memory segments you can map at the same time.  I believe on one system
I looked at (some version of HP-UX?) the limit was 11.  So we won't be
able to go nuts with this: using it definitely introduces all kinds of
failure modes that we don't have it today.  But it will also let us do
some pretty cool things that we CAN'T do today.

>> To help solve these problems, I invented something called the "dynamic
>> shared memory control segment".  This is a dynamic shared memory
>> segment created at startup (or reinitialization) time by the
>> postmaster before any user process are created.  It is used to store a
>> list of the identities of all the other dynamic shared memory segments
>> we have outstanding and the reference count of each.  If the
>> postmaster goes through a crash-and-reset cycle, it scans the control
>> segment and removes all the other segments mentioned there, and then
>> recreates the control segment itself.  If the postmaster is killed off
>> (e.g. kill -9) and restarted, it locates the old control segment and
>> proceeds similarly.
>
> That way any corruption in that area will prevent restarts without
> reboot unless you use ipcrm, or such, right?

The way I've designed it, no.  If what we expect to be the control
segment doesn't exist or doesn't conform to our expectations, we just
assume that it's not really the control segment after all - e.g.
someone rebooted, clearing all the segments, and then an unrelated
process (malicious, perhaps, or just a completely different cluster)
reused the same name.  This is similar to what we do for the main
shared memory segment.

>> Creating a shared memory segment is a somewhat operating-system
>> dependent task.  I decided that it would be smart to support several
>> different implementations and to let the user choose which one they'd
>> like to use via a new GUC, dynamic_shared_memory_type.
>
> I think we want that during development, but I'd rather not go there
> when releasing. After all, we don't support a manual choice between
> anonymous mmap/sysv shmem either.

That's true, but that decision has not been uncontroversial - e.g. the
NetBSD guys don't like it, because they have a big performance
difference between those two types of memory.  We have to balance the
possible harm of one more setting against the benefit of letting
people do what they want without needing to recompile or modify code.

>> In addition, I've included an implementation based on mmap of a plain
>> file.  As compared with a true shared memory implementation, this
>> obviously has the disadvantage that the OS may be more likely to
>> decide to write back dirty pages to disk, which could hurt
>> performance.  However, I believe it's worthy of inclusion all the
>> same, because there are a variety of situations in which it might be
>> more convenient than one of the other implementations.  One is
>> debugging.
>
> Hm. Not sure what's the advantage over a corefile here.

You can look at it while the server's running.

>> On MacOS X, for example, there seems to be no way to list
>> POSIX shared memory segments, and no easy way to inspect the contents
>> of either POSIX or System V shared memory segments.
>
> Shouldn't we ourselves know which segments are around?

Sure, that's the point of the control segment.  But listing a
directory is a lot easier than figuring out what the current control
segment contents are.

>> Another use case
>> is working around an administrator-imposed or OS-imposed shared memory
>> limit.  If you're not allowed to allocate shared memory, but you are
>> allowed to create files, then this implementation will let you use
>> whatever facilities we build on top of dynamic shared memory anyway.
>
> I don't think we should try to work around limits like that.

I do.  There's probably someone, somewhere in the world who thinks
that operating system shared memory limits are a good idea, but I have
not met any such person.  There are multiple ways to create shared
memory, and they all have different limits.  Normally, System V limits
are small, POSIX limits are large, and the inherited-anonymous-mapping
trick we're now using for the main shared memory segment has no limits
at all.  It's very common to run into a system where you can allocate
huge numbers of gigabytes of backend-private memory, but if you try to
allocate 64MB of *shared* memory, you get the axe - or maybe not,
depending on which API you use to create it.

I would never advocate deliberately trying to circumvent a
carefully-considered OS-level policy decision about resource
utilization,

Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-28 Thread Stephen Frost

* Bruce Momjian (br...@momjian.us) wrote:
> Agreed, but I think this is a much larger issue than ALTER SYSTEM SET.

Yeah, true.

> I think changing behavior to first-seen would only add to confusion. 
> What we really need is a WARNING when a later postgresql.conf setting
> overrides an earlier one, and ALTER SYSTEM SET's config file could
> behave the same, so you would know right away when you were overriding
> something in postgresql.conf that appeared earlier.

A warning would be nice.  If only everyone read the log files. :/

Thanks,

Stephen


signature.asc
Description: Digital signature

Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-28 Thread Stephen Frost

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > While I appreciate that there are bootstrap-type issues with this, I
> > really don't like this idea of "later stuff can just override earlier
> > stuff".
> 
> > include files and conf.d-style options are for breaking the config up,
> > not to allow you to override options because a file came later than an
> > earlier file.  Our particular implementation of config-file reading
> > happens to lend itself to later-definition-wins, but that's really
> > counter-intuitive for anyone unfamiliar with PG, imv.
> 
> I don't follow this argument at all.  Do you know any software with text
> config files that will act differently from this if the same setting is
> listed twice?  "Last one wins" is certainly what I'd expect.

Have you tried having the same mount specified in multiple files in
fstab.d..?  Also, aiui (not a big exim fan personally), duplicate
definitions in an exim4/conf.d would result in an error.  Playing around
in apache2/sites-enabled, it appears to be "first wins" wrt virtual
hosts.

There's a number of cases where only a single value is being set and
subseqeunt files are 'additive' (eg: ld.so.conf.d), so they don't have
this issue.  Similar to that are script directories, which simply run a
set of scripts in the $DIR.d and, as it's the scripts themselves which
are the 'parameters', and being files in a directory, you can't
duplicate them.  Others (eg: pam.d) define the file name to be an
enclosing context, again preventing duplication by using the filename
itself.

There are counter-examples also; sysctl.d will replace what's already
been set, so perhaps it simply depends on an individual's experience.
For my part, I'd much prefer an error or warning saying "you've got
duplicate definitions of X" than a last-wins approach, though perhaps
that's just because I like things to be very explicit and clear.
Allowing multiple definitions of the same parameter to be valid isn't
'clear' to me.

Thanks,

Stephen

signature.asc
Description: Digital signature

Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-28 Thread Bruce Momjian

On Wed, Aug 28, 2013 at 02:30:41PM -0400, Stephen Frost wrote:
> * Bruce Momjian (br...@momjian.us) wrote:
> > On Tue, Aug 27, 2013 at 09:04:00AM +0530, Amit Kapila wrote:
> > > > I really hate the idea that someone could configure 'X' in
> > > > postgresql.conf and because the auto.conf line is later in the file,
> > > > it's able to override the original setting.  Does not strike me as
> > > > intuitive at all.
> > > 
> > > This is currently how include mechanism works in postgresql.conf,
> > > changing that for this special case can be costly and moreover the
> > > specs for this patch were layout from beginning that way.
> > 
> > Agreed.  If you are worried about ALTER SYSTEM changing postgresql.conf
> > settings, you should move the include_auto line to the top of
> > postgresql.conf, but I don't think that should be the default.
> 
> While I appreciate that there are bootstrap-type issues with this, I
> really don't like this idea of "later stuff can just override earlier
> stuff".
> 
> include files and conf.d-style options are for breaking the config up,
> not to allow you to override options because a file came later than an
> earlier file.  Our particular implementation of config-file reading
> happens to lend itself to later-definition-wins, but that's really
> counter-intuitive for anyone unfamiliar with PG, imv.

Agreed, but I think this is a much larger issue than ALTER SYSTEM SET.

I think changing behavior to first-seen would only add to confusion. 
What we really need is a WARNING when a later postgresql.conf setting
overrides an earlier one, and ALTER SYSTEM SET's config file could
behave the same, so you would know right away when you were overriding
something in postgresql.conf that appeared earlier.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] What happens at BIND time?

2013-08-28 Thread Josh Berkus

Tom,

> Does the backend's memory usage climb, or hold steady?  If the former,
> I'd bet on client failure to release resources, eg not closing the
> portals when done with them.  A memory map from MemoryContextStats
> would help determine exactly what's leaking.

FS cache usage increases through the test run, as you'd expect, but the
amount of pinned memory actually remains pretty much constant -- and has
the same usage in both 8.4 (where the BIND issue doesn't happen) and
9.3b2 (where it does).

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic background workers, round two

2013-08-28 Thread Andres Freund

On 2013-08-28 14:04:59 -0400, Robert Haas wrote:
> >> +  RegisterDynamicBackgroundWorker(BackgroundWorker
> >> +  *worker, BackgroundWorkerHandle **handle).  Unlike
> >> +  RegisterBackgroundWorker, which can only be called from 
> >> within
> >> +  the postmaster, RegisterDynamicBackgroundWorker 
> >> must be
> >> +  called from a regular backend.
> >>   
> >
> > s/which can only be called from within the posmaster/during initial
> > startup, via shared_preload_libraries/?
> 
> This seems like a possible doc clarification not intimately related to
> the patch at hand.  The only reason that paragraph is getting reflowed
> is because of the additional argument to
> RegisterDynamicBackgroundWorker().  On the substance, I could go
> either way on whether your proposed text is better than what's there
> now.

I agree it's unrelated. I just noticed because it was in the diff ;)

> It seems like it might need a bit of wordsmithing, anyway.

Yes, looks like it could use some. It's not going to be me doing it
tho...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-28 Thread Tom Lane

Stephen Frost  writes:
> While I appreciate that there are bootstrap-type issues with this, I
> really don't like this idea of "later stuff can just override earlier
> stuff".

> include files and conf.d-style options are for breaking the config up,
> not to allow you to override options because a file came later than an
> earlier file.  Our particular implementation of config-file reading
> happens to lend itself to later-definition-wins, but that's really
> counter-intuitive for anyone unfamiliar with PG, imv.

I don't follow this argument at all.  Do you know any software with text
config files that will act differently from this if the same setting is
listed twice?  "Last one wins" is certainly what I'd expect.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-28 Thread Stephen Frost

* Bruce Momjian (br...@momjian.us) wrote:
> On Tue, Aug 27, 2013 at 09:04:00AM +0530, Amit Kapila wrote:
> > > I really hate the idea that someone could configure 'X' in
> > > postgresql.conf and because the auto.conf line is later in the file,
> > > it's able to override the original setting.  Does not strike me as
> > > intuitive at all.
> > 
> > This is currently how include mechanism works in postgresql.conf,
> > changing that for this special case can be costly and moreover the
> > specs for this patch were layout from beginning that way.
> 
> Agreed.  If you are worried about ALTER SYSTEM changing postgresql.conf
> settings, you should move the include_auto line to the top of
> postgresql.conf, but I don't think that should be the default.

While I appreciate that there are bootstrap-type issues with this, I
really don't like this idea of "later stuff can just override earlier
stuff".

include files and conf.d-style options are for breaking the config up,
not to allow you to override options because a file came later than an
earlier file.  Our particular implementation of config-file reading
happens to lend itself to later-definition-wins, but that's really
counter-intuitive for anyone unfamiliar with PG, imv.

Thanks,

Stephen

signature.asc
Description: Digital signature

Re: [HACKERS] dynamic background workers, round two

2013-08-28 Thread Robert Haas

On Wed, Aug 28, 2013 at 2:04 PM, Robert Haas  wrote:
>> Hm. Not this patches fault, but We seem to allow bgw_start_time ==
>> BgWorkerStart_PostmasterStart here which doesn't make sense...
>
> I can add a check for that.  I agree that it's a separate patch.

On third thought, is there really any point to such a check?  I mean,
no background worker is going to start before it's registered, and by
the time any background worker is registered, we'll be passed the time
indicated by all of those constants: BgWorkerStart_PostmasterStart,
BgWorkerStart_ConsistentState, BgWorkerStart_RecoveryFinished.

I think we should view that field as fixing the earliest time at which
the worker should be started, rather than the exact time at which it
must be started.  Otherwise no value is sensible.  And if we take that
approach, then for a dynamic background worker, any value is OK.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic background workers, round two

2013-08-28 Thread Robert Haas

On Wed, Aug 28, 2013 at 2:04 PM, Robert Haas  wrote:
> I certainly can't promise that the code is bug-free.  But I think it's
> probably better to get this into the tree and let people start playing
> around with it than to continue to maintain it in my private sandbox.
> At this point it's just infrastructure anyway, though there seems to
> be a good deal of interest in it from more than just myself...  and I
> do think it's a useful step forward from where we are today.

Committed with fixes for the issues you mentioned, plus some updates
to out-of-date comments.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic background workers, round two

2013-08-28 Thread Robert Haas

On Tue, Aug 27, 2013 at 9:50 AM, Andres Freund  wrote:
>> > BgwHandleStatus GetBackgroundWorkerPid(BackgroundWorkerHandle *handle, 
>> > pid_t *pid);
>> > BgwHandleStatus WaitForBackgroundWorkerStartup(BackgroundWorkerHandle 
>> > *handle, pid_t *pid);
>>
>> OK, here's a patch that API.  I renamed the constants a bit, because a
>> process that has stopped is not necessarily gone; it could be
>> configured for restart.  But we can say that it is stopped, at the
>> moment.
>>
>> I'm not sure that this API is an improvement.  But I think it's OK, if
>> you prefer it.
>
> Thanks, looks more readable to me. I like code that tells me what it
> does without having to look in other places ;)

Well, fair enough.  But you might have to look around a bit to figure
out that you now have two functions each of which can return 3 out of
a possible 4 values.  But whatever.

>> +  RegisterDynamicBackgroundWorker(BackgroundWorker
>> +  *worker, BackgroundWorkerHandle **handle).  Unlike
>> +  RegisterBackgroundWorker, which can only be called from 
>> within
>> +  the postmaster, RegisterDynamicBackgroundWorker must 
>> be
>> +  called from a regular backend.
>>   
>
> s/which can only be called from within the posmaster/during initial
> startup, via shared_preload_libraries/?

This seems like a possible doc clarification not intimately related to
the patch at hand.  The only reason that paragraph is getting reflowed
is because of the additional argument to
RegisterDynamicBackgroundWorker().  On the substance, I could go
either way on whether your proposed text is better than what's there
now.  It seems like it might need a bit of wordsmithing, anyway.

> s/STATED/STARTED/

Good catch.

>>  bool
>> -RegisterDynamicBackgroundWorker(BackgroundWorker *worker)
>> +RegisterDynamicBackgroundWorker(BackgroundWorker *worker,
>> + 
>> BackgroundWorkerHandle **handle)
>>  {
>
> Hm. Not this patches fault, but We seem to allow bgw_start_time ==
> BgWorkerStart_PostmasterStart here which doesn't make sense...

I can add a check for that.  I agree that it's a separate patch.

> Maybe add something like Assert(hanlde->slot < max_worker_processes);?

Sure.

> Shouldn't that loop have a CHECK_FOR_INTERRUPTS()?

Yep.

>
> Theoretically this would unset set_latch_on_sigusr1 if it was previously
> already set to 'true'. If you feel defensive you could safe the previous
> value...

Probably a good plan.

> So, besides those I don't see much left to be done. I haven't tested
> EXEC_BACKEND, but I don't anything specific to that here.

I certainly can't promise that the code is bug-free.  But I think it's
probably better to get this into the tree and let people start playing
around with it than to continue to maintain it in my private sandbox.
At this point it's just infrastructure anyway, though there seems to
be a good deal of interest in it from more than just myself...  and I
do think it's a useful step forward from where we are today.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hstore: Query speedups with Gin index

2013-08-28 Thread Andres Freund

On 2013-08-28 13:31:22 -0400, Bruce Momjian wrote:
> On Sun, Aug 25, 2013 at 10:11:50PM -0400, Tom Lane wrote:
> > Michael Paquier  writes:
> > > On Thu, Aug 22, 2013 at 11:55 PM, Blake Smith  
> > > wrote:
> > >> The combined entry is used to support "contains (@>)" queries, and the 
> > >> key
> > >> only item is used to support "key contains (?)" queries. This change 
> > >> seems
> > >> to help especially with hstore keys that have high cardinalities. 
> > >> Downsides
> > >> of this change is that it requires an index rebuild, and the index will 
> > >> be
> > >> larger in size.
> > 
> > > Index rebuild would be a problem only for minor releases,
> > 
> > That's completely false; people have expected major releases to be
> > on-disk-compatible for several years now.  While there probably will be
> > future releases in which we are willing to break storage compatibility,
> > a contrib module doesn't get to dictate that.
> > 
> > What might be a practical solution, especially if this isn't always a
> > win (which seems likely given the index-bloat risk), is to make hstore
> > offer two different GIN index opclasses, one that works the traditional
> > way and one that works this way.
> > 
> > Another thing that needs to be taken into account here is Oleg and
> > Teodor's in-progress work on extending hstore:
> > https://www.pgcon.org/2013/schedule/events/518.en.html
> > I'm not sure if this patch would conflict with that at all, but it
> > needs to be considered.
> 
> We can disallow in-place upgrades for clusters that use certain contrib
> modules --- we have done that in the past.

But that really cannot be acceptable for hstore. The probably most
widely used extension there is.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hstore: Query speedups with Gin index

2013-08-28 Thread Bruce Momjian

On Sun, Aug 25, 2013 at 10:11:50PM -0400, Tom Lane wrote:
> Michael Paquier  writes:
> > On Thu, Aug 22, 2013 at 11:55 PM, Blake Smith  wrote:
> >> The combined entry is used to support "contains (@>)" queries, and the key
> >> only item is used to support "key contains (?)" queries. This change seems
> >> to help especially with hstore keys that have high cardinalities. Downsides
> >> of this change is that it requires an index rebuild, and the index will be
> >> larger in size.
> 
> > Index rebuild would be a problem only for minor releases,
> 
> That's completely false; people have expected major releases to be
> on-disk-compatible for several years now.  While there probably will be
> future releases in which we are willing to break storage compatibility,
> a contrib module doesn't get to dictate that.
> 
> What might be a practical solution, especially if this isn't always a
> win (which seems likely given the index-bloat risk), is to make hstore
> offer two different GIN index opclasses, one that works the traditional
> way and one that works this way.
> 
> Another thing that needs to be taken into account here is Oleg and
> Teodor's in-progress work on extending hstore:
> https://www.pgcon.org/2013/schedule/events/518.en.html
> I'm not sure if this patch would conflict with that at all, but it
> needs to be considered.

We can disallow in-place upgrades for clusters that use certain contrib
modules --- we have done that in the past.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GetTransactionSnapshot() in enum.c

2013-08-28 Thread Robert Haas

On Mon, Aug 26, 2013 at 4:31 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Mon, Aug 19, 2013 at 1:41 PM, Tom Lane  wrote:
>>> BTW, I notice that the MVCC-catalog-scans patch summarily asserts that
>>> RenumberEnumType no longer poses any concurrency hazards.  I doubt that's
>>> true: isn't it still possible that pg_enum rows acquired through the
>>> syscaches will have inconsistent enumsortorder values, if they were
>>> read at different times?  If you want to examine enumsortorder, you really
>>> need to be comparing rows you know were read with the *same* snapshot.
>
>> Good point, I missed that.  Here's a proposed comment patch.
>
> Looks sane to me.

Thanks for the review.  Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Spinlock implementation on x86_64 (was Re: [HACKERS] Better LWLocks with compare-and-swap (9.4))

2013-08-28 Thread Tom Lane

Heikki Linnakangas  writes:
> So, my plan is to apply the attached non-locked-tas-spin-x86_64.patch to 
> master. But I would love to get feedback from people running different 
> x86_64 hardware.

Surely this patch should update the existing comment at line 209?  Or at
least point out that a non-locked test in TAS_SPIN is not the same as a
non-locked test in tas() itself.

Other than the commenting, I have no objection to this.  I think you're
probably right that the old tests in which this looked like a bad idea
were adding the unlocked test to tas() not only the spin case.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])

2013-08-28 Thread Bruce Momjian

On Tue, Aug 27, 2013 at 09:04:00AM +0530, Amit Kapila wrote:
> > For my part, I'd honestly rather have the first things found be what's
> > picked and later things be ignored.  If you want it controlled by ALTER
> > SYSTEM, then don't set it in postgresql.conf.  The problem with that is
> > there's no "bootstrap" mechanism without actually modifying such configs
> > or making the configs be in auto.conf to begin with, neither of which is
> > ideal, imv.
> >
> > I really hate the idea that someone could configure 'X' in
> > postgresql.conf and because the auto.conf line is later in the file,
> > it's able to override the original setting.  Does not strike me as
> > intuitive at all.
> 
> This is currently how include mechanism works in postgresql.conf,
> changing that for this special case can be costly and moreover the
> specs for this patch were layout from beginning that way.

Agreed.  If you are worried about ALTER SYSTEM changing postgresql.conf
settings, you should move the include_auto line to the top of
postgresql.conf, but I don't think that should be the default.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Spinlock implementation on x86_64 (was Re: [HACKERS] Better LWLocks with compare-and-swap (9.4))

2013-08-28 Thread Heikki Linnakangas


On 21.05.2013 00:20, Heikki Linnakangas wrote:

On 16.05.2013 01:08, Daniel Farina wrote:

On Mon, May 13, 2013 at 5:50 AM, Heikki Linnakangas
 wrote:

pgbench -S is such a workload. With 9.3beta1, I'm seeing this
profile, when
I run "pgbench -S -c64 -j64 -T60 -M prepared" on a 32-core Linux
machine:

- 64.09% postgres postgres [.] tas
- tas
- 99.83% s_lock
- 53.22% LWLockAcquire
+ 99.87% GetSnapshotData
- 46.78% LWLockRelease
GetSnapshotData
+ GetTransactionSnapshot
+ 2.97% postgres postgres [.] tas
+ 1.53% postgres libc-2.13.so [.] 0x119873
+ 1.44% postgres postgres [.] GetSnapshotData
+ 1.29% postgres [kernel.kallsyms] [k] arch_local_irq_enable
+ 1.18% postgres postgres [.] AllocSetAlloc
...

So, on this test, a lot of time is wasted spinning on the mutex of
ProcArrayLock. If you plot a graph of TPS vs. # of clients, there is a
surprisingly steep drop in performance once you go beyond 29 clients
(attached, pgbench-lwlock-cas-local-clients-sets.png, red line). My
theory
is that after that point all the cores are busy, and processes start
to be
sometimes context switched while holding the spinlock, which kills
performance.


I have, I also used linux perf to come to this conclusion, and my
determination was similar: a system was undergoing increasingly heavy
load, in this case with processes>> number of processors. It was
also a phase-change type of event: at one moment everything would be
going great, but once a critical threshold was hit, s_lock would
consume enormous amount of CPU time. I figured preemption while in
the spinlock was to blame at the time, given the extreme nature


Stop the press! I'm getting the same speedup on that 32-core box I got
with the compare-and-swap patch, from this one-liner:

--- a/src/include/storage/s_lock.h
+++ b/src/include/storage/s_lock.h
@@ -200,6 +200,8 @@ typedef unsigned char slock_t;

#define TAS(lock) tas(lock)

+#define TAS_SPIN(lock) (*(lock) ? 1 : TAS(lock))
+
static __inline__ int
tas(volatile slock_t *lock)
{

So, on this system, doing a non-locked test before the locked xchg
instruction while spinning, is a very good idea. That contradicts the
testing that was done earlier when the x86-64 implementation was added,
as we have this comment in the tas() implementation:


/*
* On Opteron, using a non-locking test before the locking instruction
* is a huge loss. On EM64T, it appears to be a wash or small loss,
* so we needn't bother to try to distinguish the sub-architectures.
*/


On my test system, the non-locking test is a big win. I tested this
because I was reading this article from Intel:

http://software.intel.com/en-us/articles/implementing-scalable-atomic-locks-for-multi-core-intel-em64t-and-ia32-architectures/.
It says very explicitly that the non-locking test is a good idea:


Spinning on volatile read vs. spin on lock attempt

One common mistake made by developers developing their own spin-wait
loops is attempting to spin on an atomic instruction instead of
spinning on a volatile read. Spinning on a dirty read instead of
attempting to acquire a lock consumes less time and resources. This
allows an application to only attempt to acquire a lock only when it
is free.


Now, I'm not sure what to do about this. If we put the non-locking test
in there, according to the previous testing that would be a huge loss on
Opterons.


I think we should apply the attached patch to put a non-locked test in 
TAS_SPIN() on x86_64.


Tom tested this back in 2011 on a 32-core Opteron [1] and found little 
difference. And on a 80-core Xeon (160 with hyperthreading) system, he 
saw a quite significant gain from it [2]. Reading the thread, I don't 
understand why it wasn't applied to x86_64 back then. Maybe it was for 
the fear of having a negative impact on performance on old Opterons, but 
I strongly suspect that the performance would be OK even on old Opterons 
if you only do the non-locked test in TAS_SPIN() and not in TAS(). Back 
when the comment about poor performance with a non-locking test on 
Opteron was written, we used the same TAS() macro in the first and while 
spinning.


I tried to find some sort of an authoritative guide from AMD that would 
say how spinlocks should be implemented, but didn't find any. The Intel 
guide I linked above says we should apply the patch. Linux uses a ticket 
spinlock thingie nowadays, which is slightly different, but if I'm 
reading the older pre-ticket version correctly, they used to do the 
same. Ie. non-locking test while spinning, but not before the first 
attempt to grab the lock. Same in FreeBSD.


So, my plan is to apply the attached non-locked-tas-spin-x86_64.patch to 
master. But I would love to get feedback from people running different 
x86_64 hardware.


[1] http://www.postgresql.org/message-id/8292.1314641...@sss.pgh.pa.us
[2] http://www.postgresql.org/message-id/25989.1314734...@sss.pgh.pa.us

- Heikki
diff --git a/src/include/storage/s_lock.h b/src/include/storage/s_lock.h
index 180c013..2a9c1c4 100644
--- a/src/

Re: [HACKERS] Extension Templates S03E11

2013-08-28 Thread Dimitri Fontaine

Alvaro Herrera  writes:
> "make check" in contrib/pg_upgrade should do the trick.

PASSED

Even after I added extension to the serial_schedule. I don't know if I
need to do anything specific on that area, will wait about some feedback
on that before sending a new version of the patch. Meanwhile my branch
is updated, and I sent the patch-on-patch too.

Regards,
-- 
Dimitri Fontaine06 63 07 10 78
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Extension Templates S03E11

2013-08-28 Thread Alvaro Herrera

Dimitri Fontaine wrote:
> Peter Eisentraut  writes:
> > make -C pg_upgrade_support all
> 
> Do we have something automated to easily test pg_upgrade?

"make check" in contrib/pg_upgrade should do the trick.


-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] split postmaster's checkDataDir to src/common

2013-08-28 Thread Tom Lane

Alvaro Herrera  writes:
> Tom Lane wrote:
>> What exactly is the argument for pushing this into 9.3?  Since we
>> are past rc1, we should treat that branch as released.  If you wouldn't
>> back-patch into all supported branches, you shouldn't be patching 9.3
>> either.

> This is to fix the stats_temp_directory issue that the directory is too
> public.  We had agreed that it wasn't a problem before 9.3, but it is in
> that branch because we changed the code to write so many files now
> instead of just one.

To my mind, the new worry about 9.3 was the sloppy file deletion code,
which we've already cleaned up.  The remaining argument for tightening
the directory ownership/permissions checking was a security issue: should
we allow a risk that random people could see or alter the stats files.
That's not different in 9.3 than it was before; splitting up the data
into multiple files doesn't seem (to me) to change that risk any.

In fact, I thought that the end of the discussion left us with doubts
about whether we wanted to change this at all.  Andres pointed out for
instance that we might need a lock file in the directory to prevent
multiple PG instances from trying to share one temp directory.  I don't
know that we need to go that far, but if we don't think we need to defend
against that case, do we need defenses here at all?  In the end this is a
"you break it, you get to keep both pieces" issue, with not all that
severe consequences even if the DBA does mess it up.  There are other
subdirectories, such as pg_xlog, that we also allow people to relocate,
but for which the consequences of a bad config would be far worse than
they are for the stats dir.  We don't go out of our way to prevent
config foulups for other subdirectories, so why this one?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Master-slave visibility order

2013-08-28 Thread Ants Aasma

I'm currently implementing commit sequence number (CSN) based
snapshots and I hit a design decision that I would like to resolve
before I have too much code to rewrite.

The issue is commit visibility ordering on slaves. As a couple of
threads on hackers have already noted, currently commit order on
slaves can differ from what is seen on master. This arises from the
fact that on master commit visibility is determined by the order of
ProcArrayLock acquisition by ProcArrayEndTransaction(). On slaves
commit visibility is exactly the order of commit records in WAL.
Because XLogInsert() in RecordTransactionCommit() is not interlocked
with ProcArrayEndTransaction() these orders can differ. In case of
mixed sync and async transactions they in fact are quite likely to
differ due to the durability wait in RecordTransactionCommit().

It's not possible to change master commit order to match WAL order
because then either async transactions must either wait behind sync
transactions before returning losing the point of async; or async
transactions must return without becoming visible, changing user
visible semantics; or sync transactions must become visible before
they become durable, again changing user visible semantics.

As it's not possible to change master commit order, the slave
visibility order must change for the orders to be consistent. WAL
currently doesn't have the information to reconstruct master commit
order. Either we need to add a new WAL record for the commit order
(only necessary when wal_level=hot_standby) or add a side channel to
replication connections to communicate commit order information.

One more consideration here is the wish expressed by several hackers
that commit record LSNs could be used as CSNs. One of the most
interesting benefits of this is the property of LSNs being the same
over the whole cluster, meaning that it would be relatively simple to
create cluster wide consistent snapshots.

I currently see the following courses of action:

1. Do nothing about the inconsistency, use a transient global counter
for master commit order and commit record LSN for slaves.
   Pro: doesn't change any semantics
   Con: we are not making any progress towards cluster wide snapshots
or even serializable transactions on slaves.

2. Create a new WAL record type that is inserted when a transaction
becomes visible. LSN of this record determines transaction visibility
order. Async transactions can be optimized to skip this record. This
record does not need to be flushed.
   Pro: cluster wide consistency, replication method agnostic
   Con: one extra WAL record insertion per writing transaction. (32
bytes of WAL per tx)

3. Use a transient global counter on master, send xid-csn pairs to
slave via a side channel on the replication connection.
   Pro: Less overhead than WAL records
   Con: replication protocol needs (possibly invasive) changes, WAL
shipping based replication can't use this mechanism, lots of extra
code required.

4. Make the choice between 1 and 2 user configurable (it seems to me
that it could even be changed without a restart).

Thoughts?

Regards,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] split postmaster's checkDataDir to src/common

2013-08-28 Thread Alvaro Herrera

Tom Lane wrote:

> What exactly is the argument for pushing this into 9.3?  Since we
> are past rc1, we should treat that branch as released.  If you wouldn't
> back-patch into all supported branches, you shouldn't be patching 9.3
> either.

This is to fix the stats_temp_directory issue that the directory is too
public.  We had agreed that it wasn't a problem before 9.3, but it is in
that branch because we changed the code to write so many files now
instead of just one.

If there's agreement to tighten the permission check in HEAD only, then
I will only commit to that branch.  But it seems a bit odd to me
postpone the tightening.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Behaviour of take over the synchronous replication

2013-08-28 Thread Amit Kapila

On Tue, Aug 27, 2013 at 4:51 PM, Sawada Masahiko  wrote:
> On Sun, Aug 25, 2013 at 3:21 PM, Amit Kapila  wrote:
>> On Sat, Aug 24, 2013 at 2:46 PM, Sawada Masahiko  
>> wrote:
>>> On Sat, Aug 24, 2013 at 3:14 AM, Josh Berkus  wrote:
 On 08/23/2013 12:42 AM, Sawada Masahiko wrote:
> in case (a), those priority is clear. So I think that re-taking over
> is correct behaviour.
> OHOT, in case (b), even if AAA and BBB are set same priority, AAA
> server steals SYNC replication.
> I think it is better that BBB server continue behaviour SYNC standby,
> and AAA should become potential server.

 So, you're saying that:

 1) synchronous_standby_names = '*'

 2) replica 'BBB' is the current sync standby

 3) replica 'AAA' comes online

 4) replica 'AAA' grabs sync status

 ?
>>> I'm sorry that you are confuse.
>>> It means that
>>>
>>> 1) synchronous_standby_names = '*'
>>>
>>> 2) replica 'AAA' is the current sync standby
>>>
>>> 3) replica 'BBB' is the current async standby (potential sync standby)
>>>
>>> 4) replica 'AAA' fail. after that, replica 'BBB' is current sync standby.
>>>
>>> 5) replica 'AAA' comes online
>>>
>>> 6) replica 'AAA' grabs sync status
>>>

>>>
>>>
 If that's the case, I'm not really sure that's undesirable behavior.
 One could argue fairly persuasively that if you care about the
 precendence order of sync replicas, you shouldn't use '*'. And the rule
 of "if using *, the lowest-sorted replica name has sync" is actually a
 predictable, easy-to-understand rule.

 So if you want to make this a feature request, you'll need to come up
 with an argument as to why the current behavior is bad. Otherwise,
 you're just asking us to document it better (which is a good idea).
>>> It is not depend on name of standby server. That is, The standby server,
>>> which was connected to the master server during initial configration
>>> replication, is top priority even if priority of two server are same.
>>
>> What is happening here is that incase of '*' as priority of both are
>> same, system will choose whichever
>> comes in list of registered standby's first (list is maintained in
>> structure WalSndCtl).
>> Each standby is registered with WalSndCtl when a new WALSender is
>> started in function InitWalSenderSlot().
>> As 'AAA' has been registered first it becomes preferred sync standby
>> even if priorities of both are same.
>> When 'AAA' goes down, it marks that Slot entry as free (by setting
>> pid=0 in function WalSndKill),
>> now when 'AAA' comes back again, it gets that free Slot entry and
>> again becomes preferred sync standby.
>>
>> Now if we want to fix as you are suggesting which I don't think is
>> necessary, we might need to change WalSndKill and some other place so
>> that whenever any standby goes down, it changes slots for already
>> registered standby's.
>>> User must remember that which standby server connected to master server at
>>> first.
>>> I think that this behavior confuse user.
>>> so I think that we need to modify this behaviour or if '*' is used, priority
>>> of server is not same (modifying manual is also good).
>>
>> Here user has done the settings (setting synchronous_standby_names =
>> '*'), after which he will not have any control which standby will
>> become sync standby, so ideally he should not complain.
>>
>> It might be case that for some users current behavior is good enough
>> which means that with '*' whichever standby has become sync standby
>> first, it will be the sync standby always if alive.

> I'm thinking that it is not necessary to change WalSndKill.
> For example, we add the value (e.g., sync_standby) which have that
> which wal sender is active SYNC rep.
> And if sync_standby is already set and it is active, server doesn't
> looking for active standby.
> Only if sync_standby is not set and it is inactive, server looking for
> that which server is active SYNC rep.
> If so, we also prevent to find active SYNC rep whenever
> SyncRepReleaseWaiters() is called.
   For '*' case, it will be okay, but when the user has given proper
names, in that case even if there is any active Sync
   Rep, it has to be changed based on priority.

   I think here how to provide a fix, so that behavior gets changed to
what you describe is a second priority work, first
   is to show the value of use-case. Do you really know where people
actually setup using '*' as configuration and if
   yes, are they annoyed with current behavior?

   I have thought about it, but could imagine a scenario where people
will be using '*' in their production
   configurations, may be it will be useful in test labs.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Deprecating RULES

2013-08-28 Thread Tom Lane

Darren Duncan  writes:
> That's a really old post/thread, and I'm not arguing for any kind of action 
> related to RULEs, please disregard the message. -- Darren Duncan

Oh, my fault --- for some reason my mail reader popped it up as an unread
message, and I failed to notice the date.  My apologies.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Valgrind Memcheck support

2013-08-28 Thread Andres Freund

On 2013-08-27 23:46:23 -0400, Noah Misch wrote:
> On Tue, Aug 27, 2013 at 04:14:27PM +0200, Andres Freund wrote:
> > On 2013-06-09 17:25:59 -0400, Noah Misch wrote:
> > > ***
> > > *** 846,851  exec_simple_query(const char *query_string)
> > > --- 847,856 
> > >   
> > >   TRACE_POSTGRESQL_QUERY_START(query_string);
> > >   
> > > + #ifdef USE_VALGRIND
> > > + VALGRIND_PRINTF("statement: %s\n", query_string);
> > > + #endif
> > > + 
> > 
> > Is there a special reason for adding more logging here? I find this
> > makes the instrumentation much less useful since reports easily get
> > burried in those traces. What's the advantage of doing this instead of
> > log_statement=...? Especially as that location afaics won't help for the
> > extended protocol?
> 
> I typically used "valgrind --log-file=...".  To determine via log_statement
> which SQL statement caused a particular Valgrind error, I would match by
> timestamp; this was easier.  In retrospect, log_statement would have sufficed
> given both Valgrind and PostgreSQL logging to stderr.  Emitting the message in
> exec_simple_query() and not in exec_execute_message() is indeed indefensible.

It's an understandable mistake, nothing indefensible. We don't really
test the extended protocol :(

> That being said, could you tell me more about your workflow where the extra
> messages cause a problem?  Do you just typically diagnose each Valgrind error
> without first isolating the pertinent SQL statement?

There's basically two scenarios where I found it annoying:
a) I manually reproduce some issue, potentially involving several psql
shells. All those fire up autocompletion and such queries which bloat
the log while I actually only want to analyze something specific.
b) I don't actually look for anything triggered by an SQL statement
itself, but am looking for issues in a walsender, background worker,
whatever. I still need some foreground activity to trigger the behaviour
I want to see, but once more, valgrind's logs hide the error.

Also, I sometimes use valgrind's --db-command/--vgdb which gives you
enough context from the backtrace itself since you can just get the
statement from there.

I vote for just removing that VALGRIND_PRINTF - it doesn't give you
anything you cannot easily achieve otherwise...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-08-28 Thread Andres Freund

On 2013-08-28 13:58:08 +0900, Michael Paquier wrote:
> On Tue, Aug 27, 2013 at 11:09 PM, Andres Freund  
> wrote:
> > On 2013-08-27 15:34:22 +0900, Michael Paquier wrote:
> >> I have been working a little bit more on this patch for the next
> >> commit fest. Compared to the previous version, I have removed the part
> >> of the code where process running REINDEX CONCURRENTLY was waiting for
> >> transactions holding a snapshot older than the snapshot xmin of
> >> process running REINDEX CONCURRENTLY at the validation and swap phase.
> >> At the validation phase, there was a risk that the newly-validated
> >> index might not contain deleted tuples before the snapshot used for
> >> validation was taken. I tried to break the code in this area by
> >> playing with multiple sessions but couldn't. Feel free to try the code
> >> and break it if you can!
> >
> > Hm. Do you have any justifications for removing those waits besides "I
> > couldn't break it"? The logic for the concurrent indexing is pretty
> > intricate and we've got it wrong a couple of times without noticing bugs
> > for a long while, so I am really uncomfortable with statements like this.
> Note that the waits on relation locks are not removed, only the wait
> phases involving old snapshots.
> 
> During swap phase, process was waiting for transactions with older
> snapshots than the one taken by transaction doing the swap as they
> might hold the old index information. I think that we can get rid of
> it thanks to the MVCC snapshots as other backends are now able to see
> what is the correct index information to fetch.

I don't see MVCC snapshots guaranteeing that. The only thing changed due
to them is that other backends see a self consistent picture of the
catalog (i.e. not either, neither or both versions of a tuple as
earlier). It's still can be out of date. And we rely on those not being
out of date.

I need to look into the patch for more details.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [v9.4] row level security

2013-08-28 Thread Kohei KaiGai

2013/8/28 Oleg Bartunov :
> btw, there is serious problem with row-level security and constraints. For
> example, user with low security level could use unique constraint to know
> about existence of a row with higher security.  I don't know, what is the
> best practice to avoid this.
>
It is out of scope for this feature. We usually calls this type of information
leakage "covert channel"; that is not avoidable in principle.
However, its significance is minor, because attacker must know identical
data to be here, or must have proving for each possible values.
Its solution is simple. DBA should not use value to be confidential as unique
key. If needed, our recommendation is alternative key, instead of natural key,
because its value itself does not have worth.

I should add a note of caution onto the documentation according to
the previous consensus, however, I noticed it had gone from the sgml files
while I was unaware. So, let me add description on the documentation.

Thanks,
-- 
KaiGai Kohei 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Extension Templates S03E11

2013-08-28 Thread Dimitri Fontaine

Peter Eisentraut  writes:
> make -C pg_upgrade_support all

Do we have something automated to easily test pg_upgrade?

My memories of how pg_upgrade works with extensions makes me believe
that I don't have anything special to do when those extensions have been
made available through a template: the scripts are not used. That said,
we want to retain some new dependencies…

The contrib/pg_upgrade/IMPLEMENTATION file is silent about upgrading
extensions…

I fixed the pg_upgrade_support compiling in my branch here, please find
the patch-on-patch attached. I also checked that the whole of contribs
now build fine and didn't find any other problem.

Regards,
-- 
Dimitri Fontaine06 63 07 10 78
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

commit 71e19099ab935413211e0e1b63529bc71eff7bd9
Author: Dimitri Fontaine 
Date:   Wed Aug 28 11:40:27 2013 +0200

Fix compiling contrib/pg_upgrade_support

diff --git a/contrib/pg_upgrade_support/pg_upgrade_support.c b/contrib/pg_upgrade_support/pg_upgrade_support.c
index 99e64c4..efb75a7 100644
--- a/contrib/pg_upgrade_support/pg_upgrade_support.c
+++ b/contrib/pg_upgrade_support/pg_upgrade_support.c
@@ -190,7 +190,8 @@ create_empty_extension(PG_FUNCTION_ARGS)
 		 text_to_cstring(extVersion),
 		 extConfig,
 		 extCondition,
-		 requiredExtensions);
+		 requiredExtensions,
+		 InvalidOid);
 
 	PG_RETURN_VOID();
 }

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

37 matches

Mail list logo