Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-04 Thread Amit Langote
On Sat, Mar 5, 2016 at 4:24 PM, Amit Langote  wrote:
> So, I took Vinayak's latest patch and rewrote it a little
...
> I broke it into two:
>
> 0001-Provide-a-way-for-utility-commands-to-report-progres.patch
> 0002-Implement-progress-reporting-for-VACUUM-command.patch

Oops, unamended commit messages in those patches are misleading.  So,
please find attached corrected versions.

Thanks,
Amit


0001-Provide-a-way-for-utility-commands-to-report-progres-v2.patch
Description: Binary data


0002-Implement-progress-reporting-for-VACUUM-command-v2.patch
Description: Binary data



Re: [HACKERS] extend pgbench expressions with functions

2016-03-04 Thread Fabien COELHO



Attached is the fixed patch for the array method.


Committed with a few tweaks, including running pgindent over some of it.


Thanks. So the first set of functions is in, and the operators are
executed as functions as well. Fabien, are you planning to send
rebased versions of the rest? By that I mean the switch of the
existing subcommands into equivalent functions and the handling of
double values as parameters for those functions.


Here they are:
 - 32-b: add double functions, including double variables
 - 32-c: remove \setrandom support (the advice is to use \set + random instead)

--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index cc80b3f..2133bf7 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -794,9 +794,10 @@ pgbench  options  dbname
 
 
  
-  Sets variable varname to an integer value calculated
+  Sets variable varname to a value calculated
   from expression.
   The expression may contain integer constants such as 5432,
+  double constants such as 3.14159,
   references to variables :variablename,
   and expressions composed of unary (-) or binary operators
   (+, -, *, /,
@@ -809,7 +810,7 @@ pgbench  options  dbname
   Examples:
 
 \set ntellers 10 * :scale
-\set aid (1021 * :aid) % (10 * :scale) + 1
+\set aid (1021 * random(1, 10 * :scale)) % (10 * :scale) + 1
 
 

@@ -829,66 +830,35 @@ pgbench  options  dbname
  
 
  
-  By default, or when uniform is specified, all values in the
-  range are drawn with equal probability.  Specifying gaussian
-  or  exponential options modifies this behavior; each
-  requires a mandatory parameter which determines the precise shape of the
-  distribution.
- 
+  
+   
+
+ \setrandom n 1 10 or \setrandom n 1 10 uniform
+ is equivalent to \set n random(1, 10) and uses a uniform
+ distribution.
+
+   
 
- 
-  For a Gaussian distribution, the interval is mapped onto a standard
-  normal distribution (the classical bell-shaped Gaussian curve) truncated
-  at -parameter on the left and +parameter
-  on the right.
-  Values in the middle of the interval are more likely to be drawn.
-  To be precise, if PHI(x) is the cumulative distribution
-  function of the standard normal distribution, with mean mu
-  defined as (max + min) / 2.0, with
-
- f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) /
-(2.0 * PHI(parameter) - 1.0)
-
-  then value i between min and
-  max inclusive is drawn with probability:
-  f(i + 0.5) - f(i - 0.5).
-  Intuitively, the larger parameter, the more
-  frequently values close to the middle of the interval are drawn, and the
-  less frequently values close to the min and
-  max bounds. About 67% of values are drawn from the
-  middle 1.0 / parameter, that is a relative
-  0.5 / parameter around the mean, and 95% in the middle
-  2.0 / parameter, that is a relative
-  1.0 / parameter around the mean; for instance, if
-  parameter is 4.0, 67% of values are drawn from the
-  middle quarter (1.0 / 4.0) of the interval (i.e. from
-  3.0 / 8.0 to 5.0 / 8.0) and 95% from
-  the middle half (2.0 / 4.0) of the interval (second and
-  third quartiles). The minimum parameter is 2.0 for
-  performance of the Box-Muller transform.
- 
+  
+   
+\setrandom n 1 10 exponential 3.0 is equivalent to
+\set n random_exponential(1, 10, 3.0) and uses an
+exponential distribution.
+   
+  
 
- 
-  For an exponential distribution, parameter
-  controls the distribution by truncating a quickly-decreasing
-  exponential distribution at parameter, and then
-  projecting onto integers between the bounds.
-  To be precise, with
-
-f(x) = exp(-parameter * (x - min) / (max - min + 1)) / (1.0 - exp(-parameter))
-
-  Then value i between min and
-  max inclusive is drawn with probability:
-  f(x) - f(x + 1).
-  Intuitively, the larger parameter, the more
-  frequently values close to min are accessed, and the
-  less frequently values close to max are accessed.
-  The closer to 0 parameter, the flatter (more uniform)
-  the access distribution.
-  A crude approximation of the distribution is that the most frequent 1%
-  values in the range, close to min, are drawn
-  parameter% of the time.
-  parameter value must be strictly positive.
+  
+   
+\setrandom n 1 10 gaussian 2.0 is equivalent to
+\set n random_gaussian(1, 10, 2.0), and uses a gaussian
+distribution.
+   
+  
+ 
+
+   See the documentation of these functions below for further information
+   about the precise shape of these distributions, depending on the value
+   of the parameter.
  
 
  
@@ -967,34 +937,6 

Re: [HACKERS] extend pgbench expressions with functions

2016-03-04 Thread Fabien COELHO



Committed with a few tweaks, including running pgindent over some of it.


Thanks!

--
Fabien.




Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-04 Thread Amit Langote
On Sat, Mar 5, 2016 at 7:11 AM, Robert Haas  wrote:
> On Fri, Feb 26, 2016 at 3:28 AM,   wrote:
>> Thank you for your comments.
>> Please find attached patch addressing following comments.
>
> I'm positive I've said this at least once before while reviewing this
> patch, and I think more than once: we should be trying to build a
> general progress-reporting facility here with vacuum as the first
> user.  Therefore, for example, pg_stat_get_progress_info's output
> columns should have generic names, not names specific to VACUUM.
> pg_stat_vacuum_progress can alias them to a vacuum-specific name.  See
> for example the relationship between pg_stats and pg_statistic.
>
> I think VACUUM should have three phases, not two.  lazy_vacuum_index()
> and lazy_vacuum_heap() are lumped together right now, but I think they
> shouldn't be.
>
> Please create named constants for the first argument to
> pgstat_report_progress_update_counter(), maybe with names like
> PROGRESS_VACUUM_WHATEVER.
>
> +   /* Update current block number of the relation */
> +   pgstat_report_progress_update_counter(2, blkno + 1);
>
> Why + 1?
>
> I thought we had a plan to update the counter of scanned index pages
> after each index page was vacuumed by the AM.  Doing it only after
> vacuuming the entire index is much less granular and generally less
> useful.   See 
> http://www.postgresql.org/message-id/56500356.4070...@bluetreble.com
>
> +   if (blkno == nblocks - 1 && vacrelstats->num_dead_tuples == 0 && nindexes != 0
> +       && vacrelstats->num_index_scans == 0)
> +   total_index_pages = 0;
>
> I'm not sure what this is trying to do, perhaps because there is no
> comment explaining it.  Whatever the intent, I suspect that such a
> complex test is likely to be fragile.  Perhaps there is a better way?

So, I took Vinayak's latest patch and rewrote it a little, maintaining
the original idea but modifying the code to some degree.  I hope the
original author(s) are okay with it.  Vinayak, do see if the rewritten
patch is alright and improve it any way you want.

I broke it into two:

0001-Provide-a-way-for-utility-commands-to-report-progres.patch
0002-Implement-progress-reporting-for-VACUUM-command.patch

The code review comments received recently (including mine) have been
incorporated.

However, I didn't implement the report-per-index-page-vacuumed bit; it
should be easy to code once the details are finalized (questions like
whether it requires modifying any existing interfaces, etc.).
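
As a rough illustration of the named-constant style requested in the
review above (constant names and index values here are hypothetical
placeholders, not taken from the patch):

/* Hypothetical parameter indexes for the generic progress counters. */
#define PROGRESS_VACUUM_PHASE              0
#define PROGRESS_VACUUM_TOTAL_HEAP_BLKS    1
#define PROGRESS_VACUUM_HEAP_BLKS_SCANNED  2

/* In the heap scan loop, instead of a bare index such as 2: */
pgstat_report_progress_update_counter(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);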

Thanks,
Amit


0001-Provide-a-way-for-utility-commands-to-report-progres.patch
Description: Binary data


0002-Implement-progress-reporting-for-VACUUM-command.patch
Description: Binary data



Re: [HACKERS] VS 2015 support in src/tools/msvc

2016-03-04 Thread Tom Lane
Michael Paquier  writes:
> On Sat, Mar 5, 2016 at 1:41 PM, Petr Jelinek  wrote:
>> I vote for just using sed considering we need flex and bison anyway.

> OK cool, we could go with something other than sed to generate probes.h,
> but sed seems sensible considering that this should definitely be
> back-patched. Not sure what the others think about adding a new file
> in the source tarball by default though.

AFAIK, sed, flex and bison originate from three separate source projects;
there is no reason to suppose that the presence of flex and bison on a
particular system guarantees the presence of sed.  I thought the proposal
to get rid of the psed dependence in favor of some more Perl code was
pretty sane.

regards, tom lane




Re: [HACKERS] VS 2015 support in src/tools/msvc

2016-03-04 Thread Michael Paquier
On Sat, Mar 5, 2016 at 1:41 PM, Petr Jelinek  wrote:
> On 04/03/16 15:23, Michael Paquier wrote:
>> OK, attached are a set of patches that allowed me to compile Postgres
>> using VS2015, in more details:
>> - 0001, as mentioned by Petr upthread, psed is removed from the core
>> distribution of Perl in 5.22, so when installing ActivePerl it is not
>> possible to create probes.h, and the code compilation would fail. I
>> bumped into that so here is a patch. What I am proposing here is to
>> replace psed by sed, sed being available in MSYS like bison and flex,
>> so when building using MSVC the environment to set up is normally
>> already good to go even with this additional dependency. Now, it is
>> important to mention that probes.h is not part of a source tarball. I
>> think that we would want probes.h to be part of a source tarball so as
>> it would be possible to compile the code on Windows using MSVC without
>> having to install MSYS. I haven't done that in this patch, thoughts on
>> the matter are welcome.
>
> I vote for just using sed considering we need flex and bison anyway.

OK cool, we could go with something other than sed to generate probes.h,
but sed seems sensible considering that this should definitely be
back-patched. Not sure what the others think about adding a new file
in the source tarball by default though.

>> - 0003, to address a compilation failure that I bumped into when
>> compiling ecpg. In src/port, TIMEZONE_GLOBAL and TZNAME_GLOBAL refer
>> to respectively timezone and tzname, however for win32, those should
>> be _timezone and _tzname. See here:
>> https://msdn.microsoft.com/en-us/library/htb3tdkc.aspx
>
> Yep I hit that one as well, looks good.

MinGW would react to that correctly, I think. If I look at
mingw/include/timezone.h, both timezone and _timezone are defined; I
would think that declaring both there is intentional.

>> - 0004, which is to address the problem of the missing lc_codepage
>> from locale.h in src/port/. I have been pondering about the use of
>> more fancy routines like GetLocaleInfoEx as mentioned by Petr
>> upthread. However, I think that we had better avoid any kind of
>> complication and just fall back to the old code path should _MSC_VER
>> >= 1900. We could always reuse lc_codepage if it gets reintroduced in
>> a future version of VS.
>
> Well I am worried that this way we might break existing installs, which means
> we can't backpatch this. The problem here is that the fallback code does not
> support the <language>-<country> format which Microsoft documents everywhere
> as the recommended locale format. The good news is that our own initdb won't
> auto-generate those when no locale was specified, as it uses setlocale(),
> which returns the legacy (and not recommended) locale names, and our fallback
> code can handle those. But manually set locales can be a problem.

I am open to more fancy solutions if it is possible to reliably get
the codepage in a different way, but I am not sure this is worth the
complication. The pre-VS2012 code has been able to live with that.
-- 
Michael




Re: [HACKERS] Relation extension scalability

2016-03-04 Thread Amit Kapila
On Fri, Mar 4, 2016 at 9:59 PM, Robert Haas  wrote:
>
> On Fri, Mar 4, 2016 at 12:06 AM, Dilip Kumar  wrote:
> > I have tried the approach of group extend:
> >
> > 1. We convert the extension lock to TryLock, and if we get the lock then
> > extend by one block.
> > 2. If we don't get the lock, then use the group leader concept where only
> > one process will extend for all.  A slight change from ProcArrayGroupClear
> > is that here, besides satisfying the requesting backends, we add some
> > extra blocks in the FSM, say GroupSize * 10.
> > 3. So we actually cannot measure the exact load, but we still have some
> > factor, the group size, that tells us the contention level, and we extend
> > in multiples of that.
>
> This approach seems good to me, and the performance results look very
> positive.  The nice thing about this is that there is not a
> user-configurable knob; the system automatically determines when
> larger extensions are needed, which will mean that real-world users
> are much more likely to benefit from this.
>

I think one thing which needs more thought about this approach is that we
need to maintain some number of slots so that group extend for different
relations can happen in parallel.  Do we want to provide simultaneous
extension for 1, 2, 3, 4, 5 or more relations?  I think providing it for
three or four relations should be okay, since the higher the number we
want to support, the bigger the PGPROC structure will become.

+GroupExtendRelation(PGPROC *proc, Relation relation, BulkInsertState bistate)
+{
+ volatile PROC_HDR *procglobal = ProcGlobal;
+ uint32 nextidx;
+ uint32 wakeidx;
+ int extraWaits = -1;
+ BlockNumber targetBlock;
+ int count = 0;
+
+ /* Add ourselves to the list of processes needing a group extend. */
+ proc->groupExtendMember = true;
..
..
+ /* We are the leader.  Acquire the lock on behalf of everyone. */
+ LockRelationForExtension(relation, ExclusiveLock);


To provide it for multiple relations, I think you need to advertise the
reloid of the relation in each proc and then get the relation descriptor
for the relation extension lock.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] VS 2015 support in src/tools/msvc

2016-03-04 Thread Petr Jelinek

On 04/03/16 15:23, Michael Paquier wrote:


OK, attached are a set of patches that allowed me to compile Postgres
using VS2015, in more details:
- 0001, as mentioned by Petr upthread, psed is removed from the core
distribution of Perl in 5.22, so when installing ActivePerl it is not
possible to create probes.h, and the code compilation would fail. I
bumped into that so here is a patch. What I am proposing here is to
replace psed by sed, sed being available in MSYS like bison and flex,
so when building using MSVC the environment to set up is normally
already good to go even with this additional dependency. Now, it is
important to mention that probes.h is not part of a source tarball. I
think that we would want probes.h to be part of a source tarball so as
it would be possible to compile the code on Windows using MSVC without
having to install MSYS. I haven't done that in this patch, thoughts on
the matter are welcome.


I vote for just using sed considering we need flex and bison anyway.


- 0003, to address a compilation failure that I bumped into when
compiling ecpg. In src/port, TIMEZONE_GLOBAL and TZNAME_GLOBAL refer
to respectively timezone and tzname, however for win32, those should
be _timezone and _tzname. See here:
https://msdn.microsoft.com/en-us/library/htb3tdkc.aspx


Yep I hit that one as well, looks good.


- 0004, which is to address the problem of the missing lc_codepage
from locale.h in src/port/. I have been pondering about the use of
more fancy routines like GetLocaleInfoEx as mentioned by Petr
upthread. However, I think that we had better avoid any kind of
complication and just fall back to the old code path should _MSC_VER
>= 1900. We could always reuse lc_codepage if it gets reintroduced in
a future version of VS.


Well I am worried that this way we might break existing installs, which
means we can't backpatch this. The problem here is that the fallback
code does not support the <language>-<country> format which Microsoft
documents everywhere as the recommended locale format. The good news is that
our own initdb won't auto-generate those when no locale was specified, as
it uses setlocale(), which returns the legacy (and not recommended)
locale names, and our fallback code can handle those. But manually set
locales can be a problem.


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] The plan for FDW-based sharding

2016-03-04 Thread Craig Ringer
On 2 March 2016 at 03:02, Bruce Momjian  wrote:

> On Tue, Mar  1, 2016 at 07:56:58PM +0100, Petr Jelinek wrote:
> > Note that I am not saying that other discussed approaches are any
> > better, I am saying that we should know approximately what we
> > actually want and not just beat FDWs with a hammer and hope sharding
> > will eventually emerge and call that the plan.
>
> I will say it again --- FDWs are the only sharding method I can think of
> that has a chance of being accepted into Postgres core.  It is a plan,
> and if it fails, it fails.  If it succeeds, that's good.  What more do
> you want me to say?


That you won't push it too hard if it works, but works badly, and will be
prepared to back off on the last steps despite all the lead-up
work/time/investment you've put into it.

If FDW-based sharding works, I'm happy enough, I have no horse in this
race. If it doesn't work I don't much care either. What I'm worried about
is if it works like partitioning using inheritance works - horribly badly,
but just well enough that it's served as an effective barrier to doing
anything better.

That's what I want to prevent. Sharding that only-just-works and then stops
us getting anything better into core.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] The plan for FDW-based sharding

2016-03-04 Thread Craig Ringer
On 2 March 2016 at 00:03, Robert Haas  wrote:


>
> True.  There is an API, though, and having pluggable WAL support seems
> desirable too.  At the same time, I don't think we know of anyone
> maintaining a non-core index AM ... and there are probably good
> reasons for that.  We end up revising the index AM API pretty
> regularly every time somebody wants to do something new, so it's not
> really a stable API that extensions can just tap into.  I suspect that
> a transaction manager API would end up similarly situated.
> 
>

IMO that needs to be true of all hooks into the real innards.

The ProcessUtility_hook API changed a couple of times after introduction
and nobody screamed. I think we just have to mark such places as having
cross-version API volatility, so you should be prepared to #if
PG_VERSION_NUM around them if you use them.
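
A minimal sketch of that pattern (the signature shown is the 9.3-9.5-era
one and the hook body is illustrative only; check the headers of each
target version for the exact signature):

#include "postgres.h"
#include "fmgr.h"
#include "tcop/utility.h"

PG_MODULE_MAGIC;

void _PG_init(void);

static ProcessUtility_hook_type prev_ProcessUtility = NULL;

#if PG_VERSION_NUM >= 90300 && PG_VERSION_NUM < 90600
static void
my_ProcessUtility(Node *parsetree, const char *queryString,
                  ProcessUtilityContext context, ParamListInfo params,
                  DestReceiver *dest, char *completionTag)
{
    /* ... extension-specific work would go here ... */
    if (prev_ProcessUtility)
        prev_ProcessUtility(parsetree, queryString, context,
                            params, dest, completionTag);
    else
        standard_ProcessUtility(parsetree, queryString, context,
                                params, dest, completionTag);
}

void
_PG_init(void)
{
    prev_ProcessUtility = ProcessUtility_hook;
    ProcessUtility_hook = my_ProcessUtility;
}
#endif   /* other majors would need their own guarded variant */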

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-04 Thread Amit Kapila
On Fri, Mar 4, 2016 at 7:23 PM, Thom Brown  wrote:
>
> On 4 March 2016 at 13:41, Alexander Korotkov  wrote:
> >
> >>
> >> If yes, then the only slight worry is that there will be a lot of
> >> repetition in the wait_event_type column, otherwise it is okay.
> >
> >
> > There is a morerows attribute of the entry tag in DocBook SGML; it
> > behaves like rowspan in HTML.
>
> +1
>

I will try to use morerows in documentation.

> Yes, we do this elsewhere in the docs.  And it is difficult to look
> through the wait event names at the moment.
>
> I'm also not keen on all the "A server process is waiting" all the way
> down the list.
>

How about giving the column name as "Wait For" instead of "Description" and
then using text like "finding or allocating space in shared memory"?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] The plan for FDW-based sharding

2016-03-04 Thread Craig Ringer
On 28 February 2016 at 06:38, Kevin Grittner  wrote:


>
> > For logical replay, applying in batches is actually a good thing since it
> > allows parallelism. We can remove them all from the target's procarray all
> > at once to avoid intermediate states becoming visible. So that would be
> > the preferred mechanism.
>
> That could be part of a solution.  What I sketched out with the
> "apparent order of execution" ordering of the transactions
> (basically, commit order except when one SERIALIZABLE transaction
> needs to be dragged in front of another due to a read-write
> dependency) is possibly the simplest approach, but batching may
> well give better performance.
>

I'd be really interested in some ideas on how that information might be
usefully accessed. If we could write info on when to apply commits to the
xlog in serializable mode that'd be very handy, especially when looking to
the future with logical decoding of in-progress transactions, parallel
apply, etc.

For parallel apply I anticipated that we'd probably have workers applying
xacts in parallel and committing them in upstream commit order. They'd
sometimes deadlock with each other; when this happened all workers whose
xacts committed after the first aborted xact would have to abort and start
again. Not ideal, but safe.

Being able to avoid that by using SSI information was in the back of my
mind, but with no idea how to even begin to tackle it. What you've
mentioned here is helpful and I'd be interested if you could share a bit
more of your experience in the area.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] The plan for FDW-based sharding

2016-03-04 Thread Craig Ringer
On 27 February 2016 at 15:29, Konstantin Knizhnik  wrote:


> Two reasons:
> 1. There is no ideal implementation of DTM which will fit all possible
> needs and be efficient for all clusters.
> 2. Even if such an implementation exists, the right way to integrate it
> is still for Postgres to use some kind of TM API.
> 
>


I've got to say that this is somewhat reminiscent of the discussions around
in-core pooling, where argument 1 is applied to justify excluding pooling
from core/contrib.

I don't have a strong position on whether a DTM should be in core or not as
I haven't done enough work in the area. I do think it's interesting to
strongly require that a DTM be in core while we also reject things like
pooling that are needed by a large proportion of users.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] ExecGather() + nworkers

2016-03-04 Thread Amit Kapila
On Fri, Mar 4, 2016 at 11:41 PM, Robert Haas  wrote:

> On Fri, Mar 4, 2016 at 6:55 AM, Amit Kapila 
> wrote:
> > On Fri, Mar 4, 2016 at 5:21 PM, Haribabu Kommi  >
> > wrote:
> >>
> >> On Fri, Mar 4, 2016 at 10:33 PM, Amit Kapila 
> >> wrote:
> >> > On Fri, Mar 4, 2016 at 11:57 AM, Haribabu Kommi
> >> > 
> >> > wrote:
> >> >>
> >> >> On Wed, Jan 13, 2016 at 7:19 PM, Amit Kapila <
> amit.kapil...@gmail.com>
> >> >> wrote:
> >> >> >>
> >> >> >
> >> >> > Changed the code such that nworkers_launched gets used wherever
> >> >> > appropriate instead of nworkers.  This includes places other than
> >> >> > pointed out above.
> >> >>
> >> >> The changes of the patch are simple optimizations that are trivial.
> >> >> I didn't find any problem regarding the changes. I think the same
> >> >> optimization is required in "ExecParallelFinish" function also.
> >> >>
> >> >
> >> > There is already one change as below for ExecParallelFinish() in
> patch.
> >> >
> >> > @@ -492,7 +492,7 @@ ExecParallelFinish(ParallelExecutorInfo *pei)
> >> >
> >> >   WaitForParallelWorkersToFinish(pei->pcxt);
> >> >
> >> >
> >> >
> >> >   /* Next, accumulate buffer usage. */
> >> >
> >> > - for (i = 0; i < pei->pcxt->nworkers; ++i)
> >> >
> >> > + for (i = 0; i < pei->pcxt->nworkers_launched; ++i)
> >> >
> >> >   InstrAccumParallelQuery(&pei->buffer_usage[i]);
> >> >
> >> >
> >> > Can you be slightly more specific, where exactly you are expecting
> more
> >> > changes?
> >>
> >> I missed it during the comparison with existing code and patch.
> >> Everything is fine with the patch. I marked the patch as ready for
> >> committer.
> >>
> >
> > Thanks!
>
> OK, committed.
>
>
Thanks.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] The plan for FDW-based sharding

2016-03-04 Thread Craig Ringer
On 27 February 2016 at 11:54, Robert Haas  wrote:



> I could submit a patch adding
> hooks to core to enable all of the things (or even just some of the
> things) that EnterpriseDB has changed in Advanced Server, and that
> patch would be rejected so fast it would make your head spin, because
> of course the core project doesn't want to be burdened with
> maintaining a whole bunch of hooks for the convenience of
> EnterpriseDB.


I can imagine that many such hooks would have little use beyond PPAS, but
I'm somewhat curious as to if any would have wider applications. It's not
unusual for me to be working on something and think "gee, I wish there was
a hook here".

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] VS 2015 support in src/tools/msvc

2016-03-04 Thread Craig Ringer
On 4 March 2016 at 22:23, Michael Paquier  wrote:

> On Fri, Mar 4, 2016 at 3:54 PM, Michael Paquier
>  wrote:
> > I still need to dig into that in more details. For the time being the
> > patch attached is useful IMO to plug in VS 2015 with the existing
> > infrastructure. So if anybody has a Windows environment, feel free to
> > play with it and dig into those problems. I'll update this thread once
> > I have a more advanced status.
>
> OK, attached are a set of patches that allowed me to compile Postgres
> using VS2015, in more details:
> - 0001, as mentioned by Petr upthread, psed is removed from the core
> distribution of Perl in 5.22, so when installing ActivePerl it is not
> possible to create probes.h, and the code compilation would fail. I
> bumped into that so here is a patch. What I am proposing here is to
> replace psed by sed, sed being available in MSYS like bison and flex,
> so when building using MSVC the environment to set up is normally
> already good to go even with this additional dependency.


The assumption here is that we're using msys to provide bison and flex
(probably via msysgit), so adding sed isn't any more intrusive.

I think that's reasonable, but wanted to spell it out since right now msys
isn't actually a dependency of the MSVC builds, just a convenient way to
get some of the dependencies. This still adds a new dependency, but it's
one most people will have anyway. If they're using bison/flex from gnuwin32
or whatever instead they can get sed there too. So +1, sensible.


> Now, it is
> important to mention that probes.h is not part of a source tarball. I
> think that we would want probes.h to be part of a source tarball so as
> it would be possible to compile the code on Windows using MSVC without
> having to install MSYS. I haven't done that in this patch, thoughts on
> the matter are welcome.
>

That's consistent with how we include the generated scanner and lexer files
etc in the source tarball, so +1.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] The plan for FDW-based sharding

2016-03-04 Thread Robert Haas
On Fri, Mar 4, 2016 at 8:27 PM, Joshua D. Drake  wrote:
> This does not sound like Bruce at all. Bruce is a lot of things, stubborn,
> sometimes temperamental, a lot of times like you... a hot head but he does
> not take credit for other people's work in my experience.

On the whole, Bruce is a much nicer guy than I am.  But I can't see
eye to eye with him on this.  I admit I may be being unfair to him,
but I'm telling it like I see it.  Like I do.

> Even if there was, so what? IF EDB wants to have a secret plan to push a lot
> of cool features to .Org, who cares? In the end, it all has to go through
> peer review and the meritocracy anyway.

I would just like to say that if I or my employer ever get accused of
having a nefarious plan, and somehow I get to pick *which* nefarious
plan I or my employer is to be accused of having, "a secret plan to
push a lot of cool features to .Org" sounds like a good one for me to
pick, especially since, yeah, we have that plan.  We plan to (try to)
push a lot of cool features to .Org.  We - or at least I - do not plan
to do it in a way that is anything but respectful to the community
process.  Specifically, and in no particular order, we plan to
continue contributing performance and scalability enhancements,
improvements to parallel query, and FDW-related improvements, just as
we have for 9.6.  We may also try to contribute other stuff that we
think will be cool and benefit PostgreSQL.  Suggestions are welcome.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Is there a way around function search_path killing SQL function inlining?

2016-03-04 Thread Regina Obe
I think the answer to this question is NO, but thought I'd ask. 

A lot of folks in PostGIS land are suffering from restore issues,
materialized view issues etc. because we have functions such as

ST_Intersects

Which does _ST_Intersects AND &&

Since _ST_Intersects is not schema qualified, during database restore (which
sets the schema to the table or view schema), materialized views that depend
on this do not come back.
It's also a serious issue with raster, though that one can be fixed by
setting search_path since the issue there doesn't use SQL inlining.

So I had this bright idea of setting the search_path of the functions to
where PostGIS is installed.

https://trac.osgeo.org/postgis/ticket/3490

To my disappointment, I noticed our spatial indexes no longer worked if I did
this, since they rely on SQL inlining.

Schema qualifying our function calls is not an option at this time since

1) People install postgis in different schemas, so we'd have to force them to
install in the same schema, which I think will break a lot of 3rd party apps.
2) It's a lot of functions to hand touch.

Any suggestions are welcome.  Any other issues I should be aware of with 

ALTER FUNCTION .. set search_path..

The only other issue I noticed is that it seems to be ignored in the CREATE
EXTENSION script (at least when using dynamic EXECUTE).  I put it in and it
seems to be entirely ignored.

Thanks,
Regina


Re: [HACKERS] JPUG wants to have a copyright notice on the translated doc

2016-03-04 Thread Joshua D. Drake

On 03/04/2016 06:01 PM, Tatsuo Ishii wrote:


Considering they are BSD licensed, I am not sure what abuses could occur?


I imagine kind of an extreme case: a bad guy removes "Copyright
1996-2016 The PostgreSQL Global Development Group" and replaces it
with his/her copyright.


Right, but again, what happens when they do that? In a practical sense,
nothing, because we are BSD licensed.


In short, no worries, 100% happy that JPUG is taking the initiative.

Sincerely,

JD


--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




Re: [HACKERS] JPUG wants to have a copyright notice on the translated doc

2016-03-04 Thread Tatsuo Ishii
> On 03/04/2016 05:39 PM, Tatsuo Ishii wrote:
>> JPUG (Japan PostgreSQL Users Group) would like to add a copyright
>> notice to the Japanese translated docs.
>>
>> http://www.postgresql.jp/document/9.5/html/
>>
>> Currently "Copyright 1996-2016 The PostgreSQL Global Development
>> Group" is showed on the translated doc (of course in Japanese). What
>> JPUG is wanting is, adding something like "Copyright 2016 Japan
>> PostgreSQL Users Group" to this.
> 
> As I understand it the translation would be copyrighted by the people
> that do the translation so it is perfectly reasonable to have JPUG
> hold the copyright for the .jp translation.

Thanks for clarification.

>> The reason for this is, "Copyright 1996-2016 The PostgreSQL Global
>> Development Group" may not be effective from the point of view of
>> Japanese law, because "The PostgreSQL Global Development Group" is not
>> a valid entity to be a copyright holder under Japanese law. To prevent
>> abuse of the translated docs, JPUG thinks that it needs to add its own
>> copyright notice to the Japanese translated docs (note that JPUG is a
>> registered non-profit organization and can be a copyright holder).
> 
> Considering they are BSD licensed, I am not sure what abuses could occur?

I imagine kind of an extreme case: a bad guy removes "Copyright
1996-2016 The PostgreSQL Global Development Group" and replaces it
with his/her copyright.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] JPUG wants to have a copyright notice on the translated doc

2016-03-04 Thread Joshua D. Drake

On 03/04/2016 05:39 PM, Tatsuo Ishii wrote:

JPUG (Japan PostgreSQL Users Group) would like to add a copyright
notice to the Japanese translated docs.

http://www.postgresql.jp/document/9.5/html/

Currently "Copyright 1996-2016 The PostgreSQL Global Development
Group" is showed on the translated doc (of course in Japanese). What
JPUG is wanting is, adding something like "Copyright 2016 Japan
PostgreSQL Users Group" to this.


As I understand it the translation would be copyrighted by the people 
that do the translation so it is perfectly reasonable to have JPUG hold 
the copyright for the .jp translation.




The reason for this is, "Copyright 1996-2016 The PostgreSQL Global
Development Group" may not be effective from the point of view of
Japanese law, because "The PostgreSQL Global Development Group" is not
a valid entity to be a copyright holder under Japanese law. To prevent
abuse of the translated docs, JPUG thinks that it needs to add its own
copyright notice to the Japanese translated docs (note that JPUG is a
registered non-profit organization and can be a copyright holder).


Considering they are BSD licensed, I am not sure what abuses could occur?

Sincerely,

JD

--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




[HACKERS] JPUG wants to have a copyright notice on the translated doc

2016-03-04 Thread Tatsuo Ishii
JPUG (Japan PostgreSQL Users Group) would like to add a copyright
notice to the Japanese translated docs.

http://www.postgresql.jp/document/9.5/html/

Currently "Copyright 1996-2016 The PostgreSQL Global Development
Group" is showed on the translated doc (of course in Japanese). What
JPUG is wanting is, adding something like "Copyright 2016 Japan
PostgreSQL Users Group" to this.

The reason for this is, "Copyright 1996-2016 The PostgreSQL Global
Development Group" may not be effective from the point of view of
Japanese law, because "The PostgreSQL Global Development Group" is not
a valid entity to be a copyright holder under Japanese law. To prevent
abuse of the translated docs, JPUG thinks that it needs to add its own
copyright notice to the Japanese translated docs (note that JPUG is a
registered non-profit organization and can be a copyright holder).

Opinions?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] The plan for FDW-based sharding

2016-03-04 Thread Joshua D. Drake

On 03/04/2016 04:41 PM, Robert Haas wrote:

As far as I understand it,
Bruce came in near the end of that conversation and now wants to claim
credit for something that doesn't really exist yet and, to the extent
that it does exist, wasn't even his idea.


Robert,

This does not sound like Bruce at all. Bruce is a lot of things, 
stubborn, sometimes temperamental, a lot of times like you... a hot head 
but he does not take credit for other people's work in my experience.



get reasonable plans, something that currently isn't true.  I haven't
heard anybody objecting to that, and I don't expect to hear anybody
objecting to that, because it's hard to imagine why you wouldn't want
queries against foreign data wrappers to produce better plans than
they do today.  At worst, you might think it doesn't matter either
way, but actually, I think there are a substantial number of people
who are pretty happy about join pushdown and I expect that when and if
we get aggregate pushdown working there will be even more people who
are happy about that.


Agreed.


That's exactly what the people at EnterpriseDB who are actually doing
work in this area are attempting to do.  Meanwhile, there's also
Bruce, who is neither doing nor planning to do any work in this area,
nor advising either EnterpriseDB or the PostgreSQL community to
undertake any particular project, but who *is* making it sound like
there is a super sekret plan that nobody else gets to see.  However,


I don't see this, Robert. I don't see some secret hidden plan. I don't
see any cabal. I see a guy that has an idea, just like everyone else on 
this list.



as the guy who actually wrote the plan that EnterpriseDB is following,
I happen to know that there's nothing more to it than what I wrote
above.


Even if there was, so what? IF EDB wants to have a secret plan to push a 
lot of cool features to .Org, who cares? In the end, it all has to go 
through peer review and the meritocracy anyway.


Sincerely,

JD




--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




Re: [HACKERS] The plan for FDW-based sharding

2016-03-04 Thread Robert Haas
On Tue, Mar 1, 2016 at 12:07 PM, Konstantin Knizhnik
 wrote:
> In the article they used a notion of "wait":
>
> if T.SnapshotTime>GetClockTime()
> then wait until T.SnapshotTime
> Originally we really did sleep here, but then we thought that instead of
> sleeping we could just adjust the local time.
> Sorry, I do not have a formal proof that it is equivalent but... at least
> we have not encountered any inconsistencies after this fix and performance
> is improved.

I think that those things are probably not equivalent.  They would be
if you could cause the adjustment to advance in lock-step on every
node at the same time, but you probably can't.  And I think it is
extremely unwise to assume that the fact that nothing obviously broke
means that you got it right.  This is the sort of work where formal
proofs of correctness are, IMHO, extremely wise.

> I fear that building a DTM that is fully reliable and also
> well-performing is going to be really hard, and I think it would be
> far better to have one such DTM that is 100% reliable than two or more
> implementations each of which are 99% reliable.
>
> The question is not about its reliability, but mostly about its
> functionality and flexibility.

Well, *my* concern is about reliability.  A lot of code can be made
faster at the price of less reliability, but that usually doesn't work
out well in the end.  Performance matters too, of course, but the way
to get there is to start with a good algorithm, write reliable code to
implement it, and then optimize.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] The plan for FDW-based sharding

2016-03-04 Thread Robert Haas
On Wed, Mar 2, 2016 at 1:53 PM, Josh berkus  wrote:
> One of the things which causes bad reactions and arguments, Bruce, is that a
> lot of your posts and presentations detailing plans for the FDW approach
> carry the subtext that all four of the other approaches are dead ends and
> not worth considering.  Given that the other approaches, whatever their
> limitations, have working code in the field and the FDW approach does not,
> that's more than a little offensive.

Yeah, I agree with that.  I am utterly mystified by why Bruce keeps
beating this drum, and am frankly pretty annoyed about it.  In the
first place, he seems to think that he invented the idea of using FDWs
for sharding in PostgreSQL, but I don't think that's true.  I think it
was partly my idea, and partly something that the NTT folks have been
working on for years (cf, e.g.,
cb1ca4d800621dcae67ca6c799006de99fa4f0a5).  As far as I understand it,
Bruce came in near the end of that conversation and now wants to claim
credit for something that doesn't really exist yet and, to the extent
that it does exist, wasn't even his idea.  In the second place, the
only thing that these repeated emails and development meeting
discussions of the topic actually accomplish is to piss people off.
I do believe that enhancing the foreign data wrapper interface can be
part of a horizontal scalability story for PostgreSQL, but as long as
nobody is objecting to the individual enhancements, which I don't see
anybody doing, then why the heck do we have to keep arguing about this
big picture story?  It doesn't matter at all, and it doesn't even
really exist, yet somehow Bruce keeps bringing it up, which I think
serves no useful purpose whatsoever.

> If we want to move forwards on serious work on FDW-based sharding, the folks
> working on it should stop treating it as a "fait accompli" that this is the
> Chosen Way for the PostgreSQL project.  Otherwise, you'll spend all of your
> time arguing that point instead of working on features that matter.

The only person treating it that way is Bruce.

> In contrast, this FDW plan *still* feels very much like a small group made
> up of employees of only two companies came up with it in private and decided
> that it should be the plan for the whole project.  I know that Bruce and
> others have good reasons for starting the FDW project, but there hasn't been
> much of an attempt to obtain community consensus around it. If Bruce and
> others want contributors to work on FDWs instead of other sharding
> approaches, then they need to win over those people as to why they should do
> that.  It's how this community works.

There hasn't been much of an attempt to obtain community consensus
about it because there isn't actually some grand plan, private or
otherwise, much as Bruce's emails might make you think otherwise.
EnterpriseDB *does* have a plan to try to continue enhancing foreign
data wrappers so that you can run queries against foreign tables and
get reasonable plans, something that currently isn't true.  I haven't
heard anybody objecting to that, and I don't expect to hear anybody
objecting to that, because it's hard to imagine why you wouldn't want
queries against foreign data wrappers to produce better plans than
they do today.  At worst, you might think it doesn't matter either
way, but actually, I think there are a substantial number of people
who are pretty happy about join pushdown and I expect that when and if
we get aggregate pushdown working there will be even more people who
are happy about that.

The only other ongoing work that EnterpriseDB has that at all touches
on this area is Ashutosh Bapat's work on 2PC for FDWs.  I'm not
convinced that's fully baked, and it conflicts with the XTM stuff the
Postgres Pro guys are doing, which I *also* don't think is fully
baked, so I'm not real keen on pressing forward aggressively with
either approach right now.  I think we (eventually) need a solution to
the problem of cross-node consistency, but I am deeply
unconvinced that anything currently on the table is going to get us
there.  I did recommend the 2PC for FDW project, but I'm not amazingly
happy with how it came out, and I think we need to think harder about
other approaches before adopting something.

> Alternately, you can just work on the individual FDW features, which
> *everyone* thinks are a good idea, and when most of them are done, FDW-based
> scaleout will be such an obvious solution that nobody will argue with it.

That's exactly what the people at EnterpriseDB who are actually doing
work in this area are attempting to do.  Meanwhile, there's also
Bruce, who is neither doing nor planning to do any work in this area,
nor advising either EnterpriseDB or the PostgreSQL community to
undertake any particular project, but who *is* making it sound like
there is a super sekret plan that nobody else gets to see.  However,
as the guy who actually wrote the plan that EnterpriseDB is following,
I happen to know that there's nothing more to it than what I wrote above.

Re: [HACKERS] Publish autovacuum informations

2016-03-04 Thread Michael Paquier
On Sat, Mar 5, 2016 at 9:21 AM, Julien Rouhaud
 wrote:
> On 04/03/2016 23:34, Michael Paquier wrote:
>> New design discussions are a little bit late for 9.6 I am afraid :(
>> Perhaps we should consider this patch as returned with feedback for
>> the time being? The hook approach is not something I'd wish for if we
>> can improve the in-core facility in a way that would help users decide
>> better how to tune autovacuum parameters.
>
> Yes, it's clearly not suited for the final commitfest. I just closed the
> patch as "returned with feedback".
>
> I'll work on the feedbacks I already had to document a wiki page, and
> wait for this commitfest to be more or less finished before starting a
> new thread on autovacuum instrumentation design.

OK, thanks.
-- 
Michael




Re: [HACKERS] Publish autovacuum informations

2016-03-04 Thread Julien Rouhaud
On 04/03/2016 23:34, Michael Paquier wrote:
> On Sat, Mar 5, 2016 at 6:52 AM, Julien Rouhaud
>  wrote:
>> Very good suggestion.
>>
>> I think the most productive way to work on this is to start a wiki page
>> to summarize what's the available information, what we should store and
>> how to represent it.
>>
>> I'll update this thread as soon as I'll have a first draft finished.
> 
> New design discussions are a little bit late for 9.6 I am afraid :(
> Perhaps we should consider this patch as returned with feedback for
> the time being? The hook approach is not something I'd wish for if we
> can improve in-core facility that would help user to decide better how
> to tune autovacuum parameters.

Yes, it's clearly not suited for the final commitfest. I just closed the
patch as "returned with feedback".

I'll work on the feedbacks I already had to document a wiki page, and
wait for this commitfest to be more or less finished before starting a
new thread on autovacuum instrumentation design.


> The VACUUM progress facility covers a
> different need by helping to track how long a scan is still going to
> take. What we want here is something that would run on top of that.
> Logs at least may be helpful for things like pgbadger.
> 



-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org




Re: [HACKERS] silent data loss with ext4 / all current versions

2016-03-04 Thread Andres Freund
On 2016-03-05 07:43:00 +0900, Michael Paquier wrote:
> On Sat, Mar 5, 2016 at 7:35 AM, Andres Freund  wrote:
> > On 2016-03-04 14:51:50 +0900, Michael Paquier wrote:
> >> On Fri, Mar 4, 2016 at 4:06 AM, Andres Freund  wrote:
> >> Hm. OK. I don't see any reason why switching to link() even in code
> >> paths like KeepFileRestoredFromArchive() or pgarch_archiveDone() would
> >> be a problem, thinking about it. Should HAVE_WORKING_LINK be available
> >> on a platform, we can combine it with unlink. Is that in line with what
> >> you think?
> >
> > I wasn't trying to suggest we should replace all rename codepaths with
> > the link wrapper, just the ones that already have a HAVE_WORKING_LINK
> > check. The name of the routine I suggested is bad though...
>
> So we'd introduce a first routine rename_or_link_safe(), say replace_safe().

Or actually maybe just link_safe(), which falls back to access() &&
rename() if !HAVE_WORKING_LINK.

> > That's one approach, yes. Combined with the fact that you can't actually
> > reliably rename across directories, the two could be on different
> > filesystems after all, that'd be a suitable defense. It just needs to be
> > properly documented in the function header, not at the bottom.
>
> OK. Got it. Or the two could be on the same filesystem.

> Still, link() and rename() do not support doing their stuff on
> different filesystems (EXDEV).

That's my point ...




Re: [HACKERS] silent data loss with ext4 / all current versions

2016-03-04 Thread Michael Paquier
On Sat, Mar 5, 2016 at 7:37 AM, Andres Freund  wrote:
> On 2016-03-05 07:29:35 +0900, Michael Paquier wrote:
>> OK. I could produce that by tonight my time, not before unfortunately.
>
> I'm switching to this patch, after pushing the pending logical decoding
> fixes. Probably not today, but tomorrow PST afternoon should work.

OK, so if that's the case there is no need to step on your toes, seen from here.

>> And FWIW, per the comments of Andres, it is not clear to me what we
>> gain by having a common routine for link() and rename() knowing that
>> half the code paths performing a rename do not rely on link().
>
> I'm not talking about replacing all renames with this. Just the ones
> that currently use link(). There's not much point in introducing
> link_safe(), when all the callers have the same duplicated code, with a
> fallback to rename().

Indeed, that's the case. I don't have a better name than replace_safe
though. replace_paranoid is not a very appealing name either.
-- 
Michael




Re: [HACKERS] silent data loss with ext4 / all current versions

2016-03-04 Thread Michael Paquier
On Sat, Mar 5, 2016 at 7:35 AM, Andres Freund  wrote:
> On 2016-03-04 14:51:50 +0900, Michael Paquier wrote:
>> On Fri, Mar 4, 2016 at 4:06 AM, Andres Freund  wrote:
>> > I don't think we want any stat()s here. I'd much, much rather check open
>> > for ENOENT.
>>
>> OK. So you mean more or less that, right?
>> int fd;
>> fd = OpenTransientFile(newfile, PG_BINARY | O_RDONLY, 0);
>> if (fd < 0)
>> {
>> if (errno != ENOENT)
>> return -1;
>> }
>> else
>> {
>> pg_fsync(fd);
>> CloseTransientFile(fd);
>> }
>
> Yes. Otherwise the check is racy: The file could be gone by the time you
> do the fsync; leading to a spurious ERROR (which often would get
> promoted to a PANIC).

Yeah, that makes sense.

>> >> +/*
>> >> + * link_safe -- make a file hard link, making it on-disk persistent
>> >> + *
>> >> + * This routine ensures that a hard link created on a file persists
>> >> + * on the system in case of a crash by using fsync on the link
>> >> + * generated as well as on its parent directory.
>> >> + */
>> >> +int
>> >> +link_safe(const char *oldfile, const char *newfile)
>> >> +{
>> >
>> > If we go for a new abstraction here, I'd rather make it
>> > 'replace_file_safe' or something, and move the link/rename code #ifdef
>> > into it.
>>
>> Hm. OK. I don't see any reason why switching to link() even in code
>> paths like KeepFileRestoredFromArchive() or pgarch_archiveDone() would
>> be a problem, thinking about it. Should HAVE_WORKING_LINK be available
>> on a platform, we can combine it with unlink. Is that in line with what
>> you think?
>
> I wasn't trying to suggest we should replace all rename codepaths with
> the link wrapper, just the ones that already have a HAVE_WORKING_LINK
> check. The name of the routine I suggested is bad though...

So we'd introduce a first routine rename_or_link_safe(), say replace_safe().

>> Do you suggest correcting this comment to remove the mention of the
>> old file's parent directory, because we just care about having the new
>> file be persistent?
>
> That's one approach, yes. Combined with the fact that you can't actually
> reliably rename across directories, the two could be on different
> filesystems after all, that'd be a suitable defense. It just needs to be
> properly documented in the function header, not at the bottom.

OK. Got it. Or the two could be on the same filesystem. Still, link()
and rename() do not support doing their stuff on different filesystems
(EXDEV).
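
To summarize the pattern being converged on here, a condensed sketch
(error handling is elided, the final routine name is still open, and
fsync_fname()/fsync_parent_path() are the helpers discussed upthread):

static int
rename_safe(const char *oldfile, const char *newfile)
{
    int fd;

    /* Flush the source so its contents are durable before the rename. */
    fsync_fname((char *) oldfile, false);

    /*
     * If the target already exists, flush it too.  Check open() for
     * ENOENT instead of calling stat(), to avoid the race noted above
     * where the file vanishes between the check and the fsync.
     */
    fd = OpenTransientFile((char *) newfile, PG_BINARY | O_RDONLY, 0);
    if (fd >= 0)
    {
        pg_fsync(fd);
        CloseTransientFile(fd);
    }
    else if (errno != ENOENT)
        return -1;

    if (rename(oldfile, newfile) < 0)
        return -1;

    /* Persist the rename itself: the new entry and its parent directory. */
    fsync_fname((char *) newfile, false);
    fsync_parent_path(newfile);
    return 0;
}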
-- 
Michael




Re: [HACKERS] silent data loss with ext4 / all current versions

2016-03-04 Thread Andres Freund
On 2016-03-05 07:29:35 +0900, Michael Paquier wrote:
> OK. I could produce that by tonight my time, not before unfortunately.

I'm switching to this patch, after pushing the pending logical decoding
fixes. Probably not today, but tomorrow PST afternoon should work.

> And FWIW, per the comments of Andres, it is not clear to me what we
> gain by having a common routine for link() and rename() knowing that
> half the code paths performing a rename do not rely on link().

I'm not talking about replacing all renames with this. Just the ones
that currently use link(). There's not much point in introducing
link_safe(), when all the callers have the same duplicated code, with a
fallback to rename().


Andres




Re: [HACKERS] silent data loss with ext4 / all current versions

2016-03-04 Thread Andres Freund
On 2016-03-04 14:51:50 +0900, Michael Paquier wrote:
> On Fri, Mar 4, 2016 at 4:06 AM, Andres Freund  wrote:
> > Hi,
> 
> Thanks for the review.
> 
> >> +/*
> >> + * rename_safe -- rename a file, making the change on-disk persistent
> >> + *
> >> + * This routine ensures that a renamed file persists in case of a crash
> >> + * by using fsync on the old and new files before and after performing
> >> + * the rename, so that this qualifies as an all-or-nothing operation.
> >> + */
> >> +int
> >> +rename_safe(const char *oldfile, const char *newfile)
> >> +{
> >> + struct stat filestats;
> >> +
> >> + /*
> >> +  * First fsync the old entry and the new entry, if the latter exists,
> >> +  * to ensure that they are properly persistent on disk. Calling this
> >> +  * routine with an existing new target file is fine, rename() will
> >> +  * first remove it before performing its operation.
> >> +  */
> >> + fsync_fname(oldfile, false);
> >> + if (stat(newfile, &filestats) == 0)
> >> + fsync_fname(newfile, false);
> >
> > I don't think we want any stat()s here. I'd much, much rather check open
> > for ENOENT.
> 
> OK. So you mean more or less that, right?
> int fd;
> fd = OpenTransientFile(newfile, PG_BINARY | O_RDONLY, 0);
> if (fd < 0)
> {
> if (errno != ENOENT)
> return -1;
> }
> else
> {
> pg_fsync(fd);
> CloseTransientFile(fd);
> }

Yes. Otherwise the check is racy: The file could be gone by the time you
do the fsync; leading to a spurious ERROR (which often would get
promoted to a PANIC).

> >> +/*
> >> + * link_safe -- make a file hard link, making it on-disk persistent
> >> + *
> >> + * This routine ensures that a hard link created on a file persists on
> >> + * the system in case of a crash, by using fsync on the generated link
> >> + * as well as on its parent directory.
> >> + */
> >> +int
> >> +link_safe(const char *oldfile, const char *newfile)
> >> +{
> >
> > If we go for a new abstraction here, I'd rather make it
> > 'replace_file_safe' or something, and move the link/rename code #ifdef
> > into it.
> 
> Hm. OK. Thinking about it, I don't see any reason why switching to
> link() even in code paths like KeepFileRestoredFromArchive() or
> pgarch_archiveDone() would be a problem. Should HAVE_WORKING_LINK be
> available on a platform, we can combine it with unlink(). Is that in
> line with what you think?

I wasn't trying to suggest we should replace all rename codepaths with
the link wrapper, just the ones that already have a HAVE_WORKING_LINK
check. The name of the routine I suggested is bad though...

> >> + if (link(oldfile, newfile) < 0)
> >> + return -1;
> >> +
> >> + /*
> >> +  * Make the link persistent in case of an OS crash; the new entry
> >> +  * generated as well as its parent directory need to be flushed.
> >> +  */
> >> + fsync_fname(newfile, false);
> >> +
> >> + /*
> >> +  * Same for parent directory. This routine is never called to rename
> >> +  * files across directories, but if this proves to become the case,
> >> +  * flushing the parent directory of the old file would be necessary.
> >> +  */
> >> + fsync_parent_path(newfile);
> >> + return 0;
> >
> > I think it's a seriously bad idea to encode that knowledge in such a
> > general sounding routine.  We could however argue that this is about
> > safely replacing the *target* file; not about safely removing the old
> > file.
> 
> Not sure I am following here. Are you referring to the fact that having
> the new file and the old file in different directories would make this
> routine unreliable?

Yes.


> Because yes that's the case if we want to make both of them
> persistent, and I think we want to do so.

That's one way.


> Do you suggest to correct this comment to remove the mention to the
> old file's parent directory because we just care about having the new
> file as being persistent?

That's one approach, yes. Combined with the fact that you can't actually
reliably rename across directories, the two could be on different
filesystems after all, that'd be a suitable defense. It just needs to be
properly documented in the function header, not at the bottom.


Regards,

Andres




Re: [HACKERS] Publish autovacuum informations

2016-03-04 Thread Michael Paquier
On Sat, Mar 5, 2016 at 6:52 AM, Julien Rouhaud
 wrote:
> Very good suggestion.
>
> I think the most productive way to work on this is to start a wiki page
> to summarize what information is available, what we should store and
> how to represent it.
>
> I'll update this thread as soon as I have a first draft finished.

New design discussions are a little bit late for 9.6, I am afraid :(
Perhaps we should consider this patch as returned with feedback for
the time being? The hook approach is not something I'd wish for if we
can instead improve the in-core facilities that help users decide how
to tune the autovacuum parameters. The VACUUM progress facility covers
a different need, helping to track how long a scan is still going to
take; what we want here is something that would run on top of that.
Logs at least would be helpful for tools like pgbadger.
-- 
Michael




Re: [HACKERS] silent data loss with ext4 / all current versions

2016-03-04 Thread Michael Paquier
On Sat, Mar 5, 2016 at 1:23 AM, Robert Haas  wrote:
> On Fri, Mar 4, 2016 at 11:09 AM, Tom Lane  wrote:
>> Alvaro Herrera  writes:
>>> I would like to have a patch for this finalized today, so that we can
>>> apply to master before or during the weekend; with it in the tree for
>>> about a week we can be more confident and backpatch close to next
>>> weekend, so that we see it in the next set of minor releases.  Does that
>>> sound good?
>>
>> I see no reason to wait before backpatching.  If you're concerned about
>> having testing, the more branches it is in, the more buildfarm cycles
>> you will get on it.  And we're not going to cut any releases in between,
>> so what's the benefit of not having it there?
>
> Agreed.

OK. I could produce that by tonight my time, not before unfortunately.
And FWIW, per the comments of Andres, it is not clear to me what we
gain by having a common routine for link() and rename(), knowing that
half the code paths performing a rename do not rely on link(). At
least it sounds dangerous to me to introduce a dependency on link() in
code paths that rely just on rename() in the back branches. On HEAD,
we could be more adventurous, for sure. Regarding the replacement of
stat() by something relying on OpenTransientFile, I agree though. As
for the flush of the parent directory in link_safe(), we'd still want
to do it, and we are fine not flushing the parent directory of the old
file because the backend does not move files across paths.
-- 
Michael




Re: [HACKERS] VS 2015 support in src/tools/msvc

2016-03-04 Thread Michael Paquier
On Sat, Mar 5, 2016 at 12:08 AM, Alvaro Herrera
 wrote:
> Michael Paquier wrote:
>
>> - 0001, as mentioned by Petr upthread, psed is removed from the core
>> distribution of Perl in 5.22, so when installing ActivePerl it is not
>> possible to create probes.h, and the code compilation would fail. I
>> bumped into that so here is a patch. What I am proposing here is to
>> replace psed by sed, sed being available in MSYS like bison and flex,
>> so when building using MSVC the environment to set up is normally
>> already good to go even with this additional dependency. Now, it is
>> important to mention that probes.h is not part of a source tarball. I
>> think that we would want probes.h to be part of a source tarball so as
>> it would be possible to compile the code on Windows using MSVC without
>> having to install MSYS. I haven't done that in this patch, thoughts on
>> the matter are welcome.
>
> I think the path of least resistance is to change the sed script into a
> Perl script.  Should be fairly simple ...

Yes, that's possible as well. It would be better to use the same
process for *nix platforms too.
-- 
Michael




Re: [HACKERS] Equivalent of --enable-tap-tests in MSVC scripts

2016-03-04 Thread Michael Paquier
On Sat, Mar 5, 2016 at 1:16 AM, Craig Ringer  wrote:
> On 5 March 2016 at 00:10, Alvaro Herrera  wrote:
>>
>> Craig Ringer wrote:
>>
>> > If it's the result of perltidy changing its mind about the formatting as
>> > a
>> > result of this change I guess we have to eyeroll and live with it.
>> > perltidy
>> > leaves the file alone as it is in the tree currently, so that be it.
>> >
>> > Gripe withdrawn, ready for committer IMO
>>
>> Okay, thanks.  I applied it back to 9.4, which is when
>> --enable-tap-tests appeared.  I didn't perltidy 9.4's config_default.pl,
>> though.
>
>
> Thanks very much. It didn't occur to me to backport it, but it seems
> harmless.
>
> https://commitfest.postgresql.org/9/566/ marked as committed.

Thanks!
-- 
Michael




Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-04 Thread Robert Haas
On Fri, Feb 26, 2016 at 3:28 AM,   wrote:
> Thank you for your comments.
> Please find attached patch addressing following comments.
>
>>As I might have written upthread, transferring the whole string
>>as a progress message is useless at least in this scenario. Since
>>they are a set of fixed messages, each of them can be represented
>>by an identifier, an integer number. I don't see a reason for
>>sending the whole of a string beyond a backend.
> Agreed. I used following macros.
> #define VACUUM_PHASE_SCAN_HEAP  1
> #define VACUUM_PHASE_VACUUM_INDEX_HEAP  2
>
>>I guess num_index_scans could better be reported after all the indexes are
>>done, that is, after the for loop ends.
> Agreed.  I have corrected it.
>
>> CREATE VIEW pg_stat_vacuum_progress AS
>>   SELECT S.s[1] as pid,
>>  S.s[2] as relid,
>>  CASE S.s[3]
>>WHEN 1 THEN 'Scanning Heap'
>>WHEN 2 THEN 'Vacuuming Index and Heap'
>>ELSE 'Unknown phase'
>>  END,
>>
>>   FROM pg_stat_get_command_progress(PROGRESS_COMMAND_VACUUM) as S;
>>
>> # The name of the function could be other than *_command_progress.
> The name of the function is updated to pg_stat_get_progress_info() and
> the function is updated accordingly.
> Updated the pg_stat_vacuum_progress view as suggested.

I'm positive I've said this at least once before while reviewing this
patch, and I think more than once: we should be trying to build a
general progress-reporting facility here with vacuum as the first
user.  Therefore, for example, pg_stat_get_progress_info's output
columns should have generic names, not names specific to VACUUM.
pg_stat_vacuum_progress can alias them to a vacuum-specific name.  See
for example the relationship between pg_stats and pg_statistic.
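
Something along these lines, where the generic column names (param1,
param2) and the argument form are illustrative rather than taken from
the patch:

CREATE VIEW pg_stat_vacuum_progress AS
  SELECT S.pid,
         S.param1 AS relid,
         CASE S.param2
           WHEN 1 THEN 'Scanning Heap'
           WHEN 2 THEN 'Vacuuming Index'
           WHEN 3 THEN 'Vacuuming Heap'
           ELSE 'Unknown phase'
         END AS phase
    FROM pg_stat_get_progress_info('VACUUM') AS S;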

I think VACUUM should have three phases, not two.  lazy_vacuum_index()
and lazy_vacuum_heap() are lumped together right now, but I think they
shouldn't be.

Please create named constants for the first argument to
pgstat_report_progress_update_counter(), maybe with names like
PROGRESS_VACUUM_WHATEVER.

+   /* Update current block number of the relation */
+   pgstat_report_progress_update_counter(2, blkno + 1);

Why + 1?
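
(Returning to the named-constants point: something like this, where the
names and slot numbers are placeholders, per the PROGRESS_VACUUM_WHATEVER
suggestion above.)

/* placeholder names for the progress counter slots */
#define PROGRESS_VACUUM_PHASE               0
#define PROGRESS_VACUUM_HEAP_BLKS_TOTAL     1
#define PROGRESS_VACUUM_HEAP_BLKS_SCANNED   2

pgstat_report_progress_update_counter(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);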

I thought we had a plan to update the counter of scanned index pages
after each index page was vacuumed by the AM.  Doing it only after
vacuuming the entire index is much less granular and generally less
useful.   See 
http://www.postgresql.org/message-id/56500356.4070...@bluetreble.com

+   if (blkno == nblocks - 1 && vacrelstats->num_dead_tuples == 0 &&
+   nindexes != 0 && vacrelstats->num_index_scans == 0)
+   total_index_pages = 0;

I'm not sure what this is trying to do, perhaps because there is no
comment explaining it.  Whatever the intent, I suspect that such a
complex test is likely to be fragile.  Perhaps there is a better way?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Publish autovacuum informations

2016-03-04 Thread Julien Rouhaud
On 03/03/2016 10:54, Kyotaro HORIGUCHI wrote:
> Hello,
> 
> At Wed, 2 Mar 2016 17:48:06 -0600, Jim Nasby  wrote 
> in <56d77bb6.6080...@bluetreble.com>
>> On 3/2/16 10:48 AM, Julien Rouhaud wrote:
>>> Good point, I don't see a lot of information available with this hooks
>>> that a native system statistics couldn't offer. To have the same
>>> amount
>>> of information, I think we'd need a pg_stat_autovacuum view that shows
>>> a
>>> realtime insight of the workers, and also add some aggregated counters
>>> to PgStat_StatTabEntry. I wonder if adding counters to
>>> PgStat_StatTabEntry would be accepted though.
>>
>> I would also really like to see a means of logging (auto)vacuum
>> activity in the database itself. We figured out how to do that with
>> pg_stat_statements, which was a lot harder... it seems kinda silly not
>> to offer that for vacuum. Hooks plus shared memory data should allow
>> for that (the only tricky bit is the hook would need to start and then
>> commit a transaction, but that doesn't seem onerous).
>>
>> I think the shared memory structures should be done as well. Having
>> that real-time info is also valuable.
>>
>> I don't see too much point in adding stuff to the stats system for
>> this.
> 
> I wonder why there haven't been discussions so far on what kind
> of information we want by this feature. For example I'd be happy
> to see the time of last autovacuum trial and the cause if it has
> been skipped for every table. Such information would (maybe)
> naturally be shown in pg_stat_*_tables.
> 
> =
> =# select relid, last_completed_autovacuum, last_completed_autovac_status,
> last_autovacuum_trial, last_autovac_trial_status from pg_stat_user_tables;
> -[ RECORD 1 ]-+--
> relid | 16390
> last_completed_autovacuum | 2016-03-01 01:25:00.349074+09
> last_completed_autovac_status | Completed in 4 seconds. Scanned 434 pages, 
> skipped 23 pages
> last_autovacuum_trial | 2016-03-03 17:33:04.004322+09
> last_autovac_trial_status | Canceled by PID 2355. Processed 144/553 pages.
> -[ RECORD 2 ]--+--
> ...
> last_autovacuum_trial | 2016-03-03 07:25:00.349074+09
> last_autovac_trial_status | Completed in 4 seconds. Scanned 434 pages,
> skipped 23 pages
> -[ RECORD 3 ]--+--
> ...
> last_autovacuum_trial | 2016-03-03 17:59:12.324454+09
> last_autovac_trial_status | Processing by PID 42334, 564 / 32526 pages 
> done.
> -[ RECORD 4 ]--+--
> ...
> last_autovacuum_trial | 2016-03-03 17:59:12.324454+09
> last_autovac_trial_status | Skipped by dead-tuple threshold.
> =
> 
> Apart from the appropriateness of the concrete shape, this would be
> done by extending the current stats system and would need modification
> of some other parts; the hooks and WorkerInfoData alone are not
> enough. This might be the business of Rahila's "VACUUM Progress
> Checker", which covers some real-time info.
> 
> https://commitfest.postgresql.org/9/545/
> 
> On the other hand, it would live in another place and need another
> method if we want a history, like the current autovacuum completion
> logs (at DEBUG3...), of the 100 latest invocations of autovacuum
> workers. Either way, WorkerInfoData alone is not enough.
> 
> 
> What kind of information do we (or will we) want to have?
> 

Very good suggestion.

I think the most productive way to work on this is to start a wiki page
to summarize what information is available, what we should store and
how to represent it.

I'll update this thread as soon as I have a first draft finished.

> 
> regards,
> 


-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org




Re: [HACKERS] Performance improvement for joins where outer side is unique

2016-03-04 Thread Alvaro Herrera
I wonder why we have two identical copies of clause_sides_match_join ...

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-03-04 Thread Tom Lane
Robert Haas  writes:
> On Fri, Mar 4, 2016 at 11:17 AM, Tom Lane  wrote:
>> Huh?  Parallel workers are read-only; what would they be doing sending
>> any of these messages?

> Mumble.  I have no idea what's happening here.

OK, after inserting a bunch of debug logging I have figured out what is
happening.  The updates on trunc_stats_test et al, being updates, are
done in the session's main backend.  But we also have these queries:

-- do a seqscan
SELECT count(*) FROM tenk2;
-- do an indexscan
SELECT count(*) FROM tenk2 WHERE unique1 = 1;

These can be, and are, done in parallel worker processes (and not
necessarily the same one, either).  AFAICT, the parallel worker
processes send their stats messages to the stats collector more or
less immediately after processing their queries.  However, because
of the rate-limiting logic in pgstat_report_stat, the main backend
doesn't.  The point of that "pg_sleep(1.0)" (which was actually added
*after* wait_for_stats) is to ensure that the half-second delay in
the rate limiter has been soaked up, and the stats messages sent,
before we start waiting for the results to become visible in the
stats collector's output.

So the sequence of events when we get a failure looks like

1. parallel workers send stats updates for seqscan and indexscan
on tenk2.

2. stats collector emits output files, probably as a result of
an autovacuum request.

3. session's main backend finishes "pg_sleep(1.0)" and sends
stats updates for what it's done lately, including the
updates on trunc_stats_test et al.

4. wait_for_stats() observes that the tenk2 idx_scan count has
already advanced and figures it need not wait at all.

5. We print stale stats for trunc_stats_test et al.

So it appears to me that to make this robust, we need to adjust
wait_for_stats to verify advances on *all three of* the tenk2
seq_scan count, the tenk2 idx_scan count, and at least one of
the trunc_stats_test tables' counters, because those could be
coming from three different backend processes.
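
A sketch of the adjusted function — simplified, and assuming a prevstats
table holding the tenk2 counters captured before the test, which is how
the regression test keeps its baseline:

-- poll until all three counters have visibly advanced (up to ~30s)
CREATE FUNCTION wait_for_stats() RETURNS void AS $$
DECLARE
  updated bool;
BEGIN
  FOR i IN 1 .. 300 LOOP
    SELECT (st.seq_scan >= pr.seq_scan + 1) AND
           (st.idx_scan >= pr.idx_scan + 1) AND
           (tr.n_tup_ins > 0)
      INTO updated
      FROM pg_stat_user_tables st, pg_stat_user_tables tr, prevstats pr
     WHERE st.relname = 'tenk2' AND tr.relname = 'trunc_stats_test';
    EXIT WHEN updated;
    PERFORM pg_sleep(0.1);
  END LOOP;
END
$$ LANGUAGE plpgsql;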

If we ever allow parallel workers to do writes, this will really
become a mess.

regards, tom lane




Re: [HACKERS] Default Roles

2016-03-04 Thread Robert Haas
On Mon, Feb 29, 2016 at 10:02 PM, Stephen Frost  wrote:
> Attached is a stripped-down version of the default roles patch which
> includes only the 'pg_signal_backend' default role.  This provides the
> framework and structure for other default roles to be added and formally
> reserves the 'pg_' role namespace.  This is split into two patches, the
> first to formally reserve 'pg_', the second to add the new default role.
>
> Will add to the March commitfest for review.

Here is a review of the first patch:

+   if (!IsA(node, RoleSpec))
+   elog(ERROR, "invalid node type %d", node->type);

That looks strange.  Why not just Assert(IsA(node, RoleSpec))?

+
+   return;

Useless return.

@@ -673,6 +673,7 @@ dumpRoles(PGconn *conn)
 "pg_catalog.shobj_description(oid,
'pg_authid') as rolcomment, "
  "rolname =
current_user AS is_current_user "
  "FROM pg_authid "
+ "WHERE rolname !~ '^pg_' "
  "ORDER BY 2");
else if (server_version >= 90100)
printfPQExpBuffer(buf,
@@ -895,6 +896,7 @@ dumpRoleMembership(PGconn *conn)
   "LEFT JOIN pg_authid ur on
ur.oid = a.roleid "
   "LEFT JOIN pg_authid um on
um.oid = a.member "
   "LEFT JOIN pg_authid ug on
ug.oid = a.grantor "
+  "WHERE NOT (ur.rolname ~
'^pg_' AND um.rolname ~ '^pg_')"
   "ORDER BY 1,2,3");

If I'm reading this correctly, when dumping a 9.5 server, we'll
silently drop any roles whose names start with pg_ from the dump, and
hope that doesn't break anything.  When dumping a 9.4 or older server,
we'll include those roles and hope that they miraculously restore on
the new server.  I'm thinking neither of those approaches is going to
work out, and the difference between them seems totally unprincipled.

@@ -631,7 +637,8 @@ check_is_install_user(ClusterInfo *cluster)
res = executeQueryOrDie(conn,
"SELECT rolsuper, oid "
"FROM
pg_catalog.pg_roles "
-   "WHERE rolname
= current_user");
+   "WHERE rolname
= current_user "
+   "AND rolname
!~ '^pg_'");

/*
 * We only allow the install user in the new cluster (see comment below)
@@ -647,7 +654,8 @@ check_is_install_user(ClusterInfo *cluster)

res = executeQueryOrDie(conn,
"SELECT COUNT(*) "
-   "FROM
pg_catalog.pg_roles ");
+   "FROM
pg_catalog.pg_roles "
+   "WHERE rolname
!~ '^pg_'");

if (PQntuples(res) != 1)
pg_fatal("could not determine the number of users\n");

What bad thing would happen without these changes that would be
avoided with these changes?  I can't think of anything.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Relaxing SSL key permission checks

2016-03-04 Thread Alvaro Herrera
Christoph Berg wrote:
> Re: To Tom Lane 2016-02-19 <20160219115334.gb26...@msg.df7cb.de>
> > Updated patch attached.
> 
> *Blush* I though I had compile-tested the patch, but without
> --enable-openssl it wasn't built :(.
> 
> The attached patch has successfully been included in the 9.6 Debian
> package, passed the regression tests there, and I've also done some
> chmod/chown tests on the filesystem to verify it indeed catches the
> cases laid out by Tom.

This looks like a pretty reasonable change to me.

For the sake of cleanliness, I propose splitting out the checks for
regular file and for owned-by-root-or-us from the actual chmod-level
checks at the same time.  That way we can provide more specific error
messages for each case.  (Furthermore, I'm pretty sure that the check
for superuserness could be applied on Windows also -- in the attached
patch it's still #ifdef'ed out, because I don't know how to write it.)

After doing that change I started to look at the details of the check
and found some mistakes:

* it said "g=w" instead of "g=r", in contradiction with the numeric
specification: g=w means 020 rather than 040.  We want the file to be
group-readable, not group-writeable.

* it failed to check for S_IXUSR, so permissions 0700 were okay, in
contradiction with what the error message indicates.  This is a
preexisting bug actually.  Do we want to fix it by preventing a
user-executable file (possibly breaking compatibility with existing
executable key files), or do we want to document what the restriction
really is?
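
For reference, the two mode masks at issue decompose as follows (a
reading aid matching the checks in the patch below, not new policy):

/* owner-owned key: anything beyond u=rw (0600) must be clear */
S_IXUSR | S_IRWXG | S_IRWXO               /* 0100 | 0070 | 0007 = 0177 */

/* root-owned key: group read is tolerated, nothing else */
S_IXUSR | S_IWGRP | S_IXGRP | S_IRWXO     /* 0100 | 0020 | 0010 | 0007 = 0137 */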

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 1e3dfb6..1330845 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -206,8 +206,30 @@ be_tls_init(void)
 	 errmsg("could not access private key file \"%s\": %m",
 			ssl_key_file)));
 
+		if (!S_ISREG(buf.st_mode))
+			ereport(FATAL,
+	(errcode(ERRCODE_CONFIG_FILE_ERROR),
+	 errmsg("private key file \"%s\" is not a regular file",
+			ssl_key_file)));
+
 		/*
-		 * Require no public access to key file.
+		 * Refuse to load files owned by users other than us or root.
+		 *
+		 * XXX surely we can check this on Windows somehow, too.
+		 */
+#if !defined(WIN32) && !defined(__CYGWIN__)
+		if (buf.st_uid != geteuid() && buf.st_uid != 0)
+			ereport(FATAL,
+	(errcode(ERRCODE_CONFIG_FILE_ERROR),
+	 errmsg("private key file \"%s\" must be owned by the database user or root",
+			ssl_key_file)));
+#endif
+
+		/*
+		 * Require no public access to key file. If the file is owned by us,
+		 * require mode 0600 or less. If owned by root, require 0640 or less
+		 * to allow read access through our gid, or a supplementary gid that
+		 * allows to read system-wide certificates.
 		 *
 		 * XXX temporarily suppress check when on Windows, because there may
 		 * not be proper support for Unix-y file permissions.  Need to think
@@ -215,12 +237,13 @@ be_tls_init(void)
 		 * directory permission check in postmaster.c)
 		 */
 #if !defined(WIN32) && !defined(__CYGWIN__)
-		if (!S_ISREG(buf.st_mode) || buf.st_mode & (S_IRWXG | S_IRWXO))
+		if ((buf.st_uid == geteuid() && buf.st_mode & (S_IXUSR | S_IRWXG | S_IRWXO)) ||
+			(buf.st_uid == 0 && buf.st_mode & (S_IXUSR | S_IWGRP | S_IXGRP | S_IRWXO)))
 			ereport(FATAL,
 	(errcode(ERRCODE_CONFIG_FILE_ERROR),
-  errmsg("private key file \"%s\" has group or world access",
-		 ssl_key_file),
-   errdetail("Permissions should be u=rw (0600) or less.")));
+	 errmsg("private key file \"%s\" has group or world access",
+			ssl_key_file),
+	 errdetail("File must have permissions u=rw (0600) or less if owned by the database user, or permissions u=rw,g=r (0640) or less if owned by root.")));
 #endif
 
 		if (SSL_CTX_use_PrivateKey_file(SSL_context,



Re: [HACKERS] Typo in comment

2016-03-04 Thread Robert Haas
On Fri, Mar 4, 2016 at 2:39 PM, Thomas Munro
 wrote:
> Here is a patch to fix a typo in a comment in timestamp.c.

That looks like a typo, all right.  Committed.

(It's "commit small patches day" for me today, in case anybody hasn't
caught on to that already...)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2016-03-04 Thread Robert Haas
On Tue, Feb 23, 2016 at 1:18 AM, Etsuro Fujita
 wrote:
> Thanks again for updating the patch and fixing the issues!

Some comments on the latest version.  I haven't reviewed the
postgres_fdw changes in detail here, so this is just about the core
changes.

I see that show_plan_tlist checks whether the operation is any of
CMD_INSERT, CMD_UPDATE, or CMD_DELETE.  But practically every place
else where a similar test is needed instead tests whether the
operation is *not* CMD_SELECT.  I think this place should do it that
way, too.

+   resultRelInfo = mtstate->resultRelInfo;
for (i = 0; i < nplans; i++)
{
ExecAuxRowMark *aerm;

+   /*
+* ignore subplan if the FDW pushes down the command to the remote
+* server; the ModifyTable won't have anything to do except for
+* evaluation of RETURNING expressions
+*/
+   if (resultRelInfo->ri_FdwPushdown)
+   {
+   resultRelInfo++;
+   continue;
+   }
+
subplan = mtstate->mt_plans[i]->plan;
aerm = ExecBuildAuxRowMark(erm, subplan->targetlist);
mtstate->mt_arowmarks[i] =
lappend(mtstate->mt_arowmarks[i], aerm);
+   resultRelInfo++;
}


This kind of thing creates a hazard for future people maintaining this
code.  If somebody adds some code to this loop that needs to execute
even when resultRelInfo->ri_FdwPushdown is true, they have to add two
copies of it.  It's much better to move the three lines of logic that
execute only in the non-pushdown case inside of if
(!resultRelInfo->ri_FdwPushdown).
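
That is, roughly:

resultRelInfo = mtstate->resultRelInfo;
for (i = 0; i < nplans; i++)
{
    /* only set up row marks when the command runs locally */
    if (!resultRelInfo->ri_FdwPushdown)
    {
        ExecAuxRowMark *aerm;
        Plan   *subplan = mtstate->mt_plans[i]->plan;

        aerm = ExecBuildAuxRowMark(erm, subplan->targetlist);
        mtstate->mt_arowmarks[i] = lappend(mtstate->mt_arowmarks[i], aerm);
    }
    resultRelInfo++;
}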

This issue crops up elsewhere as well.  The changes to
ExecModifyTable() have the same problem -- in that case, it might be
wise to move the code that's going to have to be indented yet another
level into a separate function.   That code also says this:

+   /* No need to provide scan tuple to ExecProcessReturning. */
+   slot = ExecProcessReturning(resultRelInfo, NULL, planSlot);

...but, uh, why not?  The comment says what the code does, but what it
should do is explain why it does it.

On a broader level, I'm not very happy with the naming this patch
uses.  Here's an example:

+
+ If an FDW supports optimizing foreign table updates, it still needs to
+ provide PlanDMLPushdown, BeginDMLPushdown,
+ IterateDMLPushdown and EndDMLPushdown
+ described below.
+

"Optimizing foreign table updates" is both inaccurate (since it
doesn't only optimize updates) and so vague as to be meaningless
unless you already know what it means.  The actual patch uses
terminology like "fdwPushdowns" which is just as bad.  We might push a
lot of things to the foreign side -- sorts, joins, aggregates, limits
-- and this is just one of them.  Worse, "pushdown" is itself
something of a term of art - will people who haven't been following
all of the mammoth, multi-hundred-email threads on this topic know
what that means?  I think we need some better terminology here.

The best thing that I can come up with offhand is "bulk modify".  So
we'd have PlanBulkModify, BeginBulkModify, IterateBulkModify,
EndBulkModify, ExplainBulkModify.  Other suggestions welcome.   The
ResultRelInfo flag could be ri_usesFDWBulkModify.  The documentation
could say something like this:

Some inserts, updates, and deletes to foreign tables can be optimized
by implementing an alternate set of interfaces.  The ordinary
interfaces for inserts, updates, and deletes fetch rows from the
remote server and then modify those rows one at a time.  In some
cases, this row-by-row approach is necessary, but it can be
inefficient.  If it is possible for the foreign server to determine
which rows should be modified without actually retrieving them, and if
there are no local triggers which would affect the operation, then it
is possible to arrange things so that the entire operation is
performed on the remote server.  The interfaces described below make
this possible.

+ Begin executing a foreign table update directly on the remote server.

I think this should say "Prepare to execute a bulk modification
directly on the remote server".  It shouldn't actually begin the
execution phase.

+ End the table update and release resources.  It is normally not important

And I think this one should say "Clean up following a bulk
modification on the remote server".  It's not actually ending the
update; the iterate method already did that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Add generate_series(date,date) and generate_series(date,date,integer)

2016-03-04 Thread Corey Huinker
>
>
> I feel rather uneasy about simply removing the 'infinity' checks. Is there
> a way to differentiate those two cases, i.e. when the generate_series is
> called in target list and in the FROM part? If yes, we could do the check
> only in the FROM part, which is the case that does not work (and consumes
> arbitrary amounts of memory).
>
>
It would be simple enough to remove the infinity test on the "stop" and
leave it on the "start". Or yank both. Just waiting for others to agree
which checks should remain.
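
To illustrate the distinction (a sketch, assuming the proposed date
variant; neither statement is something you would actually want to run):

-- in a target list the SRF streams its rows, so this merely never terminates:
SELECT generate_series(current_date, 'infinity'::date);

-- in FROM, the function scan is materialized into a tuplestore first, so
-- this consumes memory without bound before returning anything:
SELECT count(*) FROM generate_series(current_date, 'infinity'::date);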


Re: [HACKERS][PATCH] Supporting +-Infinity values by to_timestamp(float8)

2016-03-04 Thread Vitaly Burovoy
On 3/4/16, Anastasia Lubennikova  wrote:
> 27.02.2016 09:57, Vitaly Burovoy:
>> Hello, Hackers!
>>
>> I worked on a patch[1] allows "EXTRACT(epoch FROM
>> +-Inf::timestamp[tz])" to return "+-Inf::float8".
>> There is an opposite function "to_timestamp(float8)" which now defined
>> as:
>> SELECT ('epoch'::timestamptz + $1 * '1 second'::interval)
>
> Hi,
> thank you for the patches.

Thank you for the review.

> Could you explain, whether they depend on each other?

Only logically. They reverse each other:
postgres=# SELECT v, to_timestamp(v), extract(epoch FROM to_timestamp(v)) FROM
postgres-#   unnest(ARRAY['+inf', '-inf', 0, 65536, 982384720.12]::float8[]) v;
      v       |       to_timestamp        |  date_part
--------------+---------------------------+--------------
     Infinity | infinity                  |     Infinity
    -Infinity | -infinity                 |    -Infinity
            0 | 1970-01-01 00:00:00+00    |            0
        65536 | 1970-01-01 18:12:16+00    |        65536
 982384720.12 | 2001-02-17 04:38:40.12+00 | 982384720.12
(5 rows)


>> Since intervals do not support infinity values, it is impossible to do
>> something like:
>>
>> SELECT to_timestamp('infinity'::float8);
>>
>> ... which is not good.
>>
>> Support for such a conversion is in the TODO list[2] (by "converting
>> between infinity timestamp and float8").
>
> You mention intervals here, and TODO item definitely says about
> 'infinity' interval,

Yes, it is in the same block. But I wanted to point to the link
"converting between infinity timestamp and float8". There are two-way
conversion examples.

> while patch and all the following discussion concerns to timestamps.
> Is it a typo or I misunderstood something important?

It is just a reason why I rewrote it as an internal function.
I asked whether to just rewrite the function
"pg_catalog.to_timestamp(float8)" as an internal one or to add support
of infinite intervals. Tom Lane answered[5] "you should stay away from
infinite intervals".
So I left intervals as is.

> I assumed that the following query would work, but it doesn't. Could you
> clarify that?
> select to_timestamp('infinity'::interval);

It is not hard to clarify: there is no logical way to convert an interval
(e.g. "5 minutes") to a timestamp (or date).
There never was a function "to_timestamp(interval)" and never will be.
postgres=# select to_timestamp('5min'::interval);
ERROR:  function to_timestamp(interval) does not exist
LINE 1: select to_timestamp('5min'::interval);
               ^
HINT:  No function matches the given name and argument types. You
might need to add explicit type casts.

>> Proposed patch implements it.
>>
>> There is an other patch in the CF[3] 2016-03 implements checking of
>> timestamp[tz] for being in allowed range. Since it is wise to set
>> (fix) the upper boundary of timestamp[tz]s, I've included the file
>> "src/include/datatype/timestamp.h" from there to check that an input
>> value and a result are in the allowed range.
>>
>> There is no changes in a documentation because allowed range is the
>> same as officially supported[4] (i.e. until 294277 AD).
>
> I think that you should update documentation. At least description of
> epoch on this page:
> http://www.postgresql.org/docs/devel/static/functions-datetime.html

Thank you very much for pointing where it is located (I saw only
"to_timestamp(TEXT, TEXT)").
I'll think about how to update it.

> More thoughts about the patch:
>
> 1. When I copy the value from the hints for the min and max values (see
> examples below), it works fine for min, while max still leads to an error.
> It comes from the check "if (seconds >= epoch_ubound)". I wonder
> whether you should change the hint message?
>
> select to_timestamp(-210866803200.00);
>to_timestamp
> -
>   4714-11-24 02:30:17+02:30:17 BC
> (1 row)
>
>
> select to_timestamp(9224318016000.00);
> ERROR:  UNIX epoch out of range: "9224318016000.00"
> HINT:  Maximal UNIX epoch value is "9224318016000.00"

I agree, it is a little confusing. Do you (or anyone) know how to
construct a condensed phrase conveying that it is an exclusive upper
bound of the allowed UNIX epoch range?

> 2. There is a comment about JULIAN_MAXYEAR inaccuracy in timestamp.h:
>
>   * IS_VALID_JULIAN checks the minimum date exactly, but is a bit sloppy
>   * about the maximum, since it's far enough out to not be especially
>   * interesting.

It is just about the accuracy to the day for a lower bound, and to the
year (not to a day) for an upper bound.

> Maybe you can expand it?
> - Does JULIAN_MAXYEAR4STAMPS help to avoid overflow in all possible cases?
> - Why do we need to hold both definitions? I suppose, it's a matter of
> backward compatibility, isn't it?

Yes. I tried to be less invasive from the point of view of end users:
they can be sure that if they follow the documentation they won't get
into trouble.

> 3. (nitpicking) I'm not sure about the "4STAMPS" suffix. "4" is nice
> 

[HACKERS] Typo in comment

2016-03-04 Thread Thomas Munro
Hi

Here is a patch to fix a typo in a comment in timestamp.c.

-- 
Thomas Munro
http://www.enterprisedb.com


typo.patch
Description: Binary data



Re: CustomScan in a larger structure (RE: [HACKERS] CustomScan support on readfuncs.c)

2016-03-04 Thread Andres Freund
On 2016-02-12 15:56:45 +0100, Andres Freund wrote:
> Hi,
> 
> 
> On 2016-02-10 23:26:20 -0500, Robert Haas wrote:
> > I think the part about whacking around the FDW API is a little more
> > potentially objectionable to others, so I want to hold off doing that
> > unless a few more people chime in with +1.  Perhaps we could start a
> > new thread to talk about that specific idea.  This is useful even
> > without that, though.
> 
> FWIW, I can delete a couple hundred lines of code from citusdb thanks to
> this...

And I'm now working on doing that.


> why exactly did you expose read/writeBitmapset(), and nothing else?
> Afaics a lot of the other read routines are also pretty necessary from
> the outside?

I'd like to also expose at least outDatum()/readDatum() - they're not
entirely trivial, so it'd be sad to copy them.


What I'm wondering about right now is how an extensible node should
implement the equivalent of
#define WRITE_NODE_FIELD(fldname) \
(appendStringInfo(str, " :" CppAsString(fldname) " "), \
 _outNode(str, node->fldname))

given that _outNode isn't public, that seems to imply having to do
something like

#define WRITE_NODE_FIELD(fldname) \
(appendStringInfo(str, " :" CppAsString(fldname) " "), \
 appendStringInfo(str, nodeToString(node->fldname)))

i.e. essentially doubling the memory overhead. ISTM we should make
outNode() externally visible?

Greetings,

Andres Freund




Re: [HACKERS] transam README small fix

2016-03-04 Thread Stas Kelvich
Thanks.

Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


> On 04 Mar 2016, at 22:14, Robert Haas  wrote:
> 
> On Tue, Mar 1, 2016 at 4:31 AM, Stas Kelvich  wrote:
>> The transaction function call sequence description in transam/README is
>> slightly outdated. SELECT is now handled by PortalRunSelect instead of
>> ProcessQuery. It is also hard to follow what the indentation there means —
>> sometimes it means “function called by function”, sometimes it doesn't. So
>> I’ve also changed it to reflect the actual call nesting.
> 
> After some study, this looks good to me, so committed.
> 
> -- 
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company





Re: [HACKERS] transam README small fix

2016-03-04 Thread Robert Haas
On Tue, Mar 1, 2016 at 4:31 AM, Stas Kelvich  wrote:
> The transaction function call sequence description in transam/README is
> slightly outdated. SELECT is now handled by PortalRunSelect instead of
> ProcessQuery. It is also hard to follow what the indentation there means —
> sometimes it means “function called by function”, sometimes it doesn't. So
> I’ve also changed it to reflect the actual call nesting.

After some study, this looks good to me, so committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: Add generate_series(date, date) and generate_series(date, date, integer)

2016-03-04 Thread David Steele
On 2/21/16 2:24 PM, Vik Fearing wrote:
> On 02/21/2016 07:56 PM, Corey Huinker wrote:
>>
>> Other than that, the only difference is the ::date part. Is it
>> really worth adding extra code just for that? I would say not.
>>
>>
>> I would argue it belongs for the sake of completeness. 
> 
> So would I.
> 
> +1 for adding this missing function.

+1. FWIW, a sample query I wrote for a customer yesterday would have
looked nicer with this function. Here's how the generate_series looked:

generate_series('2016-03-01'::date, '2016-03-31'::date, interval '1
day')::date

But it would have been cleaner to write:

generate_series('2016-03-01'::date, '2016-03-31'::date)

More importantly, though, I don't like that the timestamp version of the
function happily takes date parameters but returns timestamps. I feel
this could lead to some subtle bugs for the unwary.
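
For example (a sketch; the result type falls out of ordinary function
overload resolution, not a deliberate choice):

SELECT generate_series('2016-03-01'::date, '2016-03-03'::date, interval '1 day');
-- each row comes back as a timestamp at midnight, not as a date,
-- hence the explicit ::date cast in the query above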

-- 
-David
da...@pgmasters.net





Re: [HACKERS] pgbench small bug fix

2016-03-04 Thread Fabien COELHO



You're probably right, but TBH I'm pretty unsure about this whole thing.


If the question is "is there a bug", then the answer is yes. The progress
report may disappear if thread 0 happens to stop, even if all other threads
go on. Obviously it only concerns slow queries, but there is no reason why
pgbench should not work with slow queries. I can imagine good reasons to do
that, say to check the impact of such queries on an OLTP load.

The bug can be kept instead, and it can be called a feature.


No, I agree that this looks like a bug and that we should fix it; for
example, if all connections from thread 0 terminate for some reason,
there will be no more reports, even if the other threads continue.
That's bad too.

What I'm unsure about is the proposed fix.


I will leave it alone for the time being.


Maybe you could consider pushing the first part of the patch, which stops if
a transaction is scheduled after the end of the run? Or is this part
bothering you as well?


So there are *two* bugs here?


Hmmm... AFAICR, maybe fixing the first creates the second issue, i.e. 
maybe the second issue is currently hidden by the thread going on after 
the end of the run, so the second is just a latent bug that cannot be 
encountered.


I'm not sure whether I'm very clear:-)

--
Fabien.




Re: [HACKERS] Logic problem in SerializeSnapshot()

2016-03-04 Thread Robert Haas
On Tue, Mar 1, 2016 at 12:34 AM, Rushabh Lathia
 wrote:
> During the testing of parallel query (with force_parallel_mode = regress),
> noticed random server crash with the below stack:
>
> #0  0x003fc84896d5 in memcpy () from /lib64/libc.so.6
> #1  0x00a36867 in SerializeSnapshot (snapshot=0x1e49f40,
> start_address=0x7f391e9ec728 ) at
> snapmgr.c:1523
> #2  0x00522a20 in InitializeParallelDSM (pcxt=0x1e49ce0) at
> parallel.c:330
> #3  0x006dd256 in ExecInitParallelPlan (planstate=0x1f012b0,
> estate=0x1f00be8, nworkers=1) at execParallel.c:398
> #4  0x006f8abb in ExecGather (node=0x1f00d00) at nodeGather.c:160
> #5  0x006de42e in ExecProcNode (node=0x1f00d00) at
> execProcnode.c:516
> #6  0x006da4fd in ExecutePlan (estate=0x1f00be8,
> planstate=0x1f00d00, use_parallel_mode=1 '\001', operation=CMD_SELECT,
> sendTuples=1 '\001', numberTuples=0,
> direction=ForwardScanDirection, dest=0x1e5e118) at execMain.c:1633
>
> So I started looking into SerializeSnapshot(), and from reading the code
> I found that we skip copying the SubXID array if it has overflowed,
> unless the snapshot was taken during recovery, and in that case we set
> serialized_snapshot->subxcnt to 0. But later, while copying the SubXID
> array, we test the condition based on snapshot->subxcnt. We should check
> serialized_snapshot->subxcnt rather than snapshot->subxcnt.
>
> I tried hard to come up with an individual test but somehow I was unable
> to create a testcase.
>
> PFA patch to fix the issue.

Thanks, that looks right to me.  Committed.
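
(For the archives: the defect boils down to testing the source snapshot's
count where the serialized snapshot's count — deliberately forced to 0 on
overflow — must be used. Schematically, with the destination arguments
abbreviated, since the exact expressions live in snapmgr.c:)

/* before: may copy a SubXID array whose serialized count was zeroed */
if (snapshot->subxcnt > 0)
    memcpy(..., snapshot->subxip,
           snapshot->subxcnt * sizeof(TransactionId));

/* after: honor the count actually recorded in the serialized snapshot */
if (serialized_snapshot->subxcnt > 0)
    memcpy(..., snapshot->subxip,
           serialized_snapshot->subxcnt * sizeof(TransactionId));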

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pgbench stats per script & other stuff

2016-03-04 Thread Fabien COELHO



That is why the "fs" variable in process_file is declared "static", and why
I wrote "some hidden awkwardness".

I did want to avoid a malloc because then who would free the struct?
addScript cannot do it systematically because builtins are static. Or it
would have to create a struct on purpose, but then that would be more
awkwardness, and malloc/free to pass arguments between functions is
neither efficient nor very elegant.

So the "static" option looked like the simplest & most elegant version.


Surely that trick breaks if you have more than one -f switch, no?  Oh, I
see what you're doing: you only use the command list, which is
allocated, so it doesn't matter that the rest of the struct changes
later.


The two fields that matter (desc and commands) are really copied into
sql_script, so what stays in the static struct is overridden if it is
used another time.



I'm not concerned about freeing the struct; what's the problem with it
surviving until the program terminates?


It is not referenced anywhere so it is a memory leak.

If somebody specifies thousands of -f switches, they will waste a few 
bytes with each, but I'm hardly concerned about a few dozen kilobytes 
there ...


Ok, so you prefer a memory leak. I hate it on principle.

Here is a v23 with a memory leak anyway.
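
For readers skimming the thread, the shape of the compromise is roughly
this — a sketch with illustrative names (ParsedScript and
read_script_file are not from the patch):

static void
process_file(char *filename, int weight)
{
	/*
	 * Allocated once per -f switch and intentionally never freed:
	 * addScript() keeps the desc and commands pointers for the whole run.
	 */
	ParsedScript *ps = pg_malloc(sizeof(ParsedScript));

	ps->desc = filename;
	ps->commands = read_script_file(filename);	/* hypothetical parser */
	addScript(ps->desc, weight, ps->commands);
}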

--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index cc80b3f..dd3fb1d 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -262,11 +262,13 @@ pgbench  options  dbname
 
 
  
-  -b scriptname
-  --builtin scriptname
+  -b scriptname[@weight]
+  --builtin=scriptname[@weight]
   

 Add the specified builtin script to the list of executed scripts.
+An optional integer weight after @ allows to adjust the
+probability of drawing the script.  If not specified, it is set to 1.
 Available builtin scripts are: tpcb-like,
 simple-update and select-only.
 Unambiguous prefixes of builtin names are accepted.
@@ -322,12 +324,14 @@ pgbench  options  dbname
  
 
  
-  -f filename
-  --file=filename
+  -f filename[@weight]
+  --file=filename[@weight]
   

 Add a transaction script read from filename to
 the list of executed scripts.
+An optional integer weight after @ allows to adjust the
+probability of drawing the test.
 See below for details.

   
@@ -687,9 +691,13 @@ pgbench  options  dbname
   What is the Transaction Actually Performed in pgbench?
 
   
-   Pgbench executes test scripts chosen randomly from a specified list.
+   pgbench executes test scripts chosen randomly
+   from a specified list.
They include built-in scripts with -b and
user-provided custom scripts with -f.
+   Each script may be given a relative weight specified after a
+   @ so as to change its drawing probability.
+   The default weight is 1.
  
 
   
@@ -1194,12 +1202,11 @@ number of clients: 10
 number of threads: 1
 number of transactions per client: 1000
 number of transactions actually processed: 1/1
+latency average = 15.844 ms
+latency stddev = 2.715 ms
 tps = 618.764555 (including connections establishing)
 tps = 622.977698 (excluding connections establishing)
-SQL script 1: builtin: TPC-B (sort of)
- - 1 transactions (100.0% of total, tps = 618.764555)
- - latency average = 15.844 ms
- - latency stddev = 2.715 ms
+script statistics:
  - statement latencies in milliseconds:
 0.004386\set nbranches 1 * :scale
 0.001343\set ntellers 10 * :scale
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 8b0b17a..5363648 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -38,6 +38,7 @@
 #include "portability/instr_time.h"
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -179,6 +180,8 @@ char	   *login = NULL;
 char	   *dbName;
 const char *progname;
 
+#define WSEP '@'/* weight separator */
+
 volatile bool timer_exceeded = false;	/* flag from signal handler */
 
 /* variable definitions */
@@ -300,23 +303,27 @@ typedef struct
 static struct
 {
 	const char *name;
+	int			weight;
 	Command   **commands;
 	StatsData stats;
 }	sql_script[MAX_SCRIPTS];	/* SQL script files */
 static int	num_scripts;		/* number of scripts in sql_script[] */
 static int	num_commands = 0;	/* total number of Command structs */
+static int64 total_weight = 0;
+
 static int	debug = 0;			/* debug flag */
 
 /* Define builtin test scripts */
-#define N_BUILTIN 3
-static struct
+typedef struct script_t
 {
 	char	   *name;			/* very short name for -b ... */
 	char	   *desc;			/* short description */
-	char	   *commands;		/* actual pgbench script */
-}
+	char	   *script;			/* actual pgbench script */
+	Command   **commands; 		/* temporary intermediate holder */
+} script_t;
 
-			builtin_script[] =
+#define N_BUILTIN 3
+static script_t builtin_script[] =
 {
 	{
 		"tpcb-like",

Re: [HACKERS] pgbench small bug fix

2016-03-04 Thread Alvaro Herrera
Fabien COELHO wrote:
> 
> >>Probably it is possible, but it will surely need more than one little
> >>condition to be achieved... I do not think that introducing a non-trivial
> >>distributed election algorithm involving locks and so on would be a good
> >>decision for this very little matter.
> >>
> >>My advice is "keep it simple".
> >>
> >>If this is a blocker, I can sure write such an algorithm, when I have some
> >>spare time, but I'm not sure that the purpose is worth it.
> >
> >You're probably right, but TBH I'm pretty unsure about this whole thing.
> 
> If the question is "is there a bug", then the answer is yes. The progress
> report may disappear if thread 0 happens to stop, even if all other threads
> go on. Obviously it only concerns slow queries, but there is no reason why
> pgbench should not work with slow queries. I can imagine good reasons to do
> that, say to check the impact of such queries on an OLTP load.
> 
> The bug can be kept instead, and it can be called a feature.

No, I agree that this looks like a bug and that we should fix it; for
example, if all connections from thread 0 terminate for some reason,
there will be no more reports, even if the other threads continue.
That's bad too.

What I'm unsure about is the proposed fix.

> >I will leave it alone for the time being.
> 
> Maybe you could consider pushing the first part of the patch, which stops if
> a transaction is scheduled after the end of the run? Or is this part
> bothering you as well?

So there are *two* bugs here?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pgbench small bug fix

2016-03-04 Thread Fabien COELHO



Probably it is possible, but it will surely need more than one little
condition to be achieved... I do not think that introducing a non-trivial
distributed election algorithm involving locks and so on would be a good
decision for this very little matter.

My advice is "keep it simple".

If this is a blocker, I can sure write such an algorithm, when I have some
spare time, but I'm not sure that the purpose is worth it.


You're probably right, but TBH I'm pretty unsure about this whole thing.


If the question is "is there a bug", then the answer is yes. The progress
report may disappear if thread 0 happens to stop, even if all other
threads go on. Obviously it only concerns slow queries, but there is no
reason why pgbench should not work with slow queries. I can imagine good
reasons to do that, say to check the impact of such queries on an OLTP
load.


The bug can be kept instead, and it can be called a feature.


I will leave it alone for the time being.


Maybe you could consider pushing the first part of the patch, which stops 
if a transaction is scheduled after the end of the run? Or is this part 
bothering you as well?


--
Fabien.




Re: [HACKERS] pgbench stats per script & other stuff

2016-03-04 Thread Alvaro Herrera
Fabien COELHO wrote:

> >However, this is still a bit broken -- you cannot return a stack
> >variable from process_file, because the stack goes away once the
> >function returns.  You need to malloc it.
> 
> That is why the "fs" variable in process_file is declared "static", and why
> I wrote "some hidden awkwardness".
> 
> I did want to avoid a malloc because then who would free the struct?
> addScript cannot do it systematically because builtins are static. Or it
> would have to create a struct on purpose, but then that would be more
> awkwardness, and malloc/free to pass arguments between functions is
> neither efficient nor very elegant.
> 
> So the "static" option looked like the simplest & most elegant version.

Surely that trick breaks if you have more than one -f switch, no?  Oh, I
see what you're doing: you only use the command list, which is
allocated, so it doesn't matter that the rest of the struct changes
later.  That seems rather nasty to me -- I'd avoid that.

I'm not concerned about freeing the struct; what's the problem with it
surviving until the program terminates?  If somebody specifies thousands
of -f switches, they will waste a few bytes with each, but I'm hardly
concerned about a few dozen kilobytes there ...

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pgbench stats per script & other stuff

2016-03-04 Thread Fabien COELHO



  *-21.patch does what you suggested above, some hidden awkwardness
 but much less that the previous one.


Yeah, I think this is much nicer, don't you agree?


Yep, I said "less awkwardness than the previous", a pretty contrived way
to say better :-)



However, this is still a bit broken -- you cannot return a stack
variable from process_file, because the stack goes away once the
function returns.  You need to malloc it.


That is why the "fs" variable in process_file is declared "static", and
why I wrote "some hidden awkwardness".


I did want to avoid a malloc because then who would free the struct?
addScript cannot do it systematically because builtins are static. Or it
would have to create a struct on purpose, but then that would be more
awkwardness, and malloc/free to pass arguments between functions is
neither efficient nor very elegant.


So the "static" option looked like the simplest & most elegant version.


Also, you forgot to update the comments in process_file,
process_builtin, etc.


Indeed. v22 attached with better comments.

--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index cc80b3f..dd3fb1d 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -262,11 +262,13 @@ pgbench  options  dbname
 
 
  
-  -b scriptname
-  --builtin scriptname
+  -b scriptname[@weight]
+  --builtin=scriptname[@weight]
   

 Add the specified builtin script to the list of executed scripts.
+An optional integer weight after @ allows to adjust the
+probability of drawing the script.  If not specified, it is set to 1.
 Available builtin scripts are: tpcb-like,
 simple-update and select-only.
 Unambiguous prefixes of builtin names are accepted.
@@ -322,12 +324,14 @@ pgbench  options  dbname
  
 
  
-  -f filename
-  --file=filename
+  -f filename[@weight]
+  --file=filename[@weight]
   

 Add a transaction script read from filename to
 the list of executed scripts.
+An optional integer weight after @ allows to adjust the
+probability of drawing the test.
 See below for details.

   
@@ -687,9 +691,13 @@ pgbench  options  dbname
   What is the Transaction Actually Performed in pgbench?
 
   
-   Pgbench executes test scripts chosen randomly from a specified list.
+   pgbench executes test scripts chosen randomly
+   from a specified list.
They include built-in scripts with -b and
user-provided custom scripts with -f.
+   Each script may be given a relative weight specified after a
+   @ so as to change its drawing probability.
+   The default weight is 1.
  
 
   
@@ -1194,12 +1202,11 @@ number of clients: 10
 number of threads: 1
 number of transactions per client: 1000
 number of transactions actually processed: 1/1
+latency average = 15.844 ms
+latency stddev = 2.715 ms
 tps = 618.764555 (including connections establishing)
 tps = 622.977698 (excluding connections establishing)
-SQL script 1: builtin: TPC-B (sort of)
- - 1 transactions (100.0% of total, tps = 618.764555)
- - latency average = 15.844 ms
- - latency stddev = 2.715 ms
+script statistics:
  - statement latencies in milliseconds:
 0.004386\set nbranches 1 * :scale
 0.001343\set ntellers 10 * :scale
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 8b0b17a..d7af86e 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -38,6 +38,7 @@
 #include "portability/instr_time.h"
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -179,6 +180,8 @@ char	   *login = NULL;
 char	   *dbName;
 const char *progname;
 
+#define WSEP '@'/* weight separator */
+
 volatile bool timer_exceeded = false;	/* flag from signal handler */
 
 /* variable definitions */
@@ -300,23 +303,27 @@ typedef struct
 static struct
 {
 	const char *name;
+	int			weight;
 	Command   **commands;
 	StatsData stats;
 }	sql_script[MAX_SCRIPTS];	/* SQL script files */
 static int	num_scripts;		/* number of scripts in sql_script[] */
 static int	num_commands = 0;	/* total number of Command structs */
+static int64 total_weight = 0;
+
 static int	debug = 0;			/* debug flag */
 
 /* Define builtin test scripts */
-#define N_BUILTIN 3
-static struct
+typedef struct script_t
 {
 	char	   *name;			/* very short name for -b ... */
 	char	   *desc;			/* short description */
-	char	   *commands;		/* actual pgbench script */
-}
+	char	   *script;			/* actual pgbench script */
+	Command   **commands; 		/* temporary intermediate holder */
+} script_t;
 
-			builtin_script[] =
+#define N_BUILTIN 3
+static script_t builtin_script[] =
 {
 	{
 		"tpcb-like",
@@ -334,7 +341,8 @@ static struct
 		"UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;\n"
 		"UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;\n"
 		"INSERT INTO 

Re: [HACKERS] More stable query plans via more predictable column statistics

2016-03-04 Thread Robert Haas
On Thu, Mar 3, 2016 at 2:48 AM, Shulgin, Oleksandr
 wrote:
> On Wed, Mar 2, 2016 at 7:33 PM, Alvaro Herrera 
> wrote:
>> Shulgin, Oleksandr wrote:
>>
>> > Alright.  I'm attaching the latest version of this patch split in two
>> > parts: the first one is NULLs-related bugfix and the second is the
>> > "improvement" part, which applies on top of the first one.
>>
>> So is this null-related bugfix supposed to be backpatched?  (I assume
>> it's not because it's very likely to change existing plans).
>
> For the good, because cardinality estimations will be more accurate in these
> cases, so yes I would expect it to be back-patchable.

-1.  I think the cost of changing existing query plans in back
branches is too high.  The people who get a better plan never thank
us, but the people who (by bad luck) get a worse plan always complain.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Aggregate

2016-03-04 Thread Robert Haas
On Thu, Mar 3, 2016 at 11:00 PM, David Rowley
 wrote:
> On 17 February 2016 at 17:50, Haribabu Kommi  wrote:
>> Here I attached a draft patch based on previous discussions. It still needs
>> better comments and optimization.
>
> Over in [1] Tom posted a large change to the grouping planner which
> causes large conflict with the parallel aggregation patch. I've been
> looking over Tom's patch and reading the related thread and I've
> observed 3 things:
>
> 1. Parallel Aggregate will be much easier to write and less code to
> base it on top of Tom's upper planner changes. The latest patch does
> add a bit of cruft (e.g. create_gather_plan_from_subplan()) which won't
> be required after Tom pushes the changes to the upper planner.
> 2. If we apply parallel aggregate before Tom's upper planner changes
> go in, then Tom needs to reinvent it again when rebasing his patch.
> This seems senseless, so this is why I did this work.
> 3. Based on the thread, most people are leaning towards getting Tom's
> changes in early to allow a bit more settle time before beta, and
> perhaps also to allow other patches to go in after (e.g this)
>
> So, I've done a bit of work and I've rewritten the parallel aggregate
> code to base it on top of Tom's patch posted in [1].

Great!

> 3. The code never attempts to mix and match Grouping Agg and Hash Agg
> plans. E.g. it could be an idea to perform Partial Hash Aggregate ->
> Gather -> Sort -> Finalize Group Aggregate, or hash as in the Finalize
> stage. I just thought doing this is more complex than what's really
> needed, but if someone can think of a case where this would be a great
> win then I'll listen, but you have to remember we don't have any
> pre-sorted partial paths at this stage, so an explicit sort is
> required *always*. This might change if someone invented partial btree
> index scans... but until then...

Actually, Rahila Syed is working on that.  But it's not done yet, so
presumably will not go into 9.6.

I don't really see the logic of this, though.  Currently, Gather
destroys the input ordering, so it seems preferable for the
finalize-aggregates stage to use a hash aggregate whenever possible,
whatever the partial-aggregate stage did.  Otherwise, we need an
explicit sort.  Anyway, it seems like the two stages should be costed
and decided on their own merits - there's no reason to chain the two
decisions together.
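
For illustration, the two finalize-stage shapes being contrasted look
roughly like this (schematic plan trees, not actual EXPLAIN output):

    Finalize GroupAggregate
      ->  Sort
            ->  Gather
                  ->  Partial HashAggregate
                        ->  Parallel Seq Scan on t

versus, with no explicit sort needed:

    Finalize HashAggregate
      ->  Gather
            ->  Partial HashAggregate
                  ->  Parallel Seq Scan on t

The second shape avoids the explicit sort entirely, which is why hashing at
the finalize stage looks preferable whenever the aggregates allow it.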

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pgbench small bug fix

2016-03-04 Thread Alvaro Herrera
Fabien COELHO wrote:

> Probably it is possible, but it will surely need more than one little
> condition to be achieved... I do not think that introducing a non-trivial
> distributed election algorithm involving locks and so on would be a good
> decision for such a minor matter.
> 
> My advice is "keep it simple".
> 
> If this is a blocker, I can surely write such an algorithm when I have some
> spare time, but I'm not sure that the purpose is worth it.

You're probably right, but TBH I'm pretty unsure about this whole thing.
I will leave it alone for the time being.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pgbench stats per script & other stuff

2016-03-04 Thread Alvaro Herrera
Fabien COELHO wrote:

Hi,

>   *-21.patch does what you suggested above, some hidden awkwardness
>  but much less than the previous one.

Yeah, I think this is much nicer, don't you agree?

However, this is still a bit broken -- you cannot return a stack
variable from process_file, because the stack goes away once the
function returns.  You need to malloc it.

Also, you forgot to update the comments in process_file,
process_builtin, etc.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] ExecGather() + nworkers

2016-03-04 Thread Robert Haas
On Fri, Mar 4, 2016 at 6:55 AM, Amit Kapila  wrote:
> On Fri, Mar 4, 2016 at 5:21 PM, Haribabu Kommi 
> wrote:
>>
>> On Fri, Mar 4, 2016 at 10:33 PM, Amit Kapila 
>> wrote:
>> > On Fri, Mar 4, 2016 at 11:57 AM, Haribabu Kommi
>> > 
>> > wrote:
>> >>
>> >> On Wed, Jan 13, 2016 at 7:19 PM, Amit Kapila 
>> >> wrote:
>> >> >>
>> >> >
>> >> > Changed the code such that nworkers_launched gets used wherever
>> >> > appropriate instead of nworkers.  This includes places other than
>> >> > pointed out above.
>> >>
>> >> The changes of the patch are simple optimizations that are trivial.
>> >> I didn't find any problem regarding the changes. I think the same
>> >> optimization is required in "ExecParallelFinish" function also.
>> >>
>> >
>> > There is already one change as below for ExecParallelFinish() in patch.
>> >
>> > @@ -492,7 +492,7 @@ ExecParallelFinish(ParallelExecutorInfo *pei)
>> >
>> >   WaitForParallelWorkersToFinish(pei->pcxt);
>> >
>> >
>> >
>> >   /* Next, accumulate buffer usage. */
>> >
>> > - for (i = 0; i < pei->pcxt->nworkers; ++i)
>> >
>> > + for (i = 0; i < pei->pcxt->nworkers_launched; ++i)
>> >
>> >   InstrAccumParallelQuery(&pei->buffer_usage[i]);
>> >
>> >
>> > Can you be slightly more specific, where exactly you are expecting more
>> > changes?
>>
>> I missed it during the comparison with existing code and patch.
>> Everything is fine with the patch. I marked the patch as ready for
>> committer.
>>
>
> Thanks!

OK, committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Re[2]: [HACKERS] Incorrect error message in InitializeSessionUserId

2016-03-04 Thread Dmitriy Sarafannikov
>On Fri, Mar 4, 2016, 5:23 +03:00 from Michael Paquier < 
>michael.paqu...@gmail.com >:
>
>> The patch adds the support of taking the role name from the role tuple
>> instead of using the provided rolename variable, because it is possible
>> that rolename variable is NULL if the connection is from a background
>> worker.
>>
>> The patch is fine, I didn't find any problems, I marked it as ready for
>> committer.
>>
>> IMO this patch may need to backpatch supported branches as it is
>> a bug fix. Committer can decide.
>
>+1 for the backpatch. The current error message should the rolename be
>undefined in this context is misleading for users.
>-- 
>Michael

Thanks for the feedback.

This patch successfully applies to the 9.5 branch.
In 9.4 and earlier versions the InitializeSessionUserId function has a
different signature:
void InitializeSessionUserId(const char *rolename)
and it is impossible to pass a role Oid to this function.

In this way, the patch is relevant only to the master and 9.5 branches.

Regards,
Dmitriy Sarafannikov


Re: [HACKERS] WIP: Upper planner pathification

2016-03-04 Thread Tom Lane
OK, here is a version that I think addresses all of the recent comments:

* I refactored the grouping-sets stuff as suggested by Robert and David.
The GroupingSetsPath code is now used *only* when there are grouping sets,
otherwise what you get is a plain AGG_SORTED AggPath.  This allowed
removal of a boatload of weird corner cases in the GroupingSets code path,
so it was a good change.  (Fundamentally, that's cleaning up some
questionable coding in the grouping sets patch rather than fixing anything
directly related to pathification, but I like the code better now.)

* I refactored the handling of targetlists in createplan.c.  After some
reflection I decided that the disuse_physical_tlist callers fell into
three separate categories: those that actually needed the exact requested
tlist to be returned, those that wanted non-bloated tuples because they
were going to put them into sort or hash storage, and those that needed
grouping columns to be properly labeled.  The new approach is to pass down
a "flags" word that specifies which if any of these cases apply at a
specific plan level.  use_physical_tlist now always makes the right
decision to start with, and disuse_physical_tlist is gone entirely, which
should make things a bit faster since we won't uselessly construct and
discard physical tlists.  The missing logic from make_subplanTargetList
and locate_grouping_columns is reincarnated in the physical-tlist code.
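
As a rough sketch, such a flags word could look like this (names and bit
values illustrative, not necessarily what the patch uses):

/* tlist requirements passed down while building each plan level */
#define CP_EXACT_TLIST	0x0001	/* caller must get exactly the requested tlist */
#define CP_SMALL_TLIST	0x0002	/* prefer narrow tuples (sort/hash input) */
#define CP_LABEL_TLIST	0x0004	/* grouping columns must carry sortgroupref
								 * labels */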

* Added explicit limit/offset fields to LimitPath, as requested by Teodor.

* Removed SortPath.sortgroupclauses.

* Fixed handling of parallel-query fields in new path node types.
(BTW, I found what seemed to be a couple of pre-existing bugs of
the same kind, eg create_mergejoin_path was different from the
other two kinds of join as to setting parallel_degree.)


What remains to be done, IMV:

* Performance testing as per yesterday's discussion.

* Debug support in outfuncs.c and print_path() for new node types.

* Clean up unfinished work on function header comments.

* Write some documentation about how FDWs might use this.

I'll work on the performance testing next.  Barring unsatisfactory
results from that, I think this could be committable in a couple
of days.

regards, tom lane



upper-planner-pathification-2.patch.gz
Description: upper-planner-pathification-2.patch.gz



Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?

2016-03-04 Thread Robert Haas
On Thu, Mar 3, 2016 at 8:54 PM, Kouhei Kaigai  wrote:
> I found one other, tiny, problem in implementing the SSD-to-GPU direct
> data transfer feature under the PostgreSQL storage layer.
>
> An extension cannot know the raw file descriptor opened by smgr.
>
> I expect an extension to issue an ioctl(2) on the special device file
> on behalf of the special kernel driver, to control the P2P DMA.
> This ioctl(2) will pack the file descriptor of the DMA source and
> various other information (like base position, range, destination
> device pointer, ...).
>
> However, the raw file descriptor is wrapped inside fd.c behind the
> File handle, and is thus not visible to extensions. oops...
>
> The attached patch provides a way to obtain the raw file descriptor
> (and relevant flags) of a particular File virtual file descriptor in
> PostgreSQL. (Needless to say, the extension has to treat the raw
> descriptor carefully so as not to adversely affect the storage manager.)
>
> How about this tiny enhancement?

Why not FileDescriptor(), FileFlags(), FileMode() as separate
functions like FilePathName()?
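
A minimal sketch of what those accessors might look like in fd.c, modeled
on FilePathName() (the Vfd field names follow fd.c but are assumptions
here, and a real version would have to make sure the kernel descriptor is
actually open, e.g. via FileAccess(), since virtual fds can be closed out
by the LRU pool):

/* Return the kernel file descriptor of a pgsql virtual File */
int
FileDescriptor(File file)
{
	Assert(FileIsValid(file));
	return VfdCache[file].fd;
}

/* Return the open(2) flags the File was opened with */
int
FileFlags(File file)
{
	Assert(FileIsValid(file));
	return VfdCache[file].fileFlags;
}

/* Return the creation mode the File was opened with */
int
FileMode(File file)
{
	Assert(FileIsValid(file));
	return VfdCache[file].fileMode;
}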

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pgbench small bug fix

2016-03-04 Thread Fabien COELHO


Hello Alvaro,


Attached is a v3 which tests integers more logically. I'm a lazy
programmer who tends to minimize the number of keystrokes.


Well. From what I can tell this patch is Ready for Committer.


I'm not a fan of this approach either.  Would it be too complicated if
we had a global variable that indicates which thread is the progress
reporter?  We start that with thread 0, but if the reporter thread
finishes its transactions then it elects some other thread which hasn't
yet finished.  For this to work, each thread would have to maintain in a
global variable whether it has finished or not.


Hmmm.

Probably it is possible, but it will surely need more than one little
condition to be achieved... I do not think that introducing a non-trivial
distributed election algorithm involving locks and so on would be a good
decision for such a minor matter.


My advice is "keep it simple".

If this is a blocker, I can surely write such an algorithm when I have some
spare time, but I'm not sure that the purpose is worth it.


--
Fabien.




Re: [HACKERS] Incorrect error message in InitializeSessionUserId

2016-03-04 Thread Robert Haas
On Thu, Mar 3, 2016 at 9:23 PM, Michael Paquier
 wrote:
> On Fri, Mar 4, 2016 at 10:45 AM, Haribabu Kommi
>  wrote:
>> On Wed, Mar 2, 2016 at 12:21 AM, Dmitriy Sarafannikov
>>  wrote:
>>> Hi all,
>>>
>>> I have found incorrect error message in InitializeSessionUserId function
>>> if you try to connect to database by role Oid (for example
>>> BackgroundWorkerInitializeConnectionByOid).
>>> If role have no permissions to login, you will see error message like this:
>>> FATAL:  role "(null)" is not permitted to log in
>>>
>>> I changed few lines of code and fixed this.
>>> Patch is attached.
>>> I want to add this patch to commitfest.
>>> Any objections?
>>>
>>
>> The patch adds the support of taking the role name from the role tuple
>> instead of using the provided rolename variable, because it is possible
>> that rolename variable is NULL if the connection is from a background
>> worker.
>>
>> The patch is fine, I didn't find any problems, I marked it as ready for
>> committer.
>>
>> IMO this patch may need to backpatch supported branches as it is
>> a bug fix. Committer can decide.
>
> +1 for the backpatch. The current error message should the rolename be
> undefined in this context is misleading for users.

Back-patched to 9.5.  I don't think this is relevant for earlier branches.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pgbench stats per script & other stuff

2016-03-04 Thread Fabien COELHO


Hello Alvaro,


I looked at 19.d and I think the design has gotten pretty convoluted.  I
think we could simplify with the following changes:

struct script_t gets a new member, of type Command **, which is
initially null.

function process_builtin receives the complete script_t (not individual
members of it), constructs the Command ** array and puts it in
script_t's new member; the return value is the same script_t struct it got
(except it's now augmented with the Command ** array).

function process_file constructs a new script_t from the string list,
creates its Command **array just like process_builtin and returns the
constructed struct.

function addScript receives script_t instead of individual members of
it, and does the appropriate thing.


Why not. Here are two versions:

  *-20.patch is the initial rebased version

  *-21.patch does what you suggested above, some hidden awkwardness
 but much less than the previous one.

--
Fabien

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index cc80b3f..dd3fb1d 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -262,11 +262,13 @@ pgbench  options  dbname
 
 
  
-  -b scriptname
-  --builtin scriptname
+  -b scriptname[@weight]
+  --builtin=scriptname[@weight]
   

 Add the specified builtin script to the list of executed scripts.
+An optional integer weight after @ allows to adjust the
+probability of drawing the script.  If not specified, it is set to 1.
 Available builtin scripts are: tpcb-like,
 simple-update and select-only.
 Unambiguous prefixes of builtin names are accepted.
@@ -322,12 +324,14 @@ pgbench  options  dbname
  
 
  
-  -f filename
-  --file=filename
+  -f filename[@weight]
+  --file=filename[@weight]
   

 Add a transaction script read from filename to
 the list of executed scripts.
+An optional integer weight after @ allows to adjust the
+probability of drawing the test.
 See below for details.

   
@@ -687,9 +691,13 @@ pgbench  options  dbname
   What is the Transaction Actually Performed in pgbench?
 
   
-   Pgbench executes test scripts chosen randomly from a specified list.
+   pgbench executes test scripts chosen randomly
+   from a specified list.
They include built-in scripts with -b and
user-provided custom scripts with -f.
+   Each script may be given a relative weight specified after a
+   @ so as to change its drawing probability.
+   The default weight is 1.
  
 
   
@@ -1194,12 +1202,11 @@ number of clients: 10
 number of threads: 1
 number of transactions per client: 1000
 number of transactions actually processed: 1/1
+latency average = 15.844 ms
+latency stddev = 2.715 ms
 tps = 618.764555 (including connections establishing)
 tps = 622.977698 (excluding connections establishing)
-SQL script 1: builtin: TPC-B (sort of)
- - 1 transactions (100.0% of total, tps = 618.764555)
- - latency average = 15.844 ms
- - latency stddev = 2.715 ms
+script statistics:
  - statement latencies in milliseconds:
 0.004386\set nbranches 1 * :scale
 0.001343\set ntellers 10 * :scale
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 8b0b17a..c131681 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -38,6 +38,7 @@
 #include "portability/instr_time.h"
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -179,6 +180,8 @@ char	   *login = NULL;
 char	   *dbName;
 const char *progname;
 
+#define WSEP '@'/* weight separator */
+
 volatile bool timer_exceeded = false;	/* flag from signal handler */
 
 /* variable definitions */
@@ -300,23 +303,26 @@ typedef struct
 static struct
 {
 	const char *name;
+	int			weight;
 	Command   **commands;
 	StatsData stats;
 }	sql_script[MAX_SCRIPTS];	/* SQL script files */
 static int	num_scripts;		/* number of scripts in sql_script[] */
 static int	num_commands = 0;	/* total number of Command structs */
+static int64 total_weight = 0;
+
 static int	debug = 0;			/* debug flag */
 
 /* Define builtin test scripts */
-#define N_BUILTIN 3
-static struct
+typedef struct script_t
 {
 	char	   *name;			/* very short name for -b ... */
 	char	   *desc;			/* short description */
 	char	   *commands;		/* actual pgbench script */
-}
+} script_t;
 
-			builtin_script[] =
+#define N_BUILTIN 3
+static script_t builtin_script[] =
 {
 	{
 		"tpcb-like",
@@ -392,9 +398,9 @@ usage(void)
 	 "  --tablespace=TABLESPACE  create tables in the specified tablespace\n"
 		   "  --unlogged-tablescreate tables as unlogged tables\n"
 		   "\nOptions to select what to run:\n"
-		   "  -b, --builtin=NAME   add buitin script (use \"-b list\" to display\n"
-		   "   available scripts)\n"
-		   "  -f, --file=FILENAME  add transaction script from FILENAME\n"
+		   "  -b, 

Re: [HACKERS] psql completion for ids in multibyte string

2016-03-04 Thread Robert Haas
On Fri, Mar 4, 2016 at 12:08 PM, Robert Haas  wrote:
> On Fri, Mar 4, 2016 at 12:02 PM, Alvaro Herrera
>  wrote:
>> Robert Haas wrote:
>>> On Wed, Mar 2, 2016 at 8:07 PM, Kyotaro HORIGUCHI
>>>  wrote:
>>> > Hello, thank you for the comments.
>>> >> I think we should leave string_length as it is and use a new variable
>>> >> for character-based length, as in the attached.
>>> >
>>> > Basically agreed but I like byte_length for the previous
>>> > string_length and string_length for string_length_cars. Also
>>> > text_length is renamed in the attached patch.
>>>
>>> I committed this and back-patched this but (1) I avoided changing the
>>> other functions for now and (2) I gave both the byte length and the
>>> character length new names to avoid confusion.
>>
>> These tweaks appear to have been universally disliked by buildfarm
>> members.
>
> Crap.  Wasn't careful enough, sorry.  Will fix shortly.

Fix pushed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] psql completion for ids in multibyte string

2016-03-04 Thread Alvaro Herrera
Robert Haas wrote:
> On Wed, Mar 2, 2016 at 8:07 PM, Kyotaro HORIGUCHI
>  wrote:
> > Hello, thank you for the comments.
> >> I think we should leave string_length as it is and use a new variable
> >> for character-based length, as in the attached.
> >
> > Basically agreed but I like byte_length for the previous
> > string_length and string_length for string_length_chars. Also
> > text_length is renamed in the attached patch.
> 
> I committed this and back-patched this but (1) I avoided changing the
> other functions for now and (2) I gave both the byte length and the
> character length new names to avoid confusion.

These tweaks appear to have been universally disliked by buildfarm
members.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] psql completion for ids in multibyte string

2016-03-04 Thread Robert Haas
On Fri, Mar 4, 2016 at 12:02 PM, Alvaro Herrera
 wrote:
> Robert Haas wrote:
>> On Wed, Mar 2, 2016 at 8:07 PM, Kyotaro HORIGUCHI
>>  wrote:
>> > Hello, thank you for the comments.
>> >> I think we should leave string_length as it is and use a new variable
>> >> for character-based length, as in the attached.
>> >
>> > Basically agreed but I like byte_length for the previous
>> > string_length and string_length for string_length_chars. Also
>> > text_length is renamed in the attached patch.
>>
>> I committed this and back-patched this but (1) I avoided changing the
>> other functions for now and (2) I gave both the byte length and the
>> character length new names to avoid confusion.
>
> These tweaks appear to have been universally disliked by buildfarm
> members.

Crap.  Wasn't careful enough, sorry.  Will fix shortly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] raw output from copy

2016-03-04 Thread Pavel Stehule
2016-03-04 15:54 GMT+01:00 Daniel Verite :

> Corey Huinker wrote:
>
> > So, for me, RAW is the right solution, or at least *a* right solution.
>
> Questions on how to extract from a bytea column come up on a regular
> basis, as in [1] [2] [3], or [4] a few days ago, and so far the answers
> are to encode the contents in text and decode them in an additional
> step, or use COPY BINARY and filter out the headers.
>
> But none of this is as straightforward and efficient as the proposed
> COPY RAW.
> Also the conversion to text can't be used at all on very large
> contents (>512MB), as mentioned in another recent thread [5]
> (this is the same reason why pg_dump can't dump such rows),
> but COPY RAW doesn't have this limitation.
>
> Technically COPY BINARY should be sufficient, but it seems that
> people dislike having to deal with its headers.
>
> Also it's not supported by any of the drivers of popular
> script languages that otherwise provide COPY in text format
> (DBD::Pg, php, psycopg2...)
> Maybe the RAW format would have a better chance to get support
> there, because of its simplicity.
>

exactly - I would like to decrease the dependency on PostgreSQL internals.
Working with the raw content is simple and possible from any environment,
without messy conversion operations.
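
To make the contrast concrete, here is the kind of round trip being
discussed (table and column names hypothetical; the RAW syntax is this
thread's proposal, not committed syntax):

-- today: re-encode the bytea as text and decode it client-side
COPY (SELECT encode(payload, 'base64') FROM blobs WHERE id = 1) TO STDOUT;

-- proposed: emit the single value verbatim, with no headers and no escaping
COPY (SELECT payload FROM blobs WHERE id = 1) TO STDOUT (FORMAT raw);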

Regards

Pavel


>
> [1]
>
> http://www.postgresql.org/message-id/038517CEB6DE43BD8422D7947B6BE8D8@fanlijing
>
> [2] http://www.postgresql.org/message-id/4c8272c4.1000...@arcor.de
>
> [3] http://stackoverflow.com/questions/6730729
>
> [4]
> http://www.postgresql.org/message-id/56c66565.50...@consistentstate.com
>
> [5] http://www.postgresql.org/message-id/14620.1456851...@sss.pgh.pa.us
>
>
> Best regards,
> --
> Daniel Vérité
> PostgreSQL-powered mailer: http://www.manitou-mail.org
> Twitter: @DanielVerite
>


Re: [HACKERS] psql completion for ids in multibyte string

2016-03-04 Thread Robert Haas
On Wed, Mar 2, 2016 at 8:07 PM, Kyotaro HORIGUCHI
 wrote:
> Hello, thank you for the comments.
>> I think we should leave string_length as it is and use a new variable
>> for character-based length, as in the attached.
>
> Basically agreed but I like byte_length for the previous
> string_length and string_length for string_length_cars. Also
> text_length is renamed in the attached patch.

I committed this and back-patched this but (1) I avoided changing the
other functions for now and (2) I gave both the byte length and the
character length new names to avoid confusion.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GetExistingLocalJoinPath() vs. the docs

2016-03-04 Thread Robert Haas
On Wed, Mar 2, 2016 at 1:12 AM, Ashutosh Bapat
 wrote:
>> I think that you need to take a little broader look at this section.
>> At the top, it says "To use any of these functions, you need to
>> include the header file foreign/foreign.h in your source file", but
>> this function is defined in foreign/fdwapi.h.  It's not clear to me
>> whether we should consider moving the prototype, or just document that
>> this function is someplace else.  The other functions prototyped in
>> fdwapi.h aren't documented at all, except for
>> IsImportableForeignTable, which is mentioned in passing.
>>
>> Further down, the section says "Some object types have name-based
>> lookup functions in addition to the OID-based ones:" and you propose
>> to put the documentation for this function after that.  But this
>> comment doesn't actually describe this particular function.
>>
>>
>> Actually, this function just doesn't seem to fit into this section at
>> all.  It's really quite different from the others listed there.  How
>> about something like the attached instead?
>
> Right. Mentioning the function in the description of the relevant function
> looks better and avoids some duplication.

Cool, committed that way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Issue with NULLS LAST, with postgres_fdw sort pushdown

2016-03-04 Thread Robert Haas
On Thu, Mar 3, 2016 at 12:08 AM, Tom Lane  wrote:
> Ashutosh Bapat  writes:
>> On Thu, Mar 3, 2016 at 7:27 AM, Michael Paquier 
>> wrote:
>>> Per explain.c, this looks inconsistent to me. Shouldn't NULLS LAST be
>>> applied only if DESC is used in this ORDER BY clause?
>
>> ... In this case we are constructing a query to be
>> sent to the foreign server and it's better not to leave the defaults to be
>> interpreted by the foreign server, in case it interprets them in a
>> different fashion. get_rule_orderby() also explicitly adds these options.
>
> Yeah, I agree that we don't need to go out of our way to make the query
> succinct here.  Explicitness is easier and safer too, so why not?

+1.  So, committed Ashutosh's version.
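
Concretely, explicit deparsing means the query sent to the remote server
spells out both options even when they match the defaults, e.g.
(illustrative, with hypothetical names):

SELECT c1 FROM public.t ORDER BY c1 ASC NULLS LAST;

rather than leaving ASC and NULLS LAST for the remote server to fill in.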

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Relation extension scalability

2016-03-04 Thread Tom Lane
Robert Haas  writes:
> This approach seems good to me, and the performance results look very
> positive.  The nice thing about this is that there is not a
> user-configurable knob; the system automatically determines when
> larger extensions are needed, which will mean that real-world users
> are much more likely to benefit from this.  I don't think it matters
> that this is a little faster or slower than an approach with a manual
knob; what matters is that it is a huge improvement over unpatched
> master, and that it does not need a knob.  The arbitrary constant of
> 10 is a little unsettling but I think we can live with it.

+1.  "No knob" is a huge win.

regards, tom lane




Re: [HACKERS] Greeting for coming back, and where is PostgreSQL going

2016-03-04 Thread Joshua D. Drake
On 03/04/2016 03:20 AM, MauMau wrote:

> I've been visually impaired since birth, and now I'm almost blind (can 
> only sense the light).  I'm using screen reader software to use PCs and 
> smartphones.  As I'm using pgindent, I'm sure the source code style 
> won't be bad.  But I might overlook some styling problems like 
> indentation in the documentation patches.  I'd appreciate it if you 
> could introduce a nice editor for editing SGML/XML documents.

Welcome back!

There are quite a few editors that handle SGML/XML well. In the open
source world the two most common are likely:

Emacs
 This is what Practical PostgreSQL was written in
Bluefish
 This is a GTK-based editor that has some nice touches

There are others I am sure but those are the two I have experience with.

Sincerely,

JD


-- 
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




Re: [HACKERS] Relation extension scalability

2016-03-04 Thread Robert Haas
On Fri, Mar 4, 2016 at 12:06 AM, Dilip Kumar  wrote:
> I have tried the group-extend approach:
>
> 1. We convert the extension lock to a TryLock, and if we get the lock we
> extend by one block.
> 2. If we don't get the lock, we use the group-leader concept, where only one
> process extends for all. A slight change from ProcArrayGroupClear is that
> here, besides satisfying the requesting backends, we add some extra blocks to
> the FSM, say GroupSize*10.
> 3. So we cannot measure the exact load, but we still have a factor, the
> group size, that tells us the contention level, and we extend in multiples of
> that.

This approach seems good to me, and the performance results look very
positive.  The nice thing about this is that there is not a
user-configurable knob; the system automatically determines when
larger extensions are needed, which will mean that real-world users
are much more likely to benefit from this.  I don't think it matters
that this is a little faster or slower than an approach with a manual
knob; what matters is that it is a huge improvement over unpatched
master, and that it does not need a knob.  The arbitrary constant of
10 is a little unsettling but I think we can live with it.
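
A hedged sketch of the shape of the idea (the TryLock variant and
GroupExtendRelation() are illustrative names here, not the patch's actual
API):

/* In the heap's extend path, roughly: */
if (ConditionalLockRelationForExtension(relation, ExclusiveLock))
{
	/* Uncontended: got the lock, extend by a single block as today. */
	buffer = ReadBufferBI(relation, P_NEW, bistate);
	UnlockRelationForExtension(relation, ExclusiveLock);
}
else
{
	/*
	 * Contended: join an extension group.  One leader extends enough
	 * blocks to satisfy every waiter, plus nwaiters * 10 extra blocks
	 * recorded in the FSM, so the next burst of inserters can find
	 * free pages without taking the extension lock at all.
	 */
	buffer = GroupExtendRelation(relation, bistate);
}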

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Freeze avoidance of very large table.

2016-03-04 Thread Robert Haas
On Wed, Mar 2, 2016 at 6:41 PM, Tom Lane  wrote:
> Jim Nasby  writes:
>> On 3/2/16 4:21 PM, Peter Geoghegan wrote:
>>> I think you should commit this. The chances of anyone other than you
>>> and Masahiko recalling that you developed this tool in 3 years is
>>> essentially nil. I think that the cost of committing a developer-level
>>> debugging tool like this is very low. Modules like pg_freespacemap
>>> currently already have no chance of being of use to ordinary users.
>>> All you need to do is restrict the functions to throw an error when
>>> called by non-superusers, out of caution.
>>>
>>> It's a problem that modules like pg_stat_statements and
>>> pg_freespacemap are currently lumped together in the documentation,
>>> but we all know that.
>
>> +1.
>
> Would it make any sense to stick it under src/test/modules/ instead of
> contrib/ ?  That would help make it clear that it's a debugging tool
> and not something we expect end users to use.

I actually think end-users might well want to use it.  Also, I created
it by hacking up pg_freespacemap, so it may make sense to have it in
the same place.

I would also be tempted to add additional C functions that scan the
entire visibility map and return counts of the total number of bits of
each type that are set, and similarly for the page-level bits.
Presumably that would be much faster than fetching all the bits and
counting them on the client side.

I am also tempted to change the API to be a bit more friendly,
although I am not sure exactly how.  This was a quick and dirty hack
so that I could test, but the hardest thing about making it not a
quick and dirty hack is probably deciding on a good UI.
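
One possible shape for such an interface, purely as a sketch (the function
names and result columns are hypothetical at this point):

-- per-page bits
SELECT blkno, all_visible, all_frozen
  FROM pg_visibility('my_table'::regclass);

-- whole-map summary, counted in C rather than client-side
SELECT all_visible, all_frozen
  FROM pg_visibility_map_summary('my_table'::regclass);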

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] silent data loss with ext4 / all current versions

2016-03-04 Thread Robert Haas
On Fri, Mar 4, 2016 at 11:09 AM, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> I would like to have a patch for this finalized today, so that we can
>> apply to master before or during the weekend; with it in the tree for
>> about a week we can be more confident and backpatch close to next
>> weekend, so that we see it in the next set of minor releases.  Does that
>> sound good?
>
> I see no reason to wait before backpatching.  If you're concerned about
> having testing, the more branches it is in, the more buildfarm cycles
> you will get on it.  And we're not going to cut any releases in between,
> so what's the benefit of not having it there?

Agreed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-03-04 Thread Robert Haas
On Fri, Mar 4, 2016 at 11:17 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Fri, Mar 4, 2016 at 11:03 AM, Tom Lane  wrote:
>>> Well, that would make the function more complicated, but maybe it's a
>>> better answer.  On the other hand, we know that the stats updates are
>>> delivered in a deterministic order, so why not simply replace the
>>> existing test in the wait function with one that looks for the truncation
>>> updates?  If we've gotten those, we must have gotten the earlier ones.
>
>> I'm not sure if that's actually true with parallel mode.  I'm pretty
>> sure the earlier workers will have terminated before the later ones
>> start, but is that enough to guarantee that the stats collector sees
>> the messages in that order?
>
> Huh?  Parallel workers are read-only; what would they be doing sending
> any of these messages?

Mumble.  I have no idea what's happening here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS][PATCH] Supporting +-Infinity values by to_timestamp(float8)

2016-03-04 Thread Anastasia Lubennikova

27.02.2016 09:57, Vitaly Burovoy:

Hello, Hackers!

I worked on a patch[1] that allows "EXTRACT(epoch FROM
+-Inf::timestamp[tz])" to return "+-Inf::float8".
There is an opposite function, "to_timestamp(float8)", which is now defined as:
SELECT ('epoch'::timestamptz + $1 * '1 second'::interval)


Hi,
thank you for the patches.
Could you explain whether they depend on each other?


Since intervals do not support infinity values, it is impossible to do
something like:

SELECT to_timestamp('infinity'::float8);

... which is not good.

Support for such a conversion is in the TODO list[2] (under "converting
between infinity timestamp and float8").


You mention intervals here, and the TODO item definitely talks about an
'infinity' interval, while the patch and all the following discussion
concern timestamps.
Is it a typo, or did I misunderstand something important?
I assumed that the following query would work, but it doesn't. Could you
clarify that?

select to_timestamp('infinity'::interval);


The proposed patch implements it.

There is another patch in the 2016-03 CF[3] that implements checking of
timestamp[tz] values for being in the allowed range. Since it is wise to
set (fix) the upper boundary of timestamp[tz]s, I've included the file
"src/include/datatype/timestamp.h" from there to check that an input
value and a result are in the allowed range.

There are no changes in the documentation because the allowed range is the
same as the officially supported one[4] (i.e. until 294277 AD).


I think that you should update the documentation, at least the description
of epoch on this page:

http://www.postgresql.org/docs/devel/static/functions-datetime.html

Here is how you can convert an epoch value back to a time stamp:

SELECT TIMESTAMP WITH TIME ZONE 'epoch' + 982384720.12 * INTERVAL '1 second';

(The to_timestamp function encapsulates the above conversion.)


More thoughts about the patch:

1. When I copy the value from the hints for the min and max values (see
examples below), it works fine for min, while max still leads to an error.
It comes from the check "if (seconds >= epoch_ubound)". I wonder
whether you should change the hint message?


select to_timestamp(-210866803200.00);
  to_timestamp
-
 4714-11-24 02:30:17+02:30:17 BC
(1 row)


select to_timestamp(9224318016000.00);
ERROR:  UNIX epoch out of range: "9224318016000.00"
HINT:  Maximal UNIX epoch value is "9224318016000.00"

2. There is a comment about JULIAN_MAXYEAR inaccuracy in timestamp.h:

 * IS_VALID_JULIAN checks the minimum date exactly, but is a bit sloppy
 * about the maximum, since it's far enough out to not be especially
 * interesting.

Maybe you can expand it?
- Does JULIAN_MAXYEAR4STAMPS help to avoid overflow in all possible cases?
- Why do we need to keep both definitions? I suppose it's a matter of
backward compatibility, isn't it?


3. (nitpicking) I'm not sure about the "4STAMPS" suffix. "4" is a nice
abbreviation, but it seems slightly confusing to me.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-03-04 Thread Tom Lane
Robert Haas  writes:
> On Fri, Mar 4, 2016 at 11:03 AM, Tom Lane  wrote:
>> Well, that would make the function more complicated, but maybe it's a
>> better answer.  On the other hand, we know that the stats updates are
>> delivered in a deterministic order, so why not simply replace the
>> existing test in the wait function with one that looks for the truncation
>> updates?  If we've gotten those, we must have gotten the earlier ones.

> I'm not sure if that's actually true with parallel mode.  I'm pretty
> sure the earlier workers will have terminated before the later ones
> start, but is that enough to guarantee that the stats collector sees
> the messages in that order?

Huh?  Parallel workers are read-only; what would they be doing sending
any of these messages?

regards, tom lane




Re: [HACKERS] Equivalent of --enable-tap-tests in MSVC scripts

2016-03-04 Thread Craig Ringer
On 5 March 2016 at 00:10, Alvaro Herrera  wrote:

> Craig Ringer wrote:
>
> > If it's the result of perltidy changing its mind about the formatting
> > as a result of this change I guess we have to eyeroll and live with it.
> > perltidy leaves the file alone as it is in the tree currently, so be it.
> >
> > Gripe withdrawn, ready for committer IMO
>
> Okay, thanks.  I applied it back to 9.4, which is when
> --enable-tap-tests appeared.  I didn't perltidy 9.4's config_default.pl,
> though.


Thanks very much. It didn't occur to me to backport it, but it seems
harmless.

https://commitfest.postgresql.org/9/566/ marked as committed.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] WIP: Failover Slots

2016-03-04 Thread Craig Ringer
On 24 February 2016 at 18:02, Craig Ringer  wrote:


> I really want to focus on the first patch, timeline following for logical
> slots. That part is much less invasive and is useful stand-alone. I'll move
> it to a separate CF entry and post it to a separate thread as I think it
> needs consideration independently of failover slots.
>

Just an update on the failover slots status: I've moved timeline following
for logical slots into its own patch set and CF entry and added a bunch of
tests.

https://commitfest.postgresql.org/9/488/

Some perl TAP test framework enhancements were needed for that; they're
mostly committed now with a few pending.

https://commitfest.postgresql.org/9/569/

Once some final changes are made to the tests for timeline following I'll
address the checkpoint issue in failover slots by doing the checkpoint of
slots at the start of a checkpoint/restartpoint, while we can still write
WAL. Per the comments in CheckPointReplicationSlots it's mostly done in a
checkpoint currently for convenience.

Then I'll write some TAP tests for failover slots and submit an updated
patch for them, by which time hopefully timeline following for logical
slots will be committed.

In other words this patch isn't dead, the foundations are just being
rebased out from under it.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-03-04 Thread Alvaro Herrera
Robert Haas wrote:

> I'm not sure if that's actually true with parallel mode.  I'm pretty
> sure the earlier workers will have terminated before the later ones
> start, but is that enough to guarantee that the stats collector sees
> the messages in that order?

Um.  So if you have two queries that run in sequence, it's possible
for workers of the first query to be still running when workers for the
second query finish?  That would be very strange.

If that's not what you're saying, I don't understand what guarantees you
say we don't have.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Equivalent of --enable-tap-tests in MSVC scripts

2016-03-04 Thread Alvaro Herrera
Craig Ringer wrote:

> If it's the result of perltidy changing its mind about the formatting as a
> result of this change I guess we have to eyeroll and live with it. perltidy
> leaves the file alone as it is in the tree currently, so be it.
> 
> Gripe withdrawn, ready for committer IMO

Okay, thanks.  I applied it back to 9.4, which is when
--enable-tap-tests appeared.  I didn't perltidy 9.4's config_default.pl,
though.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] silent data loss with ext4 / all current versions

2016-03-04 Thread Tom Lane
Alvaro Herrera  writes:
> I would like to have a patch for this finalized today, so that we can
> apply to master before or during the weekend; with it in the tree for
> about a week we can be more confident and backpatch close to next
> weekend, so that we see it in the next set of minor releases.  Does that
> sound good?

I see no reason to wait before backpatching.  If you're concerned about
having testing, the more branches it is in, the more buildfarm cycles
you will get on it.  And we're not going to cut any releases in between,
so what's the benefit of not having it there?

regards, tom lane




Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-03-04 Thread Robert Haas
On Fri, Mar 4, 2016 at 11:03 AM, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> Tom Lane wrote:
>>> That's what it looks like to me.  I now think that the apparent
>>> connection to parallel query is a mirage.  The reason we've only
>>> seen a few cases so far is that the flapping test is new: it
>>> was added in commit d42358efb16cc811, on 20 Feb.
>
>> It was added on Feb 20 all right, but of *last year*.  It's been there
>> working happily for a year now.
>
> Wup, you're right, failed to look closely enough at the commit log
> entry.  So that puts us back to wondering why exactly parallel query
> is triggering this.  Still, Robert's experiment with removing the
> pg_sleep seems fairly conclusive: it is possible to get the failure
> without parallel query.
>
>> Instead of adding another sleep function, another possibility is to add
>> two booleans, one for the index counter and another for the truncate
>> counters, and only terminate the sleep if both are true.  I don't see
>> any reason to make this test any slower than it already is.
>
> Well, that would make the function more complicated, but maybe it's a
> better answer.  On the other hand, we know that the stats updates are
> delivered in a deterministic order, so why not simply replace the
> existing test in the wait function with one that looks for the truncation
> updates?  If we've gotten those, we must have gotten the earlier ones.

I'm not sure if that's actually true with parallel mode.  I'm pretty
sure the earlier workers will have terminated before the later ones
start, but is that enough to guarantee that the stats collector sees
the messages in that order?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Timeline following for logical slots

2016-03-04 Thread Craig Ringer
On 1 March 2016 at 21:00, Craig Ringer  wrote:

> Hi all
>
> Per discussion on the failover slots thread (
> https://commitfest.postgresql.org/9/488/) I'm splitting timeline
> following for logical slots into its own separate patch.
>
>
I've updated the logical decoding timeline following patch to fix a bug,
found during test development, related to how Pg renames the last WAL
segment on the old timeline to give it a .partial suffix on promotion. The
xlogreader must switch to reading from the newest-timeline version of a
given segment eagerly, for the first page of the segment, since that's the
only one guaranteed to actually exist.

I'd really appreciate some review of the logic there by people who know
timelines well and preferably know the xlogreader. It's really just one
function and 2/3 comments; the code is simple but the reasoning leading to
it is not.


I've also attached an updated version of the tests posted a few days ago.
The tests depend on the remaining patches from the TAP enhancements tree so
it's easiest to just get the whole tree from
https://github.com/2ndQuadrant/postgres/tree/dev/logical-decoding-timeline-following
(subject to regular rebases and force pushes, do not use as a base).

The tests now include a test module that exposes some slot guts to SQL to
allow the client to sync slot state from master to replica(s) without
needing failover slots and the use of extra WAL as transport. It's very
much for-testing-only.

The new test module is used by a second round of tests to demonstrate the
practicality of failover of a logical replication client to a physical
replica using a base backup taken by pg_basebackup and without the presence
of failover slots. I won't pretend it's pretty.

This proves that the approach works barring unforeseen showstoppers. It also
proves it's pretty ugly - failover slots provide a much, MUCH simpler and
safer way for clients to achieve this with way less custom code needed by
each client to sync slot state.

I've got a bit of cleanup to do in the test suite and a few more tests to
write for cases where the slot on the replica is allowed to fall behind the
slot on the master, but this is mostly waiting on the remaining two TAP test
patches before it can be evaluated for possible push.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 37bd2e654345af65749ccff6ca73d3afebf67072 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Thu, 11 Feb 2016 10:44:14 +0800
Subject: [PATCH 1/2] Allow logical slots to follow timeline switches

Make logical replication slots timeline-aware, so replay can
continue from a historical timeline onto the server's current
timeline.

This is required to make failover slots possible and may also
be used by extensions that CreateReplicationSlot on a standby
and replay from that slot once the replica is promoted.

This does NOT add support for replaying from a logical slot on
a standby or for syncing slots to replicas.
---
 src/backend/access/transam/xlogreader.c|  43 -
 src/backend/access/transam/xlogutils.c | 240 +++--
 src/backend/replication/logical/logicalfuncs.c |  38 +++-
 src/include/access/xlogreader.h|  35 +++-
 src/include/access/xlogutils.h |   2 +
 5 files changed, 323 insertions(+), 35 deletions(-)

diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index fcb0872..5899f44 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -10,6 +10,9 @@
  *
  * NOTES
  *		See xlogreader.h for more notes on this facility.
+ *
+ * 		The xlogreader is compiled as both front-end and backend code so
+ * 		it may not use elog, server-defined static variables, etc.
  *-
  */
 
@@ -116,6 +119,9 @@ XLogReaderAllocate(XLogPageReadCB pagereadfunc, void *private_data)
 		return NULL;
 	}
 
+	/* Will be loaded on first read */
+	state->timelineHistory = NULL;
+
 	return state;
 }
 
@@ -135,6 +141,13 @@ XLogReaderFree(XLogReaderState *state)
 	pfree(state->errormsg_buf);
 	if (state->readRecordBuf)
 		pfree(state->readRecordBuf);
+#ifdef FRONTEND
+	/* FE code doesn't use this and we can't list_free_deep on FE */
+	Assert(state->timelineHistory == NULL);
+#else
+	if (state->timelineHistory)
+		list_free_deep(state->timelineHistory);
+#endif
 	pfree(state->readBuf);
 	pfree(state);
 }
@@ -208,9 +221,11 @@ XLogReadRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg)
 
 	if (RecPtr == InvalidXLogRecPtr)
 	{
+		/* No explicit start point, read the record after the one we just read */
 		RecPtr = state->EndRecPtr;
 
 		if (state->ReadRecPtr == InvalidXLogRecPtr)
+			/* allow readPageTLI to go backward */
 			randAccess = true;
 
 		/*
@@ -223,6 +238,8 @@ XLogReadRecord(XLogReaderState 

Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-03-04 Thread Robert Haas
On Fri, Mar 4, 2016 at 10:33 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> Sure.  If you have an idea what the right thing to do is, please go
>> ahead.
>
> Yeah, I'll modify the patch and commit sometime later today.

OK, if you're basing that on the patch I sent upthread, please credit
Rahila Syed as the original author of that code.  (I modified it
before posting, but only trivially.)  Of course if you do something
else, then never mind.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-03-04 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> That's what it looks like to me.  I now think that the apparent
>> connection to parallel query is a mirage.  The reason we've only
>> seen a few cases so far is that the flapping test is new: it
>> was added in commit d42358efb16cc811, on 20 Feb.

> It was added on Feb 20 all right, but of *last year*.  It's been there
> working happily for a year now.

Wup, you're right, failed to look closely enough at the commit log
entry.  So that puts us back to wondering why exactly parallel query
is triggering this.  Still, Robert's experiment with removing the
pg_sleep seems fairly conclusive: it is possible to get the failure
without parallel query.

> Instead of adding another sleep function, another possibility is to add
> two booleans, one for the index counter and another for the truncate
> counters, and only terminate the sleep if both are true.  I don't see
> any reason to make this test any slower than it already is.

Well, that would make the function more complicated, but maybe it's a
better answer.  On the other hand, we know that the stats updates are
delivered in a deterministic order, so why not simply replace the
existing test in the wait function with one that looks for the truncation
updates?  If we've gotten those, we must have gotten the earlier ones.

In any case, the real answer to making the test less slow is to get rid of
that vestigial pg_sleep.  I'm wondering why we failed to remove that when
we put in the wait_for_stats function...
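
For illustration, a truncation-checking version of the wait function
might look like this (a sketch only; the table name and the counter
condition are assumptions, not the committed test):

  create function wait_for_trunc_stats() returns void as $$
  declare
    start_time timestamptz := clock_timestamp();
    updated bool;
  begin
    loop
      -- The truncation updates are the last stats we expect, and stats
      -- messages arrive in order, so seeing them implies the earlier
      -- index-update messages have arrived as well.
      select (n_tup_ins > 0) into updated
        from pg_stat_user_tables
        where relname = 'trunc_stats_test';

      exit when updated;

      if clock_timestamp() >= start_time + '30 seconds'::interval then
        raise warning 'stats did not arrive within 30 seconds';
        exit;
      end if;

      perform pg_sleep(0.1);
    end loop;
  end
  $$ language plpgsql;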

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] syslog configurable line splitting behavior

2016-03-04 Thread Alexander Korotkov
On Sat, Feb 27, 2016 at 6:49 AM, Peter Eisentraut  wrote:

> Writing log messages to syslog caters to ancient syslog implementations
> in two ways:
>
> - sequence numbers
> - line splitting
>
> While these are arguably reasonable defaults, I would like a way to turn
> them off, because they get in the way of doing more interesting things
> with syslog (e.g., logging somewhere that is not just a text file).
>
> So I propose the two attached patches that introduce new configuration
> Boolean parameters syslog_sequence_numbers and syslog_split_lines that
> can toggle these behaviors.
>

Would it be useful to make PG_SYSLOG_LIMIT configurable (-1 to disable
splitting) instead of introducing booleans?
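
For comparison, the two designs in postgresql.conf terms (the boolean
names are taken from the proposal; the integer knob is purely
hypothetical):

  # proposed booleans; the defaults would keep today's behavior
  syslog_sequence_numbers = off   # drop the "[1-1]" counter prefixes
  syslog_split_lines = off        # deliver each message as one record

  # hypothetical single-knob alternative, -1 meaning "never split"
  #syslog_message_limit = -1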

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] silent data loss with ext4 / all current versions

2016-03-04 Thread Alvaro Herrera
I would like to have a patch for this finalized today, so that we can
apply it to master before or during the weekend; with it in the tree for
about a week we can be more confident, and backpatch close to next
weekend, so that we see it in the next set of minor releases.  Does that
sound good?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-03-04 Thread Alvaro Herrera
Tom Lane wrote:
> Robert Haas  writes:
> > Sure.  If you have an idea what the right thing to do is, please go
> > ahead.
> 
> Yeah, I'll modify the patch and commit sometime later today.
> 
> > I actually don't have a clear idea what's going on here.  I
> > guess it's that the wait_for_stats() guarantees that the stats message
> > from the index insertion has been received but the stats messages
> > from the "trunc" tables might not have gotten there yet.
> 
> That's what it looks like to me.  I now think that the apparent
> connection to parallel query is a mirage.  The reason we've only
> seen a few cases so far is that the flapping test is new: it
> was added in commit d42358efb16cc811, on 20 Feb.  If we left it
> as-is, I think we'd eventually see the same failure without forcing
> parallel mode.  In fact, that's pretty much what you describe below,
> isn't it?  The pg_sleep is sort of half-bakedly substituting for
> a proper wait.

It was added on Feb 20 all right, but of *last year*.  It's been there
working happily for a year now.

The reason I added the trunc test in the middle of the index update
tests is that I dislike tests that sleep for long without real purpose;
it seems pretty reasonable to me to have both sleeps actually be the
same wait.

Instead of adding another sleep function, another possibility is to add
two booleans, one for the index counter and another for the truncate
counters, and only terminate the sleep if both are true.  I don't see
any reason to make this test any slower than it already is.
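
In sketch form (the table, index, and condition names are assumptions),
the loop would exit only once both kinds of updates have shown up:

  do $$
  declare
    index_updated bool;
    trunc_updated bool;
  begin
    loop
      select (idx_scan > 0) into index_updated
        from pg_stat_user_indexes where indexrelname = 'tenk2_unique1';
      select (n_tup_ins > 0) into trunc_updated
        from pg_stat_user_tables where relname = 'trunc_stats_test';
      -- keep waiting until both counters are visible
      exit when coalesce(index_updated, false)
            and coalesce(trunc_updated, false);
      perform pg_sleep(0.1);
    end loop;
  end
  $$;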

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL log only necessary part of 2PC GID

2016-03-04 Thread Jesper Pedersen

On 02/29/2016 08:45 AM, Pavan Deolasee wrote:

Hello Hackers,

The maximum size of the GID, used as a 2PC identifier, is currently
defined as 200 bytes (see src/backend/access/transam/twophase.c). The
actual GID used by applications, though, may be much smaller than that.
So IMO, instead of WAL-logging the entire 200 bytes during PREPARE
TRANSACTION, we should just WAL-log strlen(gid) bytes.
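
In rough outline, the change amounts to something like this (a sketch of
the shape only, not the attached patch; the field and helper names are
assumed from twophase.c):

 typedef struct TwoPhaseFileHeader
 {
 	...
-	char		gid[GIDSIZE];	/* fixed 200 bytes in every record */
+	uint16		gidlen;		/* actual gid length; data appended below */
 } TwoPhaseFileHeader;

 	/* in StartPrepare(), when assembling the record: */
+	hdr.gidlen = strlen(gid) + 1;		/* include the trailing NUL */
 	save_state_data(&hdr, sizeof(TwoPhaseFileHeader));
+	save_state_data(gid, hdr.gidlen);	/* variable length, not GIDSIZE */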

The attached patch does that. The changes are limited to twophase.c and
some simple crash recovery tests seem to work ok. In terms of
performance, a quick test shows marginal improvement in tps using the
script that Stas Kelvich used for his work on speeding up twophase
transactions. The only change I made is to keep :scale unchanged,
because increasing :scale in every iteration would result in only a
handful of updates (I'm not sure why Stas had that in his original script).

\set naccounts 10 * :scale
\setrandom from_aid 1 :naccounts
\setrandom to_aid 1 :naccounts
\setrandom delta 1 100
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance - :delta WHERE aid =
:from_aid;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid =
:to_aid;
PREPARE TRANSACTION ':client_id.:scale';
COMMIT PREPARED ':client_id.:scale';

The amount of WAL generated during a 60s run shows a decline of about 25%
with default settings, except for full_page_writes, which is turned off.

HEAD: 861 WAL bytes / transaction
PATCH: 670 WAL bytes / transaction

Actually, the above numbers probably include a lot of WAL generated because
of HOT pruning and page defragmentation. If we just look at the WAL
overhead caused by 2PC, the decline is somewhere close to 50%. I took
numbers using simple 1PC for reference and to understand the overhead of
2PC.

HEAD (1PC): 382 bytes / transaction



I can confirm the marginal speedup in tps due to the reduced WAL size.

The TWOPHASE_MAGIC constant should be changed, since the file header's
definition has changed, right?


Thanks for working on this !

Best regards,
 Jesper



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_resetxlog reference page reorganization

2016-03-04 Thread Alexander Korotkov
Hi, Peter!

I've assigned myself as reviewer of this patch.

On Tue, Mar 1, 2016 at 2:53 AM, Peter Eisentraut  wrote:

> The pg_resetxlog reference page has grown over the years into an
> unnavigable jungle, so here is a patch that reorganizes it to be more in
> the style of the other ref pages, with a normal options list.
>

Patch applies cleanly on head, documentation compiles with no problem.
The pg_resetxlog page definitely looks much better than it did before.
I don't see any problems or issues with this patch.
So, I mark it "Ready for committer".

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-03-04 Thread Tom Lane
Robert Haas  writes:
> Sure.  If you have an idea what the right thing to do is, please go
> ahead.

Yeah, I'll modify the patch and commit sometime later today.

> I actually don't have a clear idea what's going on here.  I
> guess it's that the wait_for_stats() guarantees that the stats message
> from the index insertion has been received but the stats messages
> from the "trunc" tables might not have gotten there yet.

That's what it looks like to me.  I now think that the apparent
connection to parallel query is a mirage.  The reason we've only
seen a few cases so far is that the flapping test is new: it
was added in commit d42358efb16cc811, on 20 Feb.  If we left it
as-is, I think we'd eventually see the same failure without forcing
parallel mode.  In fact, that's pretty much what you describe below,
isn't it?  The pg_sleep is sort of half-bakedly substituting for
a proper wait.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench small bug fix

2016-03-04 Thread Alvaro Herrera
Aleksander Alekseev wrote:
> > Attached is a v3 which tests integers more logically. I'm a lazy
> > programmer who tends to minimize the number of key strokes.
> 
> Well. From what I can tell this patch is Ready for Committer. 

I'm not a fan of this approach either.  Would it be too complicated if
we had a global variable that indicates which thread is the progress
reporter?  It would start as thread 0, but if the reporter thread
finishes its transactions, it elects some other thread that hasn't yet
finished.  For this to work, each thread would have to record in a
global variable whether it has finished.
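
Something along these lines, perhaps (all names are hypothetical, and
locking around the hand-off is omitted for clarity):

  static volatile int  reporter_tid = 0;       /* thread 0 reports first */
  static volatile bool finished[MAX_THREADS];  /* set when a thread is done */

  static void
  threadDone(int my_tid, int nthreads)
  {
      finished[my_tid] = true;
      if (reporter_tid == my_tid)
      {
          /* hand the reporter role to some still-running thread */
          for (int i = 0; i < nthreads; i++)
              if (!finished[i])
              {
                  reporter_tid = i;
                  break;
              }
      }
  }

  /* in each thread's main loop: */
  if (my_tid == reporter_tid && now >= next_report)
      printProgressReport(now);

A real version would presumably need a memory barrier or mutex around
the hand-off, and would have to cope with the fork-based emulation mode,
where "threads" don't share globals.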

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

