Re: Built-in connection pooling

2018-04-24 Thread Christophe Pettus

> On Apr 24, 2018, at 06:52, Merlin Moncure  wrote:
> Why does it have to be completely transparent? 

Well, we have non-transparent connection pooling now, in the form of pgbouncer, 
and the huge fleet of existing application-stack poolers.  The main reason to 
move it into core is to avoid the limitations that a non-core pooler has.

--
-- Christophe Pettus
   x...@thebuild.com




Re: Minor comment update in execPartition.c

2018-04-24 Thread Etsuro Fujita

(2018/04/25 11:05), Alvaro Herrera wrote:

Amit Langote wrote:


I think we should remove the words "if not already done" from that
comment because 1) that function is called if the partition wasn't
already initialized and 2) that function assumes that.  Attached is a
small patch for removing the words.


Thanks, sounds fine.  I think those words remain from earlier versions of
the patch committed as edd44738bc8 [1], where there used to be a fast-path
return if the ResultRelInfo was already non-NULL for the passed in index.


Makes sense -- pushed, thanks both.


Thanks for committing, Alvaro!  Thanks for reviewing, Amit!

Best regards,
Etsuro Fujita



Re: Oddity in tuple routing for foreign partitions

2018-04-24 Thread Amit Langote
Oops, really meant to send the "+1 to the root -> rte refactoring" comment
and the rest in the same email.

On 2018/04/25 4:49, Robert Haas wrote:
> I have done some refactoring to avoid that.  See attached.
> 
> I didn't really get beyond the refactoring stage with this today.
> This version still seems to work, but I don't really understand the
> logic in postgresBeginForeignInsert which decides whether to use the
> RTE from the range table or create our own.  We seem to need to do one
> sometimes and the other sometimes, but I don't know why that is,
> really.  Nor do I understand why we need the other changes in the
> patch.  There's probably a good reason, but I haven't figured it out
> yet.

So, the main part of the patch that fixes the bug is the following, per the
latest v4 patch.

+if (resultRelInfo->ri_PartitionRoot && plan)
+{
+bool dostuff = true;
+
+if (resultRelation != plan->nominalRelation)
+dostuff = false;
+else
+resultRelation = mtstate->resultRelInfo[0].ri_RangeTableIndex;
+
+if (dostuff)
+{
+rte = list_nth(estate->es_range_table, resultRelation - 1);
+rte = copyObject(rte);
+rte->relid = RelationGetRelid(rel);
+rte->relkind = RELKIND_FOREIGN_TABLE;
+}
+}
+
+if (rte == NULL)
+{
+rte = makeNode(RangeTblEntry);
+rte->rtekind = RTE_RELATION;
+rte->relid = RelationGetRelid(rel);
+rte->relkind = RELKIND_FOREIGN_TABLE;
+}

After the refactoring, it appears to me that we only need this much:

+rte = makeNode(RangeTblEntry);
+rte->rtekind = RTE_RELATION;
+rte->relid = RelationGetRelid(rel);
+rte->relkind = RELKIND_FOREIGN_TABLE;

that is, *after* teaching ExecInitPartitionInfo to correctly set the RT
index of the ResultRelInfo it creates.  After the refactoring, the only
thing this patch needs to do is to choose the RT index of the result
relation correctly.  We currently use this:

+Index   resultRelation = resultRelInfo->ri_RangeTableIndex;

And the rest of the code the patch adds is only to adjust it based on
where the partition ResultRelInfo seems to have originated from.  If the
ri_RangeTableIndex was set correctly to begin with, we wouldn't need to
fiddle with that in the FDW code.  Regular ResultRelInfos already have it
set correctly; it's only the ones that ExecInitPartitionInfo creates that
seem to be creating the problem.

I tried to do that.  So, attached are two patches.

1. Fix for ExecInitPartitionInfo to use the right RT index to pass to
   InitResultRelInfo

2. v5 of the patch to fix the bug of foreign partitions

Thoughts?

Thanks,
Amit
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 6e2fa1420c..d272719ff4 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -125,7 +125,7 @@ static char *deparse_type_name(Oid type_oid, int32 typemod);
  * Functions to construct string representation of a node tree.
  */
 static void deparseTargetList(StringInfo buf,
- PlannerInfo *root,
+ RangeTblEntry *rte,
  Index rtindex,
  Relation rel,
  bool is_returning,
@@ -137,13 +137,13 @@ static void deparseExplicitTargetList(List *tlist,
  List **retrieved_attrs,
  deparse_expr_cxt *context);
 static void deparseSubqueryTargetList(deparse_expr_cxt *context);
-static void deparseReturningList(StringInfo buf, PlannerInfo *root,
+static void deparseReturningList(StringInfo buf, RangeTblEntry *rte,
 Index rtindex, Relation rel,
 bool trig_after_row,
 List *returningList,
 List **retrieved_attrs);
 static void deparseColumnRef(StringInfo buf, int varno, int varattno,
-PlannerInfo *root, bool qualify_col);
+RangeTblEntry *rte, bool qualify_col);
 static void deparseRelation(StringInfo buf, Relation rel);
 static void deparseExpr(Expr *expr, deparse_expr_cxt *context);
 static void deparseVar(Var *node, deparse_expr_cxt *context);
@@ -1050,7 +1050,7 @@ deparseSelectSql(List *tlist, bool is_subquery, List **retrieved_attrs,
 	 */
 	Relation	rel = heap_open(rte->relid, NoLock);
 
-	deparseTargetList(buf, root, foreignrel->relid, rel, false,
+	deparseTargetList(buf, rte, foreignrel->relid, rel, false,
 					  fpinfo->attrs_used, false, retrieved_attrs);
 	heap_close(rel, NoLock);
 }
@@ -1099,7 +1099,7 @@ deparseFromExpr(List *quals, deparse_expr_cxt

Typo in JIT documentation

2018-04-24 Thread Michael Paquier
Hi all,

I just found $subject:
 the server was compiled without --with-llvm),
-JIT will not performed, even if considered to be
+JIT will not be performed, even if considered to be
 beneficial based on the above criteria.  Setting 

Thanks,
--
Michael
diff --git a/doc/src/sgml/jit.sgml b/doc/src/sgml/jit.sgml
index 2a647e8c6c..fc841337c3 100644
--- a/doc/src/sgml/jit.sgml
+++ b/doc/src/sgml/jit.sgml
@@ -139,7 +139,7 @@
 If  is set to off, or no
 JIT implementation is available (for example because
 the server was compiled without --with-llvm),
-JIT will not performed, even if considered to be
+JIT will not be performed, even if considered to be
 beneficial based on the above criteria.  Setting 
 to off takes effect both at plan and at execution time.





Re: Minor comment update in execPartition.c

2018-04-24 Thread Alvaro Herrera
Amit Langote wrote:

> > I think we should remove the words "if not already done" from that
> > comment because 1) that function is called if the partition wasn't
> > already initialized and 2) that function assumes that.  Attached is a
> > small patch for removing the words.
> 
> Thanks, sounds fine.  I think those words remain from earlier versions of
> the patch committed as edd44738bc8 [1], where there used to be a fast-path
> return if the ResultRelInfo was already non-NULL for the passed in index.

Makes sense -- pushed, thanks both.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Oddity in tuple routing for foreign partitions

2018-04-24 Thread Kyotaro HORIGUCHI
I forgot to mention that.

At Wed, 25 Apr 2018 10:17:02 +0900, Amit Langote 
 wrote in 

> On 2018/04/25 4:49, Robert Haas wrote:
> > On Tue, Apr 24, 2018 at 12:21 PM, Robert Haas  wrote:
> >> On Mon, Apr 23, 2018 at 11:25 AM, Alvaro Herrera
> >>  wrote:
> >>> Robert, I think this is your turf, per 3d956d9562aa.  Are you looking
> >>> into it?
> >>
> >> Thanks for the ping.  I just had a look over the proposed patch and I
> >> guess I don't like it very much.  Temporarily modifying the range
> >> table in place and then changing it back before we return seems ugly
> >> and error-prone.  I hope we can come up with a solution that doesn't
> >> involve needing to do that.
> > 
> > I have done some refactoring to avoid that.  See attached.
> 
> +1 for getting rid of the PlannerInfo argument of the many functions in
> that code.  I have long wondered if we couldn't get rid of it and especially
> thought of it when reviewing this patch.

+1 from me.  Thanks for making things simpler and easier to
understand.  I feel the same as Amit :p

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Oddity in tuple routing for foreign partitions

2018-04-24 Thread Amit Langote
On 2018/04/25 4:49, Robert Haas wrote:
> On Tue, Apr 24, 2018 at 12:21 PM, Robert Haas  wrote:
>> On Mon, Apr 23, 2018 at 11:25 AM, Alvaro Herrera
>>  wrote:
>>> Robert, I think this is your turf, per 3d956d9562aa.  Are you looking
>>> into it?
>>
>> Thanks for the ping.  I just had a look over the proposed patch and I
>> guess I don't like it very much.  Temporarily modifying the range
>> table in place and then changing it back before we return seems ugly
>> and error-prone.  I hope we can come up with a solution that doesn't
>> involve needing to do that.
> 
> I have done some refactoring to avoid that.  See attached.

+1 for getting rid of the PlannerInfo argument of the many functions in
that code.  I have long wondered if we couldn't get rid of it and especially
thought of it when reviewing this patch.

Thanks,
Amit




Re: Oddity in tuple routing for foreign partitions

2018-04-24 Thread Kyotaro HORIGUCHI
Hello.

At Tue, 24 Apr 2018 15:49:20 -0400, Robert Haas  wrote 
in 
> On Tue, Apr 24, 2018 at 12:21 PM, Robert Haas  wrote:
> > On Mon, Apr 23, 2018 at 11:25 AM, Alvaro Herrera
> >  wrote:
> >> Robert, I think this is your turf, per 3d956d9562aa.  Are you looking
> >> into it?
> >
> > Thanks for the ping.  I just had a look over the proposed patch and I
> > guess I don't like it very much.  Temporarily modifying the range
> > table in place and then changing it back before we return seems ugly
> > and error-prone.  I hope we can come up with a solution that doesn't
> > involve needing to do that.
> 
> I have done some refactoring to avoid that.  See attached.
> 
> I didn't really get beyond the refactoring stage with this today.
> This version still seems to work, but I don't really understand the
> logic in postgresBeginForeignInsert which decides whether to use the
> RTE from the range table or create our own.  We seem to need to do one
> sometimes and the other sometimes, but I don't know why that is,
> really.  Nor do I understand why we need the other changes in the
> patch.  There's probably a good reason, but I haven't figured it out
> yet.

FWIW, the refactoring drops the condition that the ForeignInsert
is a part of UPDATE.  The code exists just for avoiding temporary
RTE generation in 2 cases out of the 3 possible cases mentioned
in [1].  If it looks too complex for the gain, we can always
create an RTE for deparsing as Fujita-san's first patch in this
thread did.  Anyway, the condition for "dostuff" + "is update"
might be a bit too arbitrary.

[1] 
https://www.postgresql.org/message-id/f970d875-9711-b8cb-f270-965fa3e40...@lab.ntt.co.jp

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Minor comment update in execPartition.c

2018-04-24 Thread Amit Langote
Fujita-san,

On 2018/04/24 20:14, Etsuro Fujita wrote:
> Hi,
> 
> Here is a comment for ExecInitPartitionInfo:
> 
>  296  * ExecInitPartitionInfo
>  297  *  Initialize ResultRelInfo and other information for a
> partition if not
>  298  *  already done
> 
> I think we should remove the words "if not already done" from that
> comment because 1) that function is called if the partition wasn't
> already initialized and 2) that function assumes that.  Attached is a
> small patch for removing the words.

Thanks, sounds fine.  I think those words remain from earlier versions of
the patch committed as edd44738bc8 [1], where there used to be a fast-path
return if the ResultRelInfo was already non-NULL for the passed in index.
 I remember you suggested making it so that we call ExecInitPartitionInfo
only if needed [2].  After making the changes as suggested, the "if not
already done" became redundant.

Thanks,
Amit

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=edd44738bc8

[2] https://www.postgresql.org/message-id/5A7443DF.1010408%40lab.ntt.co.jp




Re: obsoleting plpython2u and defaulting plpythonu to plpython3u

2018-04-24 Thread Peter Eisentraut
On 4/24/18 07:13, Pavel Raiskup wrote:
> .. the status quo seems to be bit optimistic with the "distant future",
> and we should start thinking about dropping plpython2 support, same as
> upstream (a bit optimistically too, IMO) does [1].

I don't think we are going to drop Python 2 support anytime soon.

> What's the expected future migration path from plpython2 to plpython3 in
> such cases?  I'm thinking about rewrite of the docs and creating some
> scripting which could simplify the migration steps.  Would such patches be
> welcome at this point?

I'm not sure what you have in mind.  In many cases, you can just change
the LANGUAGE clause.  I suppose you could run 2to3 over the function
body?  Write a PL/Python function to run 2to3?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Should we add GUCs to allow partition pruning to be disabled?

2018-04-24 Thread Alvaro Herrera
Amit Langote wrote:
> On 2018/04/24 6:10, Alvaro Herrera wrote:

> > Not really sure how best to handle that one.  For starters, I think it needs
> > to stop mentioning the GUC name in the title;
> 
> Hmm, "Constraint Exclusion" that's used in the title is a concept, not a
> GUC, although pretty close.

Yeah, I meant that if we want that section to cover the general concept
of partition pruning, with either technique, better not use the words
"constraint exclusion" in the title.

> > maybe rename it to
> > "Partition Pruning" or some such, and then in the text explain that
> > sometimes the enable_partition_pruning param is used in one case and
> > constraint_exclusion in the other, and approximately what effects they
> > have.  I don't think it's worth going into too much detail on exactly
> > how they differ, but then I'm not 100% sure of that either.
> 
> Just a thought -- How about making 5.10.4 cover partitioning based
> optimizations in general?  I see that a number of partitioning-based
> optimizations have been developed in this release cycle, but I only see
> various enable_partition* GUCs listed in config.sgml and not much else.

I think we should not rely on the config.sgml blurbs exclusively; some
narrative is always welcome -- except that for planner enable_* settings
I'm not sure we really need all that much text after all.  Constraint
exclusion was pretty easy to get wrong, hence the need for a separate
section, and I suppose the new partition pruning may be prey to the same
problems, so it seems worth documenting them specially.  But I'm not sure
about the others, if they are mostly debugging tools.

> Although the config.sgml coverage of the new capabilities seems pretty
> good, some may find their being mentioned in 5.10 Table Partitioning
> helpful.  Or if we don't want to hijack 5.10.4, maybe create a 5.10.5.

Can you (or someone) describe what that section would contain?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Extending a bit string

2018-04-24 Thread Evan Carroll
> In hindsight, it would likely be more consistent with this if we'd
> considered bitstrings to be LSB first when coercing them to/from integer,
> but whoever stuck that behavior in didn't think about it.  Too late to
> change that now I'm afraid, though perhaps we could provide non-cast
> conversion functions that act that way.


I've been thinking about that, and it actually makes sense; I'd prefer
to revert to the pre-8.0 behavior.  I just wanted to speak up to retract
that response too.  In reality, I am used to the integer display as it
currently is.  The current behavior of the coercion to/from int enforces the
bias that I have.  It led me to believe that PostgreSQL would act like that
consistently because that's what I am used to.

SELECT 5::int::bit(8);
   bit
----------
 00000101

As compared to 10100000.  But fundamentally SQL and the current helper
functions don't operate like that, so it's bizarre. Moreover, the
difference between the two makes it very error prone. For example, this
doesn't make sense,

SELECT get_bit(1::bit(1), 0), get_bit(1::bit(2), 1);

But, this does

SELECT get_bit(B'1'::bit(1), 0), get_bit(B'1'::bit(2), 1);

I'm sure it would have been substantially less confusing if integers
displayed their LSB on the left after casting. I think I would have figured
out what was going on *much* faster. You were right on everything in your
initial response (as I've come to expect).

-- 
Evan Carroll - m...@evancarroll.com
System Lord of the Internets
web: http://www.evancarroll.com
ph: 281.901.0011


Re: [sqlsmith] Unpinning error in parallel worker

2018-04-24 Thread Jonathan Rudenberg


On Tue, Apr 24, 2018, at 16:06, Thomas Munro wrote:
> On Wed, Apr 25, 2018 at 2:21 AM, Jonathan Rudenberg
>  wrote:
> > This issue happened again in production; here are the stack traces from
> > three backends we grabbed before nuking the >400 hanging backends.
> >
> > [...]
> > #4  0x55fccb93b21c in LWLockAcquire+188() at 
> > /usr/lib/postgresql/10/bin/postgres at lwlock.c:1233
> > #5  0x55fccb925fa7 in dsm_create+151() at 
> > /usr/lib/postgresql/10/bin/postgres at dsm.c:493
> > #6  0x55fccb6f2a6f in InitializeParallelDSM+511() at 
> > /usr/lib/postgresql/10/bin/postgres at parallel.c:266
> > [...]
> 
> Thank you.  These stacks are all blocked trying to acquire
> DynamicSharedMemoryControlLock.  My theory is that they can't because
> one backend -- the one that emitted the error "FATAL:  cannot unpin a
> segment that is not pinned" -- is deadlocked against itself.  After
> emitting that error you can see from Andreas's "seabisquit" stack that
> that shmem_exit() runs dsm_backend_shutdown() which runs dsm_detach()
> which tries to acquire DynamicSharedMemoryControlLock again, even
> though we already hold it at that point.
> 
> I'll write a patch to fix that unpleasant symptom.  While holding
> DynamicSharedMemoryControlLock we shouldn't raise any errors without
> releasing it first, because the error handling path will try to
> acquire it again.  That's a horrible failure mode as you have
> discovered.
> 
> But that isn't the root problem: we shouldn't be raising that error,
> and I'd love to see the stack of the one process that did that and
> then self-deadlocked.  I will have another go at trying to reproduce
> it here today.

Thanks for the update!

We have turned off parallel queries (using max_parallel_workers_per_gather = 0) 
for now, as the production impact of this bug is unfortunately quite 
problematic.

What will this failure look like with the patch you've proposed?

Thanks again,

Jonathan



Re: Oddity in tuple routing for foreign partitions

2018-04-24 Thread Robert Haas
On Tue, Apr 24, 2018 at 12:21 PM, Robert Haas  wrote:
> On Mon, Apr 23, 2018 at 11:25 AM, Alvaro Herrera
>  wrote:
>> Robert, I think this is your turf, per 3d956d9562aa.  Are you looking
>> into it?
>
> Thanks for the ping.  I just had a look over the proposed patch and I
> guess I don't like it very much.  Temporarily modifying the range
> table in place and then changing it back before we return seems ugly
> and error-prone.  I hope we can come up with a solution that doesn't
> involve needing to do that.

I have done some refactoring to avoid that.  See attached.

I didn't really get beyond the refactoring stage with this today.
This version still seems to work, but I don't really understand the
logic in postgresBeginForeignInsert which decides whether to use the
RTE from the range table or create our own.  We seem to need to do one
sometimes and the other sometimes, but I don't know why that is,
really.  Nor do I understand why we need the other changes in the
patch.  There's probably a good reason, but I haven't figured it out
yet.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


fix-tuple-routing-for-foreign-partitions-4.patch
Description: Binary data


Re: Transform for pl/perl

2018-04-24 Thread Andrew Dunstan


On 04/24/2018 12:17 PM, Peter Eisentraut wrote:
> On 4/10/18 10:31, Dagfinn Ilmari Mannsåker wrote:
>> Also, it doesn't parse back in as jsonb either:
>>
>> =# select jsonbnan()::text::json;
>> ERROR:  invalid input syntax for type json
>> DETAIL:  Token "NaN" is invalid.
>> CONTEXT:  JSON data, line 1: NaN
>>
>> And it's inconsistent with to_jsonb():
>>
>> =# select to_jsonb('nan'::numeric);
>> ┌──┐
>> │ to_jsonb │
>> ├──┤
>> │ "NaN"│
>> └──┘
>>
>> It would be highly weird if PL transforms (jsonb_plpython does the same
>> thing) let you create spec-violating jsonb values that don't round-trip
>> via jsonb_out/in.
> Yeah, this is not good.  Is there a way to do this in a centralized way?
> Is there a function to check an internal jsonb value for consistency?
> Should at least the jsonb output function check and not print invalid
> values?
>


The output function fairly reasonably assumes that the jsonb is in a
form that would be parsed in by the input function. In particular, it
assumes that anything like a NaN will be stored as text and not as a
jsonb numeric. I don't think the transform should be doing anything
different from the input function.

There is the routine IsValidJsonNumber that helps - see among others
hstore_io.c for an example use.

cheers

andrew


-- 

Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: lazy detoasting

2018-04-24 Thread Peter Eisentraut
On 4/11/18 11:33, Tom Lane wrote:
> (Wanders away wondering what Peter has done about toasted parameter
> values for procedures in general ...)

I'm not sure.  How can a procedure have a toasted parameter value?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-04-24 Thread Alvaro Herrera
Andres Freund wrote:

> What I wonder, after skimming this change, is where the relevant
> expression context is reset?  That's not really related to this change
> but the wider thread, I just noticed it while looking at this.

Do you mean ResetExprContext?  We use the plan's context, so it should
occur together with the plan reset itself, i.e. nodeAppend.c should do
it somewhere.

... Hmm, it appears we don't do it anywhere there actually.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Clock with Adaptive Replacement

2018-04-24 Thread Andres Freund
Hi,

On 2018-04-24 17:16:47 +0500, Andrey Borodin wrote:
> But, I think that the cost of developing a real page eviction strategy
> itself is negligibly small compared to the infrastructure changes needed
> by any non-CS5 strategy.

What problems are you seeing? This isn't a lot of code?

Greetings,

Andres Freund



Re: [HACKERS] Clock with Adaptive Replacement

2018-04-24 Thread Юрий Соколов
Tue, 24 Apr 2018, 15:16 Andrey Borodin :

>
>
> > On 24 Apr 2018, at 11:31, Юрий Соколов
> wrote:
> >
> > Before implementing algorithms within PostgreSQL it will be great to
> test them outside with real traces.
> >
> > I think there should be a mechanism to collect traces from real-world
> PostgreSQL installations: every read and write access.  It should be
> extremely efficient so it can be enabled in the real world.  Something like a
> circular buffer in shared memory, with a separate worker to dump it to disk.
> > Instead of the full block address, a 64-bit hash could be used.  Even 63
> bits + 1 bit to designate read/write access.
> Yes, this is good idea to track pattern of blocks usage.
> But, I think that the cost of developing a real page eviction strategy
> itself is negligibly small compared to the infrastructure changes needed by
> any non-CS5 strategy.
>
> Best regards, Andrey Borodin.


Exactly.  And it will be a pity if the implemented strategy turns out not to
be "the best".

Different strategies may need different infrastructure.  And implementing
five infrastructures is certainly more expensive than choosing the two best
strategies using traces and implementing them.


Fsync request queue

2018-04-24 Thread Andres Freund
Hi,

While thinking about the at the fsync mess, I started looking at the
fsync request queue. I was primarily wondering whether we can keep FDs
open long enough (by forwarding them to the checkpointer) to guarantee
that we see the error. But that's mostly irrelevant for what I'm
wondering about here.

The fsync request queue often is fairly large.  20 bytes for each of
shared_buffers isn't a negligible overhead.  One reason it needs to be
fairly large is that we do not deduplicate while inserting; we just add
an entry on every single write.

ISTM that using a hashtable sounds saner, because we'd deduplicate on
insert. While that'd require locking, we can relatively easily reduce
the overhead of that by keeping track of something like mdsync_cycle_ctr
in MdfdVec, and only insert again if the cycle was incremented since.

Right now if the queue is full and can't be compacted we end up
fsync()ing on every single write, rather than once per checkpoint,
afaict.  That's fairly horrible.

For the case that there's no space in the map, I'd suggest just doing
10% or so of the fsyncs in the poor sod of a process that finds no
space.  That's surely better than constantly fsyncing on every single
write.  We can also make bgwriter check the size of the hashtable on a
regular basis and do some of them if it gets too full.

The hashtable also I think has some advantages for the future. I've
introduced something very similar in my radix tree based buffer mapping.

Greetings,

Andres Freund



Re: Toast issues with OldestXmin going backwards

2018-04-24 Thread Peter Eisentraut
On 4/23/18 00:33, Amit Kapila wrote:
> Yeah, right, I have missed the point that they can be vacuumed
> separately, however, I think that decision is somewhat questionable.

Manually vacuuming the TOAST table was a way to repair the recently
fixed TOAST bug, so it's kind of useful.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-04-24 Thread Andres Freund
On 2018-04-19 12:04:35 +1200, David Rowley wrote:
> On 19 April 2018 at 03:13, Robert Haas  wrote:
> > 10% is more than a "slight" improvement, I'd say!  It's certainly got
> > to be worth avoiding the repeated calls to ExecInitExpr, whatever we
> > do about the memory contexts.

Yea, that seems important. Good that that got in.

What I wonder, after skimming this change, is where the relevant
expression context is reset?  That's not really related to this change
but the wider thread, I just noticed it while looking at this.

Greetings,

Andres Freund



Re: Corrupted btree index on HEAD because of covering indexes

2018-04-24 Thread Peter Geoghegan
On Tue, Apr 24, 2018 at 9:06 AM, Teodor Sigaev  wrote:
> Perfect!

Thanks!

> I would like to commit it but have some suggestions:

I attach a revised version, which has changes based on your feedback.

> to improve test stability it would be better to disable autovacuum:
> ALTER TABLE bttest_multi SET (autovacuum_enabled = false)

Done.

> 2) Pls, add to test DELETE as it done in
> 6db4b49986be3fe59a1f6ba6fabf9852864efc3e

Done. I will leave it to you to decide whether or not the original
create_index.sql test can now be removed.

> 3) It's not directly connected to this patch, but allocation of
> BtreeCheckState is not needed, it could be allocated on stack.
>
> 4) Nevertheless, I suggest to use palloc0 (or memset(0)) for
> BtreeCheckState. Now several fields of that structure could be not inited.

I like the idea of using palloc0(). Done that way.

-- 
Peter Geoghegan
From 36bd9387542fa594c03877b4f0c90211dc6a4c21 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Thu, 19 Apr 2018 17:45:26 -0700
Subject: [PATCH v2] Add missing and dangling downlink checks to amcheck.

When bt_index_parent_check() is called with the heapallindexed option,
allocate a second Bloom filter to fingerprint block numbers that appear
in the downlinks of internal pages.  Use Bloom filter probes when
walking the B-Tree to detect missing downlinks.  This can detect subtle
problems with page deletion/VACUUM, such as corruption caused by the bug
just fixed in commit 6db4b499.

The downlink Bloom filter is bound in size by work_mem.  Its optimal
size is typically far smaller than that of the regular heapallindexed
Bloom filter, especially when the index has high fan-out.

Author: Peter Geoghegan
Discussion: https://postgr.es/m/cah2-wznuzy4fwtjm1tbb3jpvz8ccfz7k_qvp5bhupyhivmw...@mail.gmail.com
---
 contrib/amcheck/expected/check_btree.out |  23 +-
 contrib/amcheck/sql/check_btree.sql  |  20 +-
 contrib/amcheck/verify_nbtree.c  | 404 +--
 doc/src/sgml/amcheck.sgml|   7 +-
 4 files changed, 431 insertions(+), 23 deletions(-)

diff --git a/contrib/amcheck/expected/check_btree.out b/contrib/amcheck/expected/check_btree.out
index ed80ac4..e864579 100644
--- a/contrib/amcheck/expected/check_btree.out
+++ b/contrib/amcheck/expected/check_btree.out
@@ -1,7 +1,12 @@
--- minimal test, basically just verifying that amcheck
 CREATE TABLE bttest_a(id int8);
 CREATE TABLE bttest_b(id int8);
 CREATE TABLE bttest_multi(id int8, data int8);
+CREATE TABLE delete_test_table (a bigint, b bigint, c bigint, d bigint);
+-- Stabilize tests
+ALTER TABLE bttest_a SET (autovacuum_enabled = false);
+ALTER TABLE bttest_b SET (autovacuum_enabled = false);
+ALTER TABLE bttest_multi SET (autovacuum_enabled = false);
+ALTER TABLE delete_test_table SET (autovacuum_enabled = false);
 INSERT INTO bttest_a SELECT * FROM generate_series(1, 10);
 INSERT INTO bttest_b SELECT * FROM generate_series(10, 1, -1);
 INSERT INTO bttest_multi SELECT i, i%2  FROM generate_series(1, 10) as i;
@@ -120,9 +125,25 @@ SELECT bt_index_parent_check('bttest_multi_idx', true);
  
 (1 row)
 
+--
+-- Test for multilevel page deletion/downlink present checks
+--
+INSERT INTO delete_test_table SELECT i, 1, 2, 3 FROM generate_series(1,8) i;
+ALTER TABLE delete_test_table ADD PRIMARY KEY (a,b,c,d);
+DELETE FROM delete_test_table WHERE a > 4;
+VACUUM delete_test_table;
+DELETE FROM delete_test_table WHERE a > 10;
+VACUUM delete_test_table;
+SELECT bt_index_parent_check('delete_test_table_pkey', true);
+ bt_index_parent_check 
+---
+ 
+(1 row)
+
 -- cleanup
 DROP TABLE bttest_a;
 DROP TABLE bttest_b;
 DROP TABLE bttest_multi;
+DROP TABLE delete_test_table;
 DROP OWNED BY bttest_role; -- permissions
 DROP ROLE bttest_role;
diff --git a/contrib/amcheck/sql/check_btree.sql b/contrib/amcheck/sql/check_btree.sql
index 4ca9d2d..7b1ab4f 100644
--- a/contrib/amcheck/sql/check_btree.sql
+++ b/contrib/amcheck/sql/check_btree.sql
@@ -1,7 +1,13 @@
--- minimal test, basically just verifying that amcheck
 CREATE TABLE bttest_a(id int8);
 CREATE TABLE bttest_b(id int8);
 CREATE TABLE bttest_multi(id int8, data int8);
+CREATE TABLE delete_test_table (a bigint, b bigint, c bigint, d bigint);
+
+-- Stabilize tests
+ALTER TABLE bttest_a SET (autovacuum_enabled = false);
+ALTER TABLE bttest_b SET (autovacuum_enabled = false);
+ALTER TABLE bttest_multi SET (autovacuum_enabled = false);
+ALTER TABLE delete_test_table SET (autovacuum_enabled = false);
 
 INSERT INTO bttest_a SELECT * FROM generate_series(1, 10);
 INSERT INTO bttest_b SELECT * FROM generate_series(10, 1, -1);
@@ -71,9 +77,21 @@ TRUNCATE bttest_multi;
 INSERT INTO bttest_multi SELECT i, i%2  FROM generate_series(1, 10) as i;
 SELECT bt_index_parent_check('bttest_multi_idx', true);
 
+--
+-- Test for multilevel page deletion/downlink present checks
+--
+INSERT INTO delete_test_table SELECT i, 1, 2, 3 FROM 

Re: Built-in connection pooling

2018-04-24 Thread Konstantin Knizhnik



On 23.04.2018 23:14, Robert Haas wrote:

On Wed, Apr 18, 2018 at 9:41 AM, Heikki Linnakangas  wrote:

Well, maybe I missed something, but I do not know how to efficiently
support
1. Temporary tables
2. Prepared statements
3. Session GUCs
with any external connection pooler (with a pooling level other than
session).

Me neither. What makes it easier to do these things in an internal
connection pooler? What could the backend do differently, to make these
easier to implement in an external pooler?

I think you and Konstantin are possibly failing to see the big picture
here.  Temporary tables, prepared statements, and GUC settings are
examples of session state that users expect will be preserved for the
lifetime of a connection and not beyond; all session state, of
whatever kind, has the same set of problems.  A transparent connection
pooling experience means guaranteeing that no such state vanishes
before the user ends the current session, and also that no such state
established by some other session becomes visible in the current
session.  And we really need to account for *all* such state, not just
really big things like temporary tables and prepared statements and
GUCs but also much subtler things such as the state of the PRNG
established by srandom().


It is not quite true that I have not realized these issues.
In addition to connection pooling, I have also implemented a pthread
version of Postgres, in which static variables are replaced with
thread-local variables so that each thread uses its own set of variables.


Unfortunately, in connection pooling this approach cannot be used.
But I think that performing scheduling at the transaction level will
eliminate the problem with static variables in most cases.
My expectation is that there are very few of them with
session-level lifetime.
Unfortunately, it is not so easy to locate all such places. Once such
variables are located, they can be saved in the session context and restored
on reschedule.


A more challenging thing is handling system static variables which
cannot be easily saved/restored. Your example with srandom is exactly
such a case.
Right now I do not know any efficient way to suspend/resume a
pseudo-random sequence.
But frankly speaking, I do not think that such behaviour of random()
is completely unacceptable or makes the built-in session pool unusable.
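For what it's worth, a save/restore approach can work for the PRNG too: the scheduler snapshots the generator state on reschedule and reinstalls it when the session is next attached (in C, the POSIX initstate()/setstate() functions expose random()'s state in a user-supplied buffer). Below is a minimal Python sketch of the idea; the PooledSession wrapper and its attach/detach API are invented names, not anything PostgreSQL provides:

```python
import random

class PooledSession:
    """Hypothetical per-session context: snapshots PRNG state across reschedules."""
    def __init__(self, seed):
        self._prng_state = random.Random(seed).getstate()

    def attach(self):
        """Reinstall this session's generator when a backend picks the session up."""
        rng = random.Random()
        rng.setstate(self._prng_state)
        return rng

    def detach(self, rng):
        """Snapshot the generator state when the session is rescheduled."""
        self._prng_state = rng.getstate()

# Two sessions multiplexed over one "backend": each sees its own sequence.
a, b = PooledSession(seed=42), PooledSession(seed=7)

rng = a.attach()
first = [rng.random() for _ in range(3)]
a.detach(rng)

rng = b.attach()          # another session runs a transaction in between
_ = [rng.random() for _ in range(3)]
b.detach(rng)

rng = a.attach()          # session A resumes exactly where it left off
resumed = [rng.random() for _ in range(3)]

# Reference: an uninterrupted generator with the same seed produces the
# same six values, proving suspend/resume was transparent.
reference = random.Random(42)
expected = [reference.random() for _ in range(6)]
assert first + resumed == expected
```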





This is really very similar to the problem that parallel query has
when spinning up new worker backends.  As far as possible, we want the
worker backends to have the same state as the original backend.
However, there's no systematic way of being sure that every relevant
backend-private global, including perhaps globals added by loadable
modules, is in exactly the same state.  For parallel query, we solved
that problem by copying a bunch of things that we knew were
commonly-used (cf. parallel.c) and by requiring functions to be
labeled as parallel-restricted if they rely on anything other state.
The problem for connection pooling is much harder.  If you only ever
ran parallel-safe functions throughout the lifetime of a session, then
you would know that the session has no "hidden state" other than what
parallel.c already knows about (except for any functions that are
mislabeled, but we can say that's the user's fault for mislabeling
them).  But as soon as you run even one parallel-restricted or
parallel-unsafe function, there might be a global variable someplace
that holds arbitrary state which the core system won't know anything
about.  If you want to have some other process take over that session,
you need to copy that state to the new process; if you want to reuse
the current process for a new session, you need to clear that state.
Since you don't know it exists or where to find it, and since the code
to copy and/or clear it might not even exist, you can't.

In other words, transparent connection pooling is going to require
some new mechanism, which third-party code will have to know about,
for tracking every last bit of session state that might need to be
preserved or cleared.  That's going to be a big project.  Maybe some
of that can piggyback on existing infrastructure like
InvalidateSystemCaches(), but there's probably still a ton of ad-hoc
state to deal with.  And no out-of-core pooler has a chance of
handling all that stuff correctly; an in-core pooler will be able to
do so only with a lot of work.
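One conceivable shape for the mechanism described above is a registry where core and extension code alike register save/restore/reset hooks for each piece of session state they own; the pooler then snapshots everything on handoff and clears everything on reuse. This is purely an illustrative sketch, not an existing PostgreSQL facility; all names are invented:

```python
_hooks = []   # (save, restore, reset) triples registered by state owners

def register_session_state(save, restore, reset):
    _hooks.append((save, restore, reset))

def snapshot_session():
    """Called when a session is detached from its current process."""
    return [save() for save, _, _ in _hooks]

def resume_session(snapshot):
    """Called when another process takes the session over."""
    for (_, restore, _), saved in zip(_hooks, snapshot):
        restore(saved)

def reset_session():
    """Called before the process is reused for a brand-new session."""
    for _, _, reset in _hooks:
        reset()

# Example owner: a GUC-like settings dict registered with the mechanism.
gucs = {}
register_session_state(save=lambda: dict(gucs),
                       restore=lambda s: (gucs.clear(), gucs.update(s)),
                       reset=lambda: gucs.clear())

gucs["work_mem"] = "64MB"
snap = snapshot_session()
reset_session()                      # process reused: state must vanish
assert gucs == {}
resume_session(snap)                 # session migrates: state must reappear
assert gucs == {"work_mem": "64MB"}
```

The hard part, as noted above, is that every owner of hidden state — including loadable modules — would have to participate, or its state silently leaks or disappears.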


I think that the situation with parallel executors is slightly different:
in that case several backends perform execution of the same query.

So they really need to somehow share/synchronize the state of static variables.
But in the case of connection pooling, only one transaction is executed by a
backend at each moment of time, and there should be no problems with
static variables unless they cross a transaction boundary. But I do not
think that there are many such variables.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Oddity in tuple routing for foreign partitions

2018-04-24 Thread Robert Haas
On Mon, Apr 23, 2018 at 11:25 AM, Alvaro Herrera
 wrote:
> Robert, I think this is your turf, per 3d956d9562aa.  Are you looking
> into it?

Thanks for the ping.  I just had a look over the proposed patch and I
guess I don't like it very much.  Temporarily modifying the range
table in place and then changing it back before we return seems ugly
and error-prone.  I hope we can come up with a solution that doesn't
involve needing to do that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Transform for pl/perl

2018-04-24 Thread Peter Eisentraut
On 4/10/18 10:31, Dagfinn Ilmari Mannsåker wrote:
> Also, it doesn't parse back in as jsonb either:
> 
> =# select jsonbnan()::text::json;
> ERROR:  invalid input syntax for type json
> DETAIL:  Token "NaN" is invalid.
> CONTEXT:  JSON data, line 1: NaN
> 
> And it's inconsistent with to_jsonb():
> 
> =# select to_jsonb('nan'::numeric);
> ┌──┐
> │ to_jsonb │
> ├──┤
> │ "NaN"│
> └──┘
> 
> It would be highly weird if PL transforms (jsonb_plpython does the same
> thing) let you create spec-violating jsonb values that don't round-trip
> via jsonb_out/in.

Yeah, this is not good.  Is there a way to do this in a centralized way?
Is there a function to check an internal jsonb value for consistency?
Should at least the jsonb output function check and not print invalid
values?
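Such a centralized check would presumably walk the value tree once, wherever a transform hands back an internal jsonb value, and reject non-finite numerics before they can be stored. A rough sketch of the invariant in Python terms; the function name and shape are invented, not PostgreSQL API:

```python
import math

def validate_jsonb_value(value):
    """Reject values that cannot round-trip through jsonb output and input.

    jsonb numerics follow the JSON spec, which has no NaN/Infinity, so a
    transform that injects float('nan') must fail here rather than produce
    a stored value that jsonb_in later refuses to parse.
    """
    if isinstance(value, float) and not math.isfinite(value):
        raise ValueError("invalid jsonb numeric: %r" % value)
    if isinstance(value, dict):
        for k, v in value.items():
            if not isinstance(k, str):
                raise ValueError("jsonb object keys must be strings")
            validate_jsonb_value(v)
    elif isinstance(value, list):
        for v in value:
            validate_jsonb_value(v)
    return value

validate_jsonb_value({"a": [1, 2.5, "x"], "b": {"c": None}})   # passes
try:
    validate_jsonb_value({"bad": float("nan")})
    failed = False
except ValueError:
    failed = True
assert failed
```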

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Corrupted btree index on HEAD because of covering indexes

2018-04-24 Thread Teodor Sigaev

Perfect! I would like to commit it but have some suggestions:

1)
TRUNCATE bttest_multi;
INSERT INTO bttest_multi SELECT i, i%2  FROM generate_series(1, 10) as i;
SELECT bt_index_parent_check('bttest_multi_idx', true);

to improve test stability it would be better to disable autovacuum:
ALTER TABLE bttest_multi SET (autovacuum_enabled = false)


2) Please add DELETE to the test, as is done in
6db4b49986be3fe59a1f6ba6fabf9852864efc3e.

3) It's not directly connected to this patch, but the heap allocation of
BtreeCheckState is not needed; it could be allocated on the stack.


4) Nevertheless, I suggest using palloc0 (or memset(0)) for BtreeCheckState.
Currently several fields of that structure may be left uninitialized.



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/



Re: Extending a bit string

2018-04-24 Thread Evan Carroll
>
> That's SQL:99 6.22  general rule 11) c).
> (SV and TD are the source value and the target datatype for a cast.)
>
> In hindsight, it would likely be more consistent with this if we'd
> considered bitstrings to be LSB first when coercing them to/from integer,
> but whoever stuck that behavior in didn't think about it.  Too late to
> change that now I'm afraid, though perhaps we could provide non-cast
> conversion functions that act that way.
>

Apologies, I was under the impression that casts were not in the spec. I
withdraw my request. In the 2016-draft it reads,

> If the length in octets M of SV is smaller than LTD, then TV is SV
> extended on the right by LTD–M X'00's.

That's how I read it too, and whether I feel like it's insane doesn't
matter much. And yet, the idea of

5::bit(8)::bit(32)::int

not being 5 is terrifying, so you won't find any objections to the current
behavior from me.

-- 
Evan Carroll - m...@evancarroll.com
System Lord of the Internets
web: http://www.evancarroll.com
ph: 281.901.0011


Re: Bugs in TOAST handling, OID assignment and redo recovery

2018-04-24 Thread Peter Eisentraut
On 4/18/18 02:37, Pavan Deolasee wrote:
> While I agree that we should clean it up, I wonder if changing error
> text would be a good idea. These errors are being reported by a very
> long time and if we change the text, we might forget the knowledge about
> the past reports.

It's presumably fixed now, so if we see the new error message, we'll
know it's a different issue.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Bugs in TOAST handling, OID assignment and redo recovery

2018-04-24 Thread Bruce Momjian
On Wed, Apr 18, 2018 at 12:07:52PM +0530, Pavan Deolasee wrote:
> 
> 
> On Thu, Apr 12, 2018 at 5:53 AM, Peter Eisentraut <
> peter.eisentr...@2ndquadrant.com> wrote:
> 
> On 4/10/18 06:29, Pavan Deolasee wrote:
> > One of our 2ndQuadrant support customers recently reported a sudden rush
> > of TOAST errors post a crash recovery, nearly causing an outage. Most
> > errors read like this:
> >
> > ERROR: unexpected chunk number 0 (expected 1) for toast value 
> 
> While researching this, I found that the terminology in this code is
> quite inconsistent.  It talks about chunks ids, chunk indexes, chunk
> numbers, etc. seemingly interchangeably.  The above error is actually
> about the chunk_seq, not about the chunk_id, as one might think.
> 
> The attached patch is my attempt to clean this up a bit.  Thoughts?
> 
> 
> While I agree that we should clean it up, I wonder if changing error text 
> would
> be a good idea. These errors are being reported by a very long time and if we
> change the text, we might forget the knowledge about the past reports.
> 
> Also, "toast value" is same as "chunk_id". Should we clean up something there
> too? "chunk_seq number" -- should that be just "chunk_seq"?

We can put a comment next to the error message C string if we want to
keep historical knowledge.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Extending a bit string

2018-04-24 Thread Tom Lane
Evan Carroll  writes:
> Currently the behavior of bit-string extensions is pretty insane.

You've provided no support for this assertion, much less any defense
of why your proposed semantics change is any less insane.  Also, if
I understood you correctly, you want to change the semantics of
casting a bitstring to a bitstring of a different length, which is
an operation that's defined by the SQL standard.  You will get zero
traction on that unless you convince people that we've misread the
standard.  Which is possible, but the text seems clear to me that
casting bit(2) to bit(4) requires addition of zeroes on the right:

11) If TD is fixed-length bit string, then let LTD be the length in
bits of TD. Let BLSV be the result of BIT_LENGTH(SV).
...
c) If BLSV is smaller than LTD, then TV is SV expressed as a
  bit string extended on the right with LTD-BLSV bits whose
  values are all 0 (zero) and a completion condition is raised:
  warning - implicit zero-bit padding.

That's SQL:99 6.22  general rule 11) c).
(SV and TD are the source value and the target datatype for a cast.)

In hindsight, it would likely be more consistent with this if we'd
considered bitstrings to be LSB first when coercing them to/from integer,
but whoever stuck that behavior in didn't think about it.  Too late to
change that now I'm afraid, though perhaps we could provide non-cast
conversion functions that act that way.
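The consequence of rule 11) c) is easy to see numerically: widening pads on the right (the MSB end stays put), so the integer value is multiplied by a power of two. A small sketch of the standard cast semantics alongside the hypothetical LSB-first conversion functions suggested above (names invented):

```python
def cast_bit(bits, target_len):
    """SQL-standard bit(n) -> bit(m) cast: extend on the right with zeros."""
    return bits.ljust(target_len, "0")[:target_len]

def bits_to_int_msb(bits):
    """PostgreSQL's bit -> int coercion treats the string as MSB-first."""
    return int(bits, 2)

def bits_to_int_lsb(bits):
    """Hypothetical LSB-first conversion function (not a cast)."""
    return int(bits[::-1], 2)

five = format(5, "08b")                  # 5::bit(8) = '00000101'
widened = cast_bit(five, 32)             # '00000101' followed by 24 zero bits

# The standard cast shifts the value: 5 becomes 5 * 2^24, not 5.
assert bits_to_int_msb(widened) == 5 * 2**24 == 83886080

# An LSB-first representation survives right-padding unchanged.
assert bits_to_int_lsb(cast_bit(five[::-1], 32)) == 5
```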

regards, tom lane



Re: Built-in connection pooling

2018-04-24 Thread Adam Brusselback
On Tue, Apr 24, 2018 at 9:52 AM, Merlin Moncure  wrote:
>
> Why does it have to be completely transparent?  As long as the feature
> is optional (say, a .conf setting) the tradeoffs can be managed.  It's
> a reasonable to expect to exchange some functionality for pooling;
> pgbouncer provides a 'release' query (say, DISCARD ALL)  to be called
> upon release back to the pool.  Having session state objects (not all
> of which we are talking about; advisory locks and notifications
> deserve consideration) 'just work' would be wonderful but ought not to
> hold up other usages of the feature.
>
> merlin

Just my $0.02, I wouldn't take advantage of this feature as a user
without it being transparent.
I use too many of the features which would be affected by not
maintaining the state.  That's one of the reasons I only use an
external JDBC pooler for my primary application, and plain ole
connections for all of my secondary services which need to just work
with temp tables, session variables, etc.  I'd love it if I could use
one of those poolers (or a built in one) which just magically
increased performance for starting up connections, lowered the
overhead of idle sessions, and didn't mess with session state.

Short of that, i'll take the hit in performance and using more memory
than I should with direct connections for now.

Not sure how other users feel, but that's where I'm sitting for my use case.



Re: [sqlsmith] Unpinning error in parallel worker

2018-04-24 Thread Jonathan Rudenberg


On Fri, Apr 20, 2018, at 00:42, Thomas Munro wrote:
> On Wed, Apr 18, 2018 at 11:43 AM, Jonathan Rudenberg
>  wrote:
> > On Tue, Apr 17, 2018, at 19:31, Thomas Munro wrote:
> >> On Wed, Apr 18, 2018 at 11:01 AM, Jonathan Rudenberg
> >>  wrote:
> >> > Yep, I think I know approximately what it looked like, I've attached a 
> >> > lightly redacted plan. All of the hung queries were running some variant 
> >> > of this plan as far as I can tell.
> >>
> >> Hmm, that isn't a parallel query.  I was expecting to see "Gather" and
> >> "Parallel" in there.
> >
> > Oops, I'm really sorry about that. I only have the first part of the hung 
> > queries, and there are a few variants. Here's one that's parallel.
> 
> I spent some time trying to reproduce this failure without any luck,
> using query plans similar to your Gather plan fragment, and using some
> test harness code for the allocator stuff in isolation.
> 
> I had an idea that (1) freeing a large object that releases and unpins
> a segment in one backend and then (2) freeing it again in another
> backend (illegally) might produce this effect with sufficiently bad
> luck.  I'm still trying to reproduce that without any success, but I
> get other kinds of failures which I think you'd be seeing too if that
> hunch were right.  Still looking...


This issue happened again in production; here are the stack traces from three
backends we grabbed before nuking the >400 hanging backends.

process: /proc/24194/mem
thread: 0x7f46263648c0, lwp: 24194, type: 1
#0  0x7f4625fdc827 in do_futex_wait+55() at 
/lib/x86_64-linux-gnu/libpthread.so.0 at futex-internal.h:205
#1  0x7f4625fdc8d4 in __new_sem_wait_slow+84() at 
/lib/x86_64-linux-gnu/libpthread.so.0 at sem_waitcommon.c:181
#2  0x7f4625fdc97a in __new_sem_wait+58() at 
/lib/x86_64-linux-gnu/libpthread.so.0 at sem_wait.c:29
#3  0x55fccb8c8832 in PGSemaphoreLock+34() at 
/usr/lib/postgresql/10/bin/postgres at pg_sema.c:310
#4  0x55fccb93b21c in LWLockAcquire+188() at 
/usr/lib/postgresql/10/bin/postgres at lwlock.c:1233
#5  0x55fccb925fa7 in dsm_create+151() at 
/usr/lib/postgresql/10/bin/postgres at dsm.c:493
#6  0x55fccb6f2a6f in InitializeParallelDSM+511() at 
/usr/lib/postgresql/10/bin/postgres at parallel.c:266
#7  0x55fccb815f27 in ExecInitParallelPlan+807() at 
/usr/lib/postgresql/10/bin/postgres at execParallel.c:470
#8  0x55fccb825a50 in ExecGather+1168() at 
/usr/lib/postgresql/10/bin/postgres at nodeGather.c:158
#9  0x55fccb81dbd3 in ExecAppend+83() at 
/usr/lib/postgresql/10/bin/postgres at executor.h:250
#10 0x55fccb81df2c in fetch_input_tuple+172() at 
/usr/lib/postgresql/10/bin/postgres at executor.h:250
#11 0x55fccb820293 in ExecAgg+1235() at /usr/lib/postgresql/10/bin/postgres 
at nodeAgg.c:2539
#12 0x55fccb835235 in CteScanNext+229() at 
/usr/lib/postgresql/10/bin/postgres at executor.h:250
#13 0x55fccb817f0d in ExecScan+557() at /usr/lib/postgresql/10/bin/postgres 
at execScan.c:97
#14 0x55fccb832155 in ExecNestLoop+165() at 
/usr/lib/postgresql/10/bin/postgres at executor.h:250
#15 0x55fccb832155 in ExecNestLoop+165() at 
/usr/lib/postgresql/10/bin/postgres at executor.h:250
#16 0x55fccb832155 in ExecNestLoop+165() at 
/usr/lib/postgresql/10/bin/postgres at executor.h:250
#17 0x55fccb81df2c in fetch_input_tuple+172() at 
/usr/lib/postgresql/10/bin/postgres at executor.h:250
#18 0x55fccb8201eb in ExecAgg+1067() at /usr/lib/postgresql/10/bin/postgres 
at nodeAgg.c:2347
#19 0x55fccb811dfb in standard_ExecutorRun+379() at 
/usr/lib/postgresql/10/bin/postgres at executor.h:250
#20 0x55fccb94d19b in PortalRunSelect+507() at 
/usr/lib/postgresql/10/bin/postgres at pquery.c:932
#21 0x55fccb94e798 in PortalRun+824() at 
/usr/lib/postgresql/10/bin/postgres at pquery.c:773
#22 0x55fccb94b329 in PostgresMain+3721() at 
/usr/lib/postgresql/10/bin/postgres at postgres.c:1984
#23 0x55fccb6840d9 in ServerLoop+3412() at 
/usr/lib/postgresql/10/bin/postgres at postmaster.c:4405
#24 0x55fccb8da78b in PostmasterMain+4235() at 
/usr/lib/postgresql/10/bin/postgres at postmaster.c:1363
#25 0x55fccb6854d5 in main+2165() at /usr/lib/postgresql/10/bin/postgres at 
main.c:228
#26 0x7f4623c07830 in __libc_start_main+240() at 
/lib/x86_64-linux-gnu/libc.so.6 at libc-start.c:291
#27 0x55fccb685559 in _start!+41() at /usr/lib/postgresql/10/bin/postgres

process: /proc/24231/mem
thread: 0x7f46263648c0, lwp: 24231, type: 1
#0  0x7f4625fdc827 in do_futex_wait+55() at 
/lib/x86_64-linux-gnu/libpthread.so.0 at futex-internal.h:205
#1  0x7f4625fdc8d4 in __new_sem_wait_slow+84() at 
/lib/x86_64-linux-gnu/libpthread.so.0 at sem_waitcommon.c:181
#2  0x7f4625fdc97a in __new_sem_wait+58() at 
/lib/x86_64-linux-gnu/libpthread.so.0 at sem_wait.c:29
#3  0x55fccb8c8832 in PGSemaphoreLock+34() at 
/usr/lib/postgresql/10/bin/postgres at pg_sema.c:310
#4  0x55fccb93b21c in 

Re: Built-in connection pooling

2018-04-24 Thread Merlin Moncure
On Mon, Apr 23, 2018 at 3:14 PM, Robert Haas  wrote:
> In other words, transparent connection pooling is going to require
> some new mechanism, which third-party code will have to know about,
> for tracking every last bit of session state that might need to be
> preserved or cleared.  That's going to be a big project.  Maybe some
> of that can piggyback on existing infrastructure like
> InvalidateSystemCaches(), but there's probably still a ton of ad-hoc
> state to deal with.  And no out-of-core pooler has a chance of
> handling all that stuff correctly; an in-core pooler will be able to
> do so only with a lot of work.

Why does it have to be completely transparent?  As long as the feature
is optional (say, a .conf setting) the tradeoffs can be managed.  It's
a reasonable to expect to exchange some functionality for pooling;
pgbouncer provides a 'release' query (say, DISCARD ALL)  to be called
upon release back to the pool.  Having session state objects (not all
of which we are talking about; advisory locks and notifications
deserve consideration) 'just work' would be wonderful but ought not to
hold up other usages of the feature.
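The pgbouncer-style contract described above can be sketched as a pool that runs a configurable reset query whenever a connection goes back to the pool: sessions that used no session-level features lose nothing, and everyone else knowingly opts in. A toy sketch with a stub connection, no real driver or server involved:

```python
class StubConnection:
    """Stands in for a driver connection; just records executed statements."""
    def __init__(self):
        self.executed = []
    def execute(self, sql):
        self.executed.append(sql)

class SimplePool:
    def __init__(self, make_conn, reset_query="DISCARD ALL"):
        self._make_conn = make_conn
        self._reset_query = reset_query
        self._idle = []

    def acquire(self):
        return self._idle.pop() if self._idle else self._make_conn()

    def release(self, conn):
        # Scrub session state before the next client can see the connection.
        conn.execute(self._reset_query)
        self._idle.append(conn)

pool = SimplePool(StubConnection)
conn = pool.acquire()
conn.execute("SET work_mem = '64MB'")   # session state a later client must not inherit
pool.release(conn)

assert conn.executed[-1] == "DISCARD ALL"
assert pool.acquire() is conn           # same physical connection, reset state
```

As noted in the thread, DISCARD ALL covers GUCs, prepared statements, and temporary tables, but things like advisory locks and pending notifications still deserve separate consideration.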

merlin



Re: [HACKERS] Clock with Adaptive Replacement

2018-04-24 Thread Andrey Borodin


> On 24 Apr 2018, at 11:31, Юрий Соколов wrote:
> 
> Before implementing algorithms within PostgreSQL, it would be great to test 
> them outside with real traces.
> 
> I think there should be a mechanism to collect traces from real-world 
> PostgreSQL installations: every read and write access. It should be extremely 
> efficient to be enabled in the real world. Something like a circular buffer in 
> shared memory, and a separate worker to dump it to disk.
> Instead of the full block address, a 64-bit hash could be used. Even 63 bits plus 
> 1 bit to designate read/write access.
Yes, it is a good idea to track the pattern of block usage.
But I think that the cost of developing the page eviction strategy itself is 
negligibly small compared to the infrastructure changes needed by any non-CS5 
strategy.
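The trace-record encoding proposed above fits each event into one machine word: 63 bits of block-address hash plus one read/write flag, appended to a fixed-size ring that a background worker drains. A sketch of that record layout; the names are invented:

```python
HASH_MASK = (1 << 63) - 1
WRITE_BIT = 1 << 63

def pack_trace(block_hash, is_write):
    """63-bit hash of the block address plus 1 bit for the access type."""
    return (block_hash & HASH_MASK) | (WRITE_BIT if is_write else 0)

def unpack_trace(word):
    return word & HASH_MASK, bool(word & WRITE_BIT)

class TraceRing:
    """Fixed-size circular buffer; the oldest entries are overwritten.

    In the real proposal this would live in shared memory, with a separate
    worker dumping it to disk before the writer laps the reader.
    """
    def __init__(self, size):
        self.buf = [0] * size
        self.pos = 0
    def record(self, block_hash, is_write):
        self.buf[self.pos % len(self.buf)] = pack_trace(block_hash, is_write)
        self.pos += 1

ring = TraceRing(4)
for i, w in enumerate([False, True, False, True, True]):
    ring.record(hash(("rel", 16384, i)) & HASH_MASK, w)

h, w = unpack_trace(ring.buf[0])          # slot 0 was overwritten by event #4
assert w is True
assert pack_trace(HASH_MASK, True) == (1 << 64) - 1
assert unpack_trace(pack_trace(12345, False)) == (12345, False)
```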

Best regards, Andrey Borodin.


Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-04-24 Thread Etsuro Fujita

(2018/04/17 19:00), Etsuro Fujita wrote:

(2018/04/17 18:43), Ashutosh Bapat wrote:

Here's updated patch-set.


Will review.


I started reviewing this.

o 0001-Handle-ConvertRowtypeExprs-in-pull_vars_clause.patch:

+   else if (IsA(node, ConvertRowtypeExpr))
+   {
+#ifdef USE_ASSERT_CHECKING
+   ConvertRowtypeExpr *cre = castNode(ConvertRowtypeExpr, node);
+   Var*var;
+
+   /*
+* ConvertRowtypeExprs only result when parent's whole-row 
reference is
+* translated for a child using adjust_appendrel_attrs(). That 
function

+* does not handle any upper level Var references.
+*/
+   while (IsA(cre->arg, ConvertRowtypeExpr))
+   cre = castNode(ConvertRowtypeExpr, cre->arg);
+   var = castNode(Var, cre->arg);
+   Assert (var->varlevelsup == 0);
+#endif /* USE_ASSERT_CHECKING */

Isn't it better to throw ERROR as in other cases in pull_vars_clause()? 
 Another thing I noticed about this patch is this:


postgres=# create table prt1 (a int, b int, c varchar) partition by range (a);
postgres=# create table prt1_p1 partition of prt1 FOR VALUES FROM (0) TO (250);
postgres=# create table prt1_p2 partition of prt1 FOR VALUES FROM (250) TO (500);
postgres=# insert into prt1 select i, i % 25, to_char(i, 'FM') from generate_series(0, 499) i where i % 2 = 0;
postgres=# analyze prt1;
postgres=# create table prt2 (a int, b int, c varchar) partition by range (b);
postgres=# create table prt2_p1 partition of prt2 FOR VALUES FROM (0) TO (250);
postgres=# create table prt2_p2 partition of prt2 FOR VALUES FROM (250) TO (500);
postgres=# insert into prt2 select i % 25, i, to_char(i, 'FM') from generate_series(0, 499) i where i % 3 = 0;
postgres=# analyze prt2;

postgres=# update prt1 t1 set c = 'foo' from prt2 t2 where t1::text = t2::text and t1.a = t2.b;
ERROR:  ConvertRowtypeExpr found where not expected

To fix this, I think we need to pass PVC_RECURSE_CONVERTROWTYPES to 
pull_vars_clause() in distribute_qual_to_rels() and 
generate_base_implied_equalities_no_const() as well.


That's all I have for now.  Will continue the review.

Best regards,
Etsuro Fujita



Minor comment update in execPartition.c

2018-04-24 Thread Etsuro Fujita
Hi,

Here is a comment for ExecInitPartitionInfo:

 296  * ExecInitPartitionInfo
 297  *  Initialize ResultRelInfo and other information for a
partition if not
 298  *  already done

I think we should remove the words "if not already done" from that
comment because 1) that function is called if the partition wasn't
already initialized and 2) that function assumes that.  Attached is a
small patch for removing the words.

Best regards,
Etsuro Fujita
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index a2f6b29..4c9f71f 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -294,8 +294,7 @@ ExecFindPartition(ResultRelInfo *resultRelInfo, PartitionDispatch *pd,
 
 /*
  * ExecInitPartitionInfo
- *		Initialize ResultRelInfo and other information for a partition if not
- *		already done
+ *		Initialize ResultRelInfo and other information for a partition
  *
  * Returns the ResultRelInfo
  */


obsoleting plpython2u and defaulting plpythonu to plpython3u

2018-04-24 Thread Pavel Raiskup
Forwarding to hackers.  I hope that's OK after a few days.

Pavel

--  Forwarded Message  --

Subject: obsoleting plpython2u and defaulting plpythonu to plpython3u
Date: Thursday, April 19, 2018, 11:56:40 AM CEST
From: Pavel Raiskup 
To: pgsql-general 

Per current plpython docs:

The language named plpythonu implements PL/Python based on the default
Python language variant, which is currently Python 2. (This default is
independent of what any local Python installations might consider to
be their “default”, for example, what /usr/bin/python might be.) The
default will probably be changed to Python 3 in a distant future
release of PostgreSQL, depending on the progress of the migration to
Python 3 in the Python community.

.. the status quo seems to be a bit optimistic with the "distant future",
and we should start thinking about dropping plpython2 support, the same as
upstream (a bit optimistically too, IMO) does [1].

The docs don't suggest the explicit use of plpython2, but they do not
discourage it either -- so it is likely some clusters rely on it.

What's the expected future migration path from plpython2 to plpython3 in
such cases?  I'm thinking about a rewrite of the docs and creating some
scripting which could simplify the migration steps.  Would such patches be
welcome at this point?

[1] https://pythonclock.org/

Pavel
-






Re: [HACKERS] Custom compression methods

2018-04-24 Thread Alexander Korotkov
On Mon, Apr 23, 2018 at 7:34 PM, Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:

> IMHO end-user do not have skills and time to create their own compression
> algorithms. And without knowledge of specific of particular data set,
> it is very hard to implement something more efficient than universal
> compression library.
> But if you think that it is not a right place and time to discuss it, I do
> not insist.
>

For sure, end-users wouldn't implement their own compression algorithms,
in the same way that end-users wouldn't implement custom datatypes,
operator classes, procedural language handlers, etc.  But those are
useful extension mechanisms which have passed the test of time, and
extension developers use them.


> But in any case, I think that it will be useful to provide some more
> examples of custom compression API usage.
> From my point of view the most useful will be integration with zstd.
> But if it is possible to find some example of data-specific compression
> algorithms which show better results than universal compression,
> it will be even more impressive.
>

Yes, this patch definitely lacks a good usage example.  That may
lead to some misunderstanding of its purpose.  Good use cases
should be shown before we can consider committing this.  I think
Ildus should try to implement at least a custom dictionary compression
method where the dictionary is specified by the user in parameters.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?

2018-04-24 Thread Kyotaro HORIGUCHI
Thank you very much for looking this!

At Mon, 23 Apr 2018 03:41:47 -0400, Heikki Linnakangas  wrote 
in <89e33d4f-5c75-0738-3dcb-44c4df59e...@iki.fi>
> On 18/01/18 20:54, Kyotaro HORIGUCHI wrote:
> > At Thu, 18 Jan 2018 11:52:52 -0800, Andres Freund 
> > wrote in <20180118195252.hyxgkb3krcqjz...@alap3.anarazel.de>
> >> On 2018-01-18 20:58:27 +0900, Michael Paquier wrote:
> >>> b) The second patch that I would like to mention is doing things on
> >>> the
> >>> standby side by being able to change a WAL source when fetching a
> >>> single
> >>> record so as it is possible to fetch the beginning of a cross-segment
> >>> record from say an archive or the local xlog, and then request the
> >>> rest
> >>> on the primary. This patch can be found in
> >>> https://www.postgresql.org/message-id/20171026.190551.208996945.horiguchi.kyotaro%40lab.ntt.co.jp
> >>> and the diff in WaitForWALToBecomeAvailable() scares me a lot because,
> >>> it seems to me that this actually breaks timeline switch
> >>> consistency. The concept of switching a WAL source in the middle of a
> >>> WAL segment is risky anyway, because we perform sanity checks while
> >>> switching sources. The bug we are dealing about here is very rare, and
> >>> seeing a problem like that for a timeline switch is even more rare,
> >>> but
> >>> the risk is not zero and we may finish with a corrupted instance, so
> >>> we
> >>> should not in my opinion take this approach. Also this actually
> >>> achieves
> >>> point 2) above, not completely though, but not 1).
> >>
> >> I don't agree with the sentiment about the approach, but I agree there
> >> might be weaknesses in the implementation (which I have not
> >> reviewed). I
> >> think it's perfectly sensible to require fetching one segment from
> >> pg_xlog and the next one via another method. Currently the inability
> >> to
> >> do so often leads to us constantly refetching the same segment.
> > Though I'm still not fully confident, if reading a set of
> > continuation records from two (or more) sources can lead to
> > inconsistency, two successive (complete) records are facing the
> > same risk.
> 
> This seems like the best approach to me as well.
> 
> Looking at the patch linked above
> (https://www.postgresql.org/message-id/20171026.190551.208996945.horiguchi.kyotaro%40lab.ntt.co.jp):
> 
> > --- a/src/backend/access/transam/xlog.c
> > +++ b/src/backend/access/transam/xlog.c
> > @@ -11693,6 +11693,10 @@ retry:
> > Assert(reqLen <= readLen);
> > *readTLI = curFileTLI;
> > +
> > + if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr,
> > readBuf))
> > +   goto next_record_is_invalid;
> > +
> > return readLen;
> >   next_record_is_invalid:
> 
> What is the point of adding this XLogReaderValidatePageHeader() call?
> The caller of this callback function, ReadPageInternal(), checks the
> page header anyway. Earlier in this thread, you said:

Without those lines, the server actually fails to start replication.

(I try to remember the details...)

The difference is whether the function can cause a retry for the
same portion of a set of continued records (without changing the
plugin API). XLogPageRead can do that. On the other hand, all
ReadPageInternal can do is let the caller, ReadRecords, retry
from the beginning of the set of continued records, since
the caller side handles only complete records.

In the failure case, in XLogPageRead, when read() fails, it can
try the next source in the middle of a continued record. On the other
hand, if the segment was read but it was a recycled one, it passes
"success" to ReadPageInternal, which leads to retrying from the
beginning of the record: an infinite loop.

Calling XLogReaderValidatePageHeader in ReadPageInternal is
redundant, but removing it may be an interface change for xlogreader
plugins, and I am not sure that is allowed.
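The shape of the hazard can be modelled abstractly: if page-header validation happens only above the read callback, a stale (recycled) page read from the preferred source keeps being reported as success, and the record-level retry starts over from the same source forever; validating inside the page read lets the reader fail over per page. A toy Python model of this loop, not PostgreSQL code:

```python
def read_record(pages_needed, sources, validate_in_page_read):
    """Toy model of fetching a record that spans several pages.

    Each source maps page_no -> bytes; a missing page stands for a recycled
    segment whose contents ("STALE") would fail page-header validation.
    """
    for _attempt in range(3):            # bounded stand-in for looping forever
        record, src = [], 0              # every restart prefers the first source
        for page_no in range(pages_needed):
            data = sources[src].get(page_no, "STALE")
            while data == "STALE" and validate_in_page_read and src + 1 < len(sources):
                src += 1                 # fail over to the next source, per page
                data = sources[src].get(page_no, "STALE")
            if data == "STALE":          # stale page only noticed at record level
                break                    # restart the whole record from scratch
            record.append(data)
        else:
            return "".join(record)
    raise RuntimeError("stuck retrying the same stale page: infinite loop")

pg_wal = {0: "p0-"}                      # local WAL: first page only, rest recycled
stream = {0: "p0-", 1: "p1-", 2: "p2"}   # the other source has the complete record

assert read_record(3, [pg_wal, stream], validate_in_page_read=True) == "p0-p1-p2"
try:
    read_record(3, [pg_wal, stream], validate_in_page_read=False)
    looped = False
except RuntimeError:
    looped = True
assert looped
```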

> > The rest to do is let XLogPageRead retry other sources
> > immediately. To do this I made ValidXLogPageHeader(at)xlogreader(dot)c
> > public (and renamed to XLogReaderValidatePageHeader).
> 
> but I don't understand that.

It is exposed so that XLogPageRead can validate the page header
using it.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Make description of heap records more talkative for flags

2018-04-24 Thread Jehan-Guillaume de Rorthais
Hi all,

Below is 1¢ of feedback from a side project about this.

On Mon, 23 Apr 2018 12:37:20 -0300
Alvaro Herrera  wrote:

> Michael Paquier wrote:
> > On Thu, Apr 12, 2018 at 08:49:03PM -0700, Andres Freund wrote:  
> > > OTOH, that also kinda bloats the output noticeably... I'm somewhat
> > > inclined to just put the hex value or such there?  
> > 
> > That would do as well for me.  
> 
> Me too.  Should we consider this for pg11?  My vote is yes, with no
> backpatch (might break scripts reading pg_{wal,xlog}dump output.
> Though, really!?).

Yes, we are using pg_{wal,xlog}dump in the PAF project to check that a
controlled switchover is safe.

We use it to check that the designated standby to promote received the shutdown
checkpoint from the old master when PAF shut it down before starting it again
as a regular standby. It allows us to safely hook the old master as a standby
on the new master without xact loss and/or using pg_rewind.

I suppose we can deal with the format output change though.
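For what it's worth, the kind of check PAF performs could be sketched by scanning pg_waldump output for the last checkpoint record. This is our own illustration, not PAF's actual code, and the sample lines below are abbreviated, partly made-up approximations of pg_waldump output:

```python
def ends_with_shutdown_checkpoint(waldump_output):
    """Return True if the last checkpoint record seen in the pg_waldump
    text is a shutdown checkpoint rather than an online one."""
    last_ckpt = None
    for line in waldump_output.splitlines():
        if "CHECKPOINT_SHUTDOWN" in line or "CHECKPOINT_ONLINE" in line:
            last_ckpt = line
    return last_ckpt is not None and "CHECKPOINT_SHUTDOWN" in last_ckpt

# Trimmed sample lines in the spirit of pg_waldump output:
sample = (
    "rmgr: Heap len (rec/tot): 54/54, tx: 568, lsn: 0/016B1E28, desc: INSERT off 2\n"
    "rmgr: XLOG len (rec/tot): 106/106, tx: 0, lsn: 0/016B2060, "
    "desc: CHECKPOINT_SHUTDOWN redo 0/16B2060\n"
)
print(ends_with_shutdown_checkpoint(sample))  # → True
```

A format change in the `desc:` field would obviously require adjusting the matching, which is the concern raised above.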




GCC 8 warnings

2018-04-24 Thread Devrim Gündüz

Hi,

While building stable releases and v11 on Fedora 28, I am seeing some warnings.
What is the project's policy about fixing those warnings in older branches? 

To contribute to world peace, I did not attach the text to the email. Here is
what I see in today's git snapshot:

https://gunduz.org/temp/pg-gcc8-v11.txt

...and this is from 9.6:

https://gunduz.org/temp/pg-gcc8-9.6.txt

Regards,
-- 
Devrim Gündüz
EnterpriseDB: https://www.enterprisedb.com
PostgreSQL Consultant, Red Hat Certified Engineer
Twitter: @DevrimGunduz , @DevrimGunduzTR

signature.asc
Description: This is a digitally signed message part


Re: Built-in connection pooling

2018-04-24 Thread Konstantin Knizhnik



On 23.04.2018 21:56, Robert Haas wrote:

On Fri, Jan 19, 2018 at 11:59 AM, Tomas Vondra
 wrote:

Hmmm, that's unfortunate. I guess you'll have process the startup packet
in the main process, before it gets forked. At least partially.

I'm not keen on a design that would involve doing more stuff in the
postmaster, because that would increase the chances of the postmaster
accidentally dying, which is really bad.  I've been thinking about the
idea of having a separate "listener" process that receives
connections, and that the postmaster can restart if it fails.  Or
there could even be multiple listeners if needed.  When the listener
gets a connection, it hands it off to another process that then "owns"
that connection.

One problem with this is that the process that's going to take over
the connection needs to get started by the postmaster, not the
listener.  The listener could signal the postmaster to start it, just
like we do for background workers, but that might add a bit of
latency.   So what I'm thinking is that the postmaster could maintain
a small (and configurably-sized) pool of preforked workers.  That
might be worth doing independently, as a way to reduce connection
startup latency, although somebody would have to test it to see
whether it really works... a lot of the startup work can't be done
until we know which database the user wants.



I agree that starting separate "listener" process(es) is the most 
flexible and scalable solution.
I have not implemented this approach due to the problems with forking 
a new backend that you mentioned.

But certainly it can be addressed.
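The listener-to-worker handoff described above would presumably rely on SCM_RIGHTS file-descriptor passing. Here is a toy sketch of just that mechanic over a socketpair (using Python 3.9's socket.send_fds/recv_fds on a Unix system; the control message and "startup packet" contents are made up for the example):

```python
import os
import socket

# Stand-in for the listener <-> worker control channel.
listener_end, worker_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# Stand-in for an accepted client connection: a plain pipe.
r, w = os.pipe()
os.write(w, b"startup packet")

# Listener side: pass the "connection" fd to the worker via SCM_RIGHTS.
socket.send_fds(listener_end, [b"take"], [r])

# Worker side: receive the fd and adopt the connection.
msg, fds, flags, addr = socket.recv_fds(worker_end, 1024, maxfds=1)
data = os.read(fds[0], 1024)
print(msg, data)  # → b'take' b'startup packet'
```

A real postmaster would of course need a protocol around this (which worker gets the fd, what happens on worker death), but the handoff itself is a solved problem on Unix.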

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: using expression syntax for partition bounds

2018-04-24 Thread Kyotaro HORIGUCHI
Thanks. I have almost missed this.

At Mon, 23 Apr 2018 11:44:14 +0900, Amit Langote wrote:

> On 2018/04/23 11:37, Amit Langote wrote:
> > I tried to update the patch to do things that way.  I'm going to create a
> > new entry in the next CF titled "generalized expression syntax for
> > partition bounds" and add the patch there.
> 
> Tweaked the commit message to credit all the authors.

+  any variable-free expression (subqueries, window functions, aggregate
+  functions, and set-returning functions are not allowed).  The data type
+  of the partition bound expression must match the data type of the
+  corresponding partition key column.

A partition bound value is similar to a column default expression in
the sense that it appears within a DDL statement. The description for
DEFAULT is:

|  The value is any variable-free expression (subqueries and
|  cross-references to other columns in the current table are not
|  allowed)

It actually refuses aggregates, window functions and SRFs, but that
is not mentioned.  Whichever wording we choose, the two ought to have
a similar description.

>   /*
>* Strip any top-level COLLATE clause, as they're inconsequential.
>* XXX - Should we add a WARNING here?
>*/
>   while (IsA(value, CollateExpr))
>   value = (Node *) ((CollateExpr *) value)->arg;

Ok, I'll follow Tom's suggestion, but a COLLATE node does not
necessarily appear at the top level. For example (('a' collate
"de_DE") || 'b') has the CollateExpr node under the top OpExpr, and
we get a complaint like the following with it.

> ERROR:  unrecognized node type: 123

123 is T_CollateExpr. The expression "value" requires
eval_const_expressions to eliminate CollateExprs.  It would require
assign_expr_collations to retrieve the value's collation, but we don't
do that since we ignore collation in this case.


The following results in a strange-looking error.

> =# create table pt1 partition of p for values in (a);
> ERROR:  column "a" does not exist
> LINE 1: create table pt1 partition of p for values in (a);

Both p and pt1 actually do have a column a.

The following results in a similar error, and it is wrong, too.

> =# create table pr1 partition of pr for values from (a + 1) to (a + 2);
> ERROR: column "a" does not exist
> LINE 1: create table pr1 partition of pr for values from (a + 1) to ...

Similar but a bit different: the following command leads to
a SEGV, since it creates a PARTITION_RANGE_DATUM_VALUE with value =
NULL. Even if it left the original node intact, validateInfiniteBounds
would reject it and abort.

> =# create table pr1 partition of pr for values from (hoge) to (hige);
(crash)


I fixed this using a pre-columnref hook in the attached v3. This
behaves differently from DEFAULT for column references.

The v3-2 does the same thing as DEFAULT. The behavior differs
depending on whether the column exists or not.

> ERROR:  cannot use column referencees in bound value
> ERROR:  column "b" does not exist

I personally think that such similarity between DEFAULT and
partition bound (v3-2) is not required. But inserting the hook
(v3) doesn't look good to me.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 763b4f573c..b6f3ddc22f 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -275,10 +275,10 @@ WITH ( MODULUS numeric_literal, REM
  
   Each of the values specified in
   the partition_bound_spec is
-  a literal, NULL, MINVALUE, or
-  MAXVALUE.  Each literal value must be either a
-  numeric constant that is coercible to the corresponding partition key
-  column's type, or a string literal that is valid input for that type.
+  any variable-free expression (subqueries and cross-references to other
+  columns in the current table are not allowed).  The data type of the
+  partition bound expression must match the data type of the corresponding
+  partition key column.
  
 
  
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 505ae0af85..77ebb40ef2 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -150,8 +150,6 @@ static Node *substitute_actual_parameters(Node *expr, int nargs, List *args,
 static Node *substitute_actual_parameters_mutator(Node *node,
 	 substitute_actual_parameters_context *context);
 static void sql_inline_error_callback(void *arg);
-static Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
-			  Oid result_collation);
 static Query *substitute_actual_srf_parameters(Query *expr,
  int nargs, List *args);
 static Node *substitute_actual_srf_parameters_mutator(Node *node,
@@ -4840,7 +4838,7 @@ sql_inline_error_callback(void *arg)
  * We use the executor's routine ExecEvalExpr() to avoid duplication of
  * code and ensure we get the same 

Re: community bonding

2018-04-24 Thread Aleksander Alekseev
Hello Charles,

> Thanks for selecting my project as one of GSoC student projects! Pretty
> exciting and honor to join the development for PostgreSQL (the best
> database in the world :)).

Welcome!

> 1. What IDE or command line tools do you guys use most for PostgreSQL
> development?

Personally I prefer Vim. Sublime Text and Visual Studio Code are fine too.

> 2. What version control do you use for postgres? Is github acceptable?

Github is OK.

> 3. How do you look at code in the PostgreSQL code base? Do you have a cross
> function reference where I can search functions and definitions online?

Vim + Ctags. Editors like Sublime Text and Visual Studio Code have
features like "goto definition" out-of-the-box.

If you have any other questions please don't hesitate to ask!

-- 
Best regards,
Aleksander Alekseev


signature.asc
Description: PGP signature


[RFC] Add an until-0 loop in psql

2018-04-24 Thread Pierre Ducroquet
Hi

When running database migrations with .sql files on a live database, it's not 
uncommon to have to run a migration in a loop to prevent a big lock on a 
table.
For instance, if one wants to delete some old data from a big table, one would 
write:

DELETE FROM big_table WHERE id IN (SELECT id FROM big_table WHERE bad = true 
LIMIT 1000);
VACUUM big_table;

Right now, doing this is quite inefficient. We either have to write a script 
in another language, or run psql in a shell loop and wait for the migration to 
stop altering rows.

The attached **proof of concept** patch (I insist, it's a 15-minute hack 
sprint with no previous knowledge of psql code) implements an 'until-0' loop 
in psql.
The previous migration could be simply written as :

\until-0
BEGIN;
DELETE FROM big_table WHERE id IN (SELECT id FROM big_table WHERE bad = true 
LIMIT 1000);
VACUUM big_table;
COMMIT;
\end-until

And psql will execute it until no rows are affected by the inner queries.
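The until-0 semantics can be sketched driver-side with a toy in-memory "table" standing in for big_table; delete_batch plays the role of the DELETE ... LIMIT statement and its return value plays ROW_COUNT (all names here are ours, not psql's):

```python
def delete_batch(table, limit):
    """Delete up to `limit` rows marked bad; return the number deleted."""
    victims = [row for row in table if row["bad"]][:limit]
    for row in victims:
        table.remove(row)
    return len(victims)

table = [{"id": i, "bad": i % 2 == 0} for i in range(10)]  # 5 bad rows

passes = 0
while True:                       # \until-0 opens the loop
    affected = delete_batch(table, limit=3)
    passes += 1
    if affected == 0:             # \end-until stops once ROW_COUNT is 0
        break

print(passes, len(table))  # → 3 5
```

Each pass touches only a small batch, which is exactly what keeps lock durations short on a live table.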

I am willing to write a proper patch for this (I hope the tell/seek is an 
acceptable implementation…), but I prefer having some feedback first.

Thanks

 Pierrediff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 4c85f43f09..d706e38ffc 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -401,6 +401,19 @@ exec_command(const char *cmd,
 		status = exec_command_shell_escape(scan_state, active_branch);
 	else if (strcmp(cmd, "?") == 0)
 		status = exec_command_slash_command_help(scan_state, active_branch);
+	else if (strcmp(cmd, "until-0") == 0) {
+		status = PSQL_CMD_SKIP_LINE;
+		pset.is_in_until = ftell(pset.cur_cmd_source);
+		pset.has_affected_rows = false;
+	} else if (strcmp(cmd, "end-until") == 0) {
+		status = PSQL_CMD_SKIP_LINE;
+		if (pset.has_affected_rows) {
+			fseek(pset.cur_cmd_source, pset.is_in_until, SEEK_SET);
+			pset.has_affected_rows = false;
+		} else {
+			pset.is_in_until = 0;
+		}
+	}
 	else
 		status = PSQL_CMD_UNKNOWN;
 
diff --git a/src/bin/psql/mainloop.c b/src/bin/psql/mainloop.c
index c06ce3ca09..869dbf6dcd 100644
--- a/src/bin/psql/mainloop.c
+++ b/src/bin/psql/mainloop.c
@@ -430,6 +430,12 @@ MainLoop(FILE *source)
 {
 	success = SendQuery(query_buf->data);
 	slashCmdStatus = success ? PSQL_CMD_SEND : PSQL_CMD_ERROR;
+	if (success && pset.is_in_until && !pset.has_affected_rows)
+	{
+		const char *row_count = GetVariable(pset.vars, "ROW_COUNT");
+		if (row_count != NULL && strcmp(row_count, "0") != 0)
+			pset.has_affected_rows = true;
+	}
 	pset.stmt_lineno = 1;
 
 	/* transfer query to previous_buf by pointer-swapping */
diff --git a/src/bin/psql/settings.h b/src/bin/psql/settings.h
index 69e617e6b5..f3f92fe899 100644
--- a/src/bin/psql/settings.h
+++ b/src/bin/psql/settings.h
@@ -139,6 +139,9 @@ typedef struct _psqlSettings
 	const char *prompt3;
 	PGVerbosity verbosity;		/* current error verbosity level */
 	PGContextVisibility show_context;	/* current context display level */
+	
+	long		is_in_until;
+	int			has_affected_rows;
 } PsqlSettings;
 
 extern PsqlSettings pset;
diff --git a/src/bin/psql/startup.c b/src/bin/psql/startup.c
index be57574cd3..beba8851a3 100644
--- a/src/bin/psql/startup.c
+++ b/src/bin/psql/startup.c
@@ -134,6 +134,8 @@ main(int argc, char *argv[])
 	pset.last_error_result = NULL;
 	pset.cur_cmd_source = stdin;
 	pset.cur_cmd_interactive = false;
+	pset.is_in_until = 0;
+	pset.has_affected_rows = false;
 
 	/* We rely on unmentioned fields of pset.popt to start out 0/false/NULL */
 	pset.popt.topt.format = PRINT_ALIGNED;


Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-24 Thread Michael Paquier
On Sat, Apr 21, 2018 at 12:25:27PM +1200, Thomas Munro wrote:
> Here's a new version, because FreeBSD's new interface changed slightly.

I have been looking at the proposed set for Linux, and the numbers are
here.  By replaying 1GB worth of WAL after a pgbench run with the data
folder on a tmpfs the recovery time goes from 33s to 28s, so that's a
nice gain.

Do you have numbers with FreeBSD?  I get that this would be more
difficult to set up without a GA release perhaps...

I can also see the difference in profiles by looking for
HandleStartupProcInterrupts, which gets close to 10% of the attention
when unpatched, and down to 0.1% when patched.

@@ -2484,6 +2484,8 @@ ClosePostmasterPorts(bool am_syslogger)
 if (bonjour_sdref)
 close(DNSServiceRefSockFD(bonjour_sdref));
 #endif
+
+PostmasterDeathInit();

Thomas, trying to understand here...  Why this place for the signal
initialization?  Wouldn't InitPostmasterChild() be a more logical place
as we'd want to have this logic caught by all other processes?
--
Michael


signature.asc
Description: PGP signature


Re: "could not reattach to shared memory" on buildfarm member dory

2018-04-24 Thread Magnus Hagander
On Tue, Apr 24, 2018 at 1:18 AM, Stephen Frost  wrote:

> Greetings,
>
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > So far, dory has failed three times with essentially identical symptoms:
> >
> > 2018-04-23 19:57:10.624 GMT [2240] FATAL:  could not reattach to shared
> memory (key=0190, addr=018E): error code 487
> > 2018-04-23 15:57:10.657 EDT [8836] ERROR:  lost connection to parallel
> worker
> > 2018-04-23 15:57:10.657 EDT [8836] STATEMENT:  select count(*) from
> tenk1 group by twenty;
> > 2018-04-23 15:57:10.660 EDT [3820] LOG:  background worker "parallel
> worker" (PID 2240) exited with exit code 1
> >
> > Now how can this be?  We've successfully reserved and released the
> address
> > range we want to use, so it *should* be free at the instant we try to
> map.
>
> Yeah, that's definitely interesting.
>
> > I guess the good news is that we're seeing this in a reasonably
> > reproducible fashion, so there's some hope of digging down to find
> > out the actual cause.
>
> I've asked Heath to take a look at the system again and see if there's
> any Windows logs or such that might help us understand what's happening.
> AV was disabled on the box, so don't think it's that, at least.
>

Disabled or uninstalled?

Back when I was combating Windows AV on a daily basis, disabling normally did
not have the same effect. Just disabling the AV didn't actually remove the
parts that caused issues, it just hid them. An actual uninstall is what was
required.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: "could not reattach to shared memory" on buildfarm member dory

2018-04-24 Thread Noah Misch
On Tue, Apr 24, 2018 at 11:37:33AM +1200, Thomas Munro wrote:
> On Tue, Apr 24, 2018 at 11:18 AM, Stephen Frost  wrote:
> > Greetings,
> >
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> So far, dory has failed three times with essentially identical symptoms:
> >>
> >> 2018-04-23 19:57:10.624 GMT [2240] FATAL:  could not reattach to shared 
> >> memory (key=0190, addr=018E): error code 487
> >> 2018-04-23 15:57:10.657 EDT [8836] ERROR:  lost connection to parallel 
> >> worker
> >> 2018-04-23 15:57:10.657 EDT [8836] STATEMENT:  select count(*) from tenk1 
> >> group by twenty;
> >> 2018-04-23 15:57:10.660 EDT [3820] LOG:  background worker "parallel 
> >> worker" (PID 2240) exited with exit code 1
> >>
> >> Now how can this be?  We've successfully reserved and released the address
> >> range we want to use, so it *should* be free at the instant we try to map.
> >
> > Yeah, that's definitely interesting.
> 
> I wondered if another thread with the right timing could map something
> between the VirtualFree() and MapViewOfFileEx() calls, but  we don't
> create the Windows signal handling thread until a bit later.  Could
> there be any any other threads active in the process?
> 
> Maybe try asking what's mapped there with VirtualQueryEx() on failure?

+1.  An implementation of that:
https://www.postgresql.org/message-id/20170403065106.GA2624300%40tornado.leadboat.com



Re: [HACKERS] Clock with Adaptive Replacement

2018-04-24 Thread Юрий Соколов
Tue, 24 Apr 2018, 8:04 Andrey Borodin :

> Hi, Thomas!
>
> > On 24 Apr 2018, at 2:41, Thomas Munro wrote:
> >
> > On Fri, Feb 12, 2016 at 10:02 AM, Konstantin Knizhnik
> >  wrote:
> >> Are there some well-known drawbacks of this approach, or would it be
> >> interesting to adapt this algorithm to PostgreSQL and measure its impact
> >> on performance under different workloads?
> >
> > I'm not currently planning to work in this area and have done no real
> > investigation, so please consider the following to be "water cooler
> > talk".
>
> I intend to make some prototypes in this area, but I still haven't
> allocated time chunks sufficient to make anything useful.
>
> I think that replacement of current CS5 will:
> 1. allow use of big shared buffers
> 2. make DIRECT_IO realistic possibility
> 3. improve BgWriter
> 4. unify different buffering strategies into a single buffer manager (there
> will be no need to place VACUUM into a special buffer ring)
> 5. finally allow aio and more efficient prefetching like [0]
>
> Here's what we have about size of shared buffer now [1] (taken from [2]).
> I believe the right hill must be big enough to reduce the central pit to
> zero and make the function monotonic: the OS page cache knows less about
> data blocks and is expected to be less efficient.
>
>
> I'm not sure CART is the best possibility, though.
> I think that the right way is to implement many prototypes with LRU, ARC,
> CAR, CART, and 2Q. Then, benchmark them well. Or even make this algorithm
> pluggable? But currently we have a lot of dependent parts in the system. I
> do not even know where to start.
>
>
> Best regards, Andrey Borodin.
>
>
> [0]
> http://diku.dk/forskning/Publikationer/tekniske_rapporter/2004/04-03.pdf
> [1]
> https://4.bp.blogspot.com/-_Zz6X-e9_ok/WlaIvpStBmI/AA4/E1NwV-_82-oS5KfmyjoOff_IxUXiO96WwCLcBGAs/s1600/20180110-PTI.png
> [2] http://blog.dataegret.com/2018/01/postgresql-performance-meltdown.html


Before implementing algorithms within PostgreSQL, it would be great to test
them outside with real traces.

I think there should be a mechanism to collect traces from real-world
PostgreSQL installations: every read and write access. It should be
extremely efficient, so it can be enabled in the real world. Something like
a circular buffer in shared memory, with a separate worker to dump it to
disk. Instead of the full block address, a 64-bit hash could be used, or
even 63 bits plus 1 bit to designate read/write access.
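The 63-bit-hash-plus-access-bit idea could be encoded like this (a sketch with our own field layout, a deliberately tiny ring, and an invented block-key format):

```python
import hashlib

RING_SIZE = 8  # tiny for the example; a real buffer would be far larger

def encode(block_key, is_write):
    """Pack a 63-bit hash of the block address with 1 read/write bit."""
    h = int.from_bytes(hashlib.blake2b(block_key, digest_size=8).digest(), "big")
    return ((h >> 1) << 1) | int(is_write)  # clear LSB, then store the flag

class TraceRing:
    """Circular access-trace buffer; the oldest entries are overwritten,
    as they would be if the dump worker fell behind."""
    def __init__(self, size=RING_SIZE):
        self.slots = [0] * size
        self.pos = 0
        self.wrapped = False

    def record(self, block_key, is_write):
        self.slots[self.pos] = encode(block_key, is_write)
        self.pos = (self.pos + 1) % len(self.slots)
        self.wrapped = self.wrapped or self.pos == 0

ring = TraceRing()
for i in range(10):  # more accesses than slots: the buffer wraps
    ring.record(b"rel=16384 blk=%d" % i, is_write=(i % 3 == 0))

print(ring.wrapped)  # → True
```

Hash collisions lose a little fidelity, but for offline replacement-policy evaluation that is usually an acceptable trade for a fixed 8-byte record.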

Using these traces, it will be easy to choose a couple of "theoretically"
best algorithms, and then attempt to implement them.
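Once such traces exist, comparing policies offline is cheap. A minimal hit-rate harness, with plain LRU standing in for the ARC/CAR/CART/2Q candidates and a synthetic skewed trace in place of a real one:

```python
import random
from collections import OrderedDict

def lru_hit_rate(trace, capacity):
    """Replay a trace of block ids against an LRU cache of `capacity`
    and return the fraction of accesses that hit."""
    cache = OrderedDict()
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used
            cache[block] = True
    return hits / len(trace)

random.seed(42)
# Synthetic skewed trace: an 80%-hot set of 10 blocks plus a cold tail.
trace = [random.randrange(10) if random.random() < 0.8 else random.randrange(1000)
         for _ in range(20000)]

print(f"{lru_hit_rate(trace, capacity=50):.3f}")
```

Swapping in another policy only means replacing the body of the loop, so all candidates can be benchmarked on identical input before anything touches the buffer manager.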

With regards,
Yura