Re: How is the PostgreSQL debuginfo file generated

2017-11-23 Thread 高增琦
A little trick:

1. Configure with --enable-debug and run “make install”; use this build result
as the debuginfo.

2. Then run “make install-strip”; use this result as the release.

However, it is not the regular debuginfo; you can still call gdb with it.
Additionally, you can extract the real debuginfo from it later.

Craig Ringer wrote on Friday, 24 November 2017 at 10:04:

> On 23 November 2017 at 18:38, Rui Hai Jiang  wrote:
>
>> Hello hackers,
>>
>>
>>
>> I’m wondering how to build the debuginfo package from  the PostgreSQL
>> source.
>>
>>
>>
>> For example to generate something like this one :
>> postgresql91-debuginfo.x86_64.
>>
>>
>>
>> Is there existing support in the current PostgreSQL Makefiles to generate
>> such debuginfo? I have searched in the source code and couldn't find anything.
>>   How were the existing debuginfo packages created?
>>
>>
>>
> When you're building from source, no separate debuginfo is produced. If
> you build with --enable-debug, the debuginfo is embedded in the binaries.
> The resulting binaries are very large but not generally any slower.
>
> Packaging systems have helper programs that invoke 'strip' (or sometimes
> objcopy) to split out the debuginfo into a separate file and remove it from
> the original executable.  See
> https://sourceware.org/gdb/onlinedocs/gdb/Separate-Debug-Files.html  and
> https://fedoraproject.org/wiki/Packaging:Debuginfo?rd=Packaging/Debuginfo
> .
>
> You cannot generally re-create debuginfo to match a given binary package.
> You'd have to rebuild in an absolutely identical environment: exact same
> library versions, compiler version, configure flags, etc etc etc. Otherwise
> you'll get broken/misleading debuginfo.
>
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>
-- 
GaoZengqi
pgf...@gmail.com
zengqi...@gmail.com


Re: [HACKERS] logical decoding of two-phase transactions

2017-11-23 Thread Craig Ringer
On 24 November 2017 at 13:44, Nikhil Sontakke 
wrote:

>
> > This could create an inconsistent view of the catalogs if our prepared
> txn
> > did any DDL. For example, we might've updated a pg_class row, so we
> created
> > a new row and set xmax on the old one. Vacuum may merrily remove our new
> row
> > so there's no way we can find the correct data anymore, we'd have to find
> > the outdated row or no row. By my reading of HeapTupleSatisfiesMVCC we'll
> > see the old pg_class tuple.
> >
> > Similar issues apply for pg_attribute etc etc. We might try to decode a
> > record according to the wrong table structure because relcache lookups
> > performed by the plugin will report outdated information.
> >
>
> We actually do the decoding in a PG_TRY/CATCH block, so if there are
> any errors we
> can clean those up in the CATCH block. If it's a prepared transaction
> then we can send
> an ABORT to the remote side to clean itself up.
>

Yeah. I suspect it might not always ERROR gracefully though.


> > How practical is adding a lock class?
>
> Am open to suggestions. This looks like it could work decently.


It looks amazingly simple from here. Which probably means there's more to
it that I haven't seen yet. I could use advice from someone who knows the
locking subsystem better.


> Yes, that makes sense in case of ROLLBACK. If we find a missing GID
> for a COMMIT PREPARE we are in for some trouble.
>

I agree. But it's really down to the apply worker / plugin to set policy
there, I think. It's not the 2PC decoding support's problem.

I'd argue that a plugin that wishes to strictly track and match 2PC aborts
with the subsequent ROLLBACK PREPARED could do so by recording the abort
locally. It need not rely on faked-up 2pc xacts from the output plugin.
Though it might choose to create them on the downstream as its method of
tracking aborts.

In other words, we don't need the logical decoding infrastructure's help
here. It doesn't have to fake up 2PC xacts for us. Output plugins/apply
workers that want to can do it themselves, and those that don't can ignore
rollback prepared for non-matched GIDs instead.

> We'd need a race-free way to do that though, so I think we'd have to
> extend
> > FinishPreparedTransaction and LockGXact with some kind of missing_ok
> flag. I
> > doubt that'd be controversial.
> >
>
> Sure.


I reckon that should be in-scope for this patch, and pretty clearly useful.
Also simple.
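
For concreteness, a sketch of what that could look like (hedged: today
FinishPreparedTransaction(gid, isCommit) and the static LockGXact(gid, user)
take no such flag, so the extended signatures below are assumptions, not the
patch):

void
FinishPreparedTransaction(const char *gid, bool isCommit, bool missing_ok)
{
	GlobalTransaction gxact;

	/*
	 * LockGXact would return NULL, rather than raising an error, when the
	 * GID is not found and missing_ok is true.
	 */
	gxact = LockGXact(gid, GetUserId(), missing_ok);
	if (gxact == NULL)
	{
		Assert(missing_ok);
		return;					/* GID already gone; nothing to do */
	}

	/* ... proceed exactly as the existing code does ... */
}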


>
> > - It's really important that the hook that decides whether to decode an
> xact
> > at prepare or commit prepared time reports the same answer each and every
> > time, including if it's called after a prepared txn has committed. It
> > probably can't look at anything more than the xact's origin replica
> > identity, xid, and gid. This also means we need to know the gid of
> prepared
> > txns when processing their commit record, so we can tell logical decoding
> > whether we already sent the data to the client at prepare-transaction
> time,
> > or if we should send it at commit-prepared time instead.
> >
>
> We already have a filter_prepare_cb hook in place for this. TBH, I
> don't think this patch needs to worry about the internals of that
> hook. Whatever it returns, if it returns the same value every time then
> we should be good from the logical decoding perspective.
>

I agree. I meant that it should try to pass only info that's accessible at
both PREPARE TRANSACTION and COMMIT PREPARED time, and we should document
the importance of returning a consistent result. In particular, it's always
wrong to examine the current twophase state when deciding what to return.


> I think, if we encode the logic in the GID itself, then it will be
> good and consistent every time. For example, if the hook sees a GID
> with the prefix '_$Logical_', then it knows it has to PREPARE it.
> Others can be decoded at commit time.
>

Yep. We can also safely tell the hook:

* the xid
* whether the xact has made catalog changes (since we know that at prepare
and commit time)

but probably not much else.
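
To make the GID-prefix idea concrete, here is a hedged sketch of the decision
helper an output plugin could use from its filter_prepare hook (the helper name
and the exact hook signature are assumptions, since the hook is still being
shaped by this patch; only the '_$Logical_' prefix convention is taken from the
suggestion above):

/*
 * Decide purely from the GID whether this prepared transaction should be
 * decoded at PREPARE time.  Because the answer depends only on the GID,
 * the hook gives the same result when asked again at COMMIT PREPARED or
 * ROLLBACK PREPARED time.
 */
static bool
decode_at_prepare_time(const char *gid)
{
	return strncmp(gid, "_$Logical_", strlen("_$Logical_")) == 0;
}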


> > - You need to flush the syscache when you finish decoding a PREPARE
> > TRANSACTION of an xact that made catalog changes, unless it's immediately
> > followed by COMMIT PREPARED of the same xid. Because xacts between the
> two
> > cannot see changes the prepared xact made to the catalogs.
> >
> > - For the same reason we need to ensure that the historic snapshot used
> to
> > decode a 2PC xact that made catalog changes isn't then used for
> subsequent
> > xacts between the prepare and commit. They'd see the uncommitted
> catalogs of
> > the prepared xact.
> >
>
> Yes, we will do TeardownHistoricSnapshot and syscache flush as part of
> the cleanup for 2PC transactions.
>

Great.

Thanks.



-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Failed to delete old ReorderBuffer spilled files

2017-11-23 Thread Craig Ringer
On 24 November 2017 at 12:38, atorikoshi  wrote:

>
>
> On 2017/11/24 10:57, Craig Ringer wrote:
> > On 24 November 2017 at 09:20, atorikoshi  wrote:
> >
> >>
> >> Thanks for letting me know.
> >> I think I only tested running "make check" at top directory, sorry
> >> for my insufficient test.
> >>
> >> The test failure happened at the beginning of replication(creating
> >> slot), so there are no changes yet and getting the tail of changes
> >> failed.
> >>
> >> Because the bug we are fixing only happens after creating files,
> >> I've added "txn->serialized" to the if statement condition.
> >
> >
> > Thanks.
> >
> > Re-reading the patch I see
> >
> >  * The final_lsn of which transaction that hasn't produced an abort
> >  * record is invalid.
> >
> > which I find very hard to parse. I suggest:
> >
> >  We set final_lsn when we decode the commit or abort record for a
> > transaction,
> >  but transactions implicitly aborted by a crash never write such a
> record.
> >
> > then continue from there with the same text as in the patch.
> >
> > Otherwise I'm happy. It passes make check, test decoding and the recovery
> > TAP tests too.
> >
>
> Thanks for your kind suggestion and running test.
> I've added your comment at the beginning.
>
>
Marked ready for committer.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] UPDATE of partition key

2017-11-23 Thread Amit Langote
On 2017/11/23 21:57, Amit Khandekar wrote:
> If we collect the partition keys in expand_partitioned_rtentry(), we
> need to pass the root relation also, so that we can convert the
> partition key attributes to root rel descriptor. And the other thing
> is, may be, we can check beforehand (in expand_inherited_rtentry)
> whether the rootrte's updatedCols is empty, which I think implies that
> it's not an UPDATE operation. If yes, we can just skip collecting the
> partition keys.

Yeah, it seems like a good idea after all to check in
expand_inherited_rtentry() whether the root RTE's updatedCols is non-empty
and if so check if any of the updatedCols are partition keys.  If we find
some, then it will suffice to just set a simple flag in the
PartitionedChildRelInfo that will be created for the root table.  That
should be done *after* we have visited all the tables in the partition
tree including some that might be partitioned and hence will provide their
partition keys.  The following block in expand_inherited_rtentry() looks
like a good spot:

    if (rte->inh && partitioned_child_rels != NIL)
    {
        PartitionedChildRelInfo *pcinfo;

        pcinfo = makeNode(PartitionedChildRelInfo);
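
For illustration, setting such a flag there might look roughly like the
following (a sketch only: the part_cols_updated field and the all_part_cols set
do not exist today, their names are assumptions, and all_part_cols is presumed
to use the same attribute-number offset as updatedCols):

	if (rte->inh && partitioned_child_rels != NIL)
	{
		PartitionedChildRelInfo *pcinfo;

		pcinfo = makeNode(PartitionedChildRelInfo);
		pcinfo->parent_relid = rti;
		pcinfo->child_rels = partitioned_child_rels;
		/* true if any column updated by the query is used in a partition key */
		pcinfo->part_cols_updated = bms_overlap(rte->updatedCols, all_part_cols);

		root->pcinfo_list = lappend(root->pcinfo_list, pcinfo);
	}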

Thanks,
Amit




Re: [HACKERS] Partition-wise aggregation/grouping

2017-11-23 Thread Ashutosh Bapat
On Thu, Nov 23, 2017 at 6:38 PM, Jeevan Chalke
 wrote:
>
>
> Agree. However, there was an existing comment in create_append_path() saying
> "We don't bother with inventing a cost_append(), but just do it here", which
> implies that at some time in the future we may need it; so why not now, where
> we are explicitly costing for an append node. Having a function is good so
> that, if required in the future, we need to update only this function.
> Let me know if you think otherwise; I will make those changes in the next patchset.
>

I don't read that comment as something we will do in the future. I
don't think the amount of change that this patch introduces just for
adding one more line of code is justified. There's anyway only one
place where we cost an append, so it's not that the new function
avoids code duplication. Although I am happy to defer this to the
committer, if you think that we need a separate function.

>
> As suggested by Robert, I have renamed it to APPEND_CPU_COST_MULTIPLIER in
> v7 patchset.
> Also, retained the #define for just multiplier as suggested by Robert.

Ok.


>
>>
>>
>> Anyway, it doesn't look like a good idea to pass an argument (gd) only to
>> return from that function in case of its presence. May be we should handle
>> it
>> outside this function.
>
>
> Well, I would like to have it inside the function itself. Let the function
> itself do all the necessary checking rather than doing some of them outside.

We will leave this to the committer. I don't like that style, but it's
also good to expect a function to do all related work.


>>
>> +if (!create_child_grouping_paths(root, input_child_rel,
>> agg_costs, gd,
>> + ))
>> +{
>> +/* Could not create path for childrel, return */
>> +pfree(appinfos);
>> +return;
>> +}
>>
>> Can we detect this condition and bail out even before planning any of the
>> children? It looks wasteful to try to plan children only to bail out in
>> this
>> case.
>
>
> I don't think so. It is effectively unreachable and added just for safety in
> case we are unable to create a child path. The bail-out conditions cannot be
> evaluated at the beginning. Do you think an Assert() will be good here? Am I
> missing something?

An Assert would help. If it's something that should not happen, we
should try catching that rather that silently ignoring it.

>
>>
>> +/* Nothing to do if we have no live children */
>> +if (live_children == NIL)
>> +return;
>>
>> A parent relation with all dummy children will also be dummy. May be we
>> should
>> mark the parent dummy case using mark_dummy_rel() similar to
>> generate_partition_wise_join_paths().
>
>
> If the parent is dummy, then we are not doing PWA at all. So no need to mark
> the parent grouped_rel as dummy, I guess.
> However, if some of the children are dummy, I am marking the corresponding upper
> rel as dummy too.
> Actually, this condition is never going to be true since, as you correctly said,
> "A parent relation with all dummy children will also be dummy". Should
> we have an Assert() instead?

Yes.


>
>
> I have a test case for a multi-level partitioned table.
> However, I did not understand what you mean by "children with different
> order of partition key columns". I had a look at the tests in
> partition_join.sql and it seems that I have covered all those scenarios.
> Please have a look at the test cases added for PWA and let me know which
> scenarios are missing; I will add them then.

By children with different order of partition key columns, I meant
something like this

parent(a int, b int, c int) partition by (a), child1(b int, c int, a
int) partition by b, child1_1 (c int, a int, b int);

where the attribute numbers of the partition keys in different
children are different.

>> This looks like a useful piece of general functionality,
>> list_has_intersection(), which would return a boolean instead of the whole
>> intersection. I am not sure whether we should add that function to list.c
>> and use it here.
>
>
> Sounds good.
> But for now, I am keeping it as part of this feature itself.

ok
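
For reference, a minimal sketch of such a helper against the pg_list API (the
function does not exist in list.c today; the name and the pointer-based variant
are assumptions, and an integer-list caller would want a list_member_int-based
variant instead):

/*
 * Return true if list1 and list2 have at least one member in common.
 * Mirrors list_intersection(), but stops at the first match instead of
 * building the whole intersection list.
 */
static bool
list_has_intersection(List *list1, List *list2)
{
	ListCell   *cell;

	foreach(cell, list1)
	{
		if (list_member(list2, lfirst(cell)))
			return true;
	}

	return false;
}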

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] HASH_CHUNK_SIZE vs malloc rounding

2017-11-23 Thread Thomas Munro
On Fri, Nov 24, 2017 at 4:54 PM, Tom Lane  wrote:
>> It doesn't seem that crazy to expose aset.c's overhead size to client
>> code does it?  Most client code wouldn't care, but things that are
>> doing something closer to memory allocator work themselves like
>> dense_alloc could care.  It could deal with its own overhead itself,
>> and subtract aset.c's overhead using a macro.
>
> Seeing that we now have several allocators with different overheads,
> I think that exposing this directly to clients is exactly what not to do.
> I still like the idea I sketched above of a context-type-specific function
> to adjust a request size to something that's efficient.
>
> But there's still the question of how do we know what's an efficient-sized
> malloc request.  Is there good reason to suppose that powers of 2 are OK?

It looks like current glibc versions don't do that sort of thing until
you reach M_MMAP_THRESHOLD (default 128kB), after which it starts working in
4kB pages.  Before that, its manual says that it doesn't do any
rounding beyond alignment[1].  I haven't come across any other
allocator that is so forgiving, but I haven't checked Illumos, AIX or
Windows.  Only the first of those has source code available, but they
confuse you by shipping a bunch of different allocators in their tree.
I didn't check the other BSDs.  tcmalloc[2], jemalloc, macOS malloc
round up to (respectively) 4k page, 8kb size class, 512 byte size
class when you allocate 32kb + 1 byte.

[1] https://www.gnu.org/software/libc/manual/html_node/The-GNU-Allocator.html
[2] http://goog-perftools.sourceforge.net/doc/tcmalloc.html

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Failed to delete old ReorderBuffer spilled files

2017-11-23 Thread atorikoshi



On 2017/11/24 10:57, Craig Ringer wrote:
> On 24 November 2017 at 09:20, atorikoshi 
> wrote:
>
>>
>> Thanks for letting me know.
>> I think I only tested running "make check" at top directory, sorry
>> for my insufficient test.
>>
>> The test failure happened at the beginning of replication(creating
>> slot), so there are no changes yet and getting the tail of changes
>> failed.
>>
>> Because the bug we are fixing only happens after creating files,
>> I've added "txn->serialized" to the if statement condition.
>
>
> Thanks.
>
> Re-reading the patch I see
>
>  * The final_lsn of which transaction that hasn't produced an abort
>  * record is invalid.
>
> which I find very hard to parse. I suggest:
>
>  We set final_lsn when we decode the commit or abort record for a
> transaction,
>  but transactions implicitly aborted by a crash never write such a record.

>
> then continue from there with the same text as in the patch.
>
> Otherwise I'm happy. It passes make check, test decoding and the recovery
> TAP tests too.
>

Thanks for your kind suggestion and running test.
I've added your comment at the beginning.

--
Atsushi Torikoshi
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 0f607ba..ee28d26 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -1724,6 +1724,22 @@ ReorderBufferAbortOld(ReorderBuffer *rb, TransactionId oldestRunningXid)
 
 		if (TransactionIdPrecedes(txn->xid, oldestRunningXid))
 		{
+			/*
+			 * We set final_lsn when we decode the commit or abort record for
+			 * a transaction, but transactions implicitly aborted by a crash
+			 * never write such a record.
+			 * The final_lsn of which transaction that hasn't produced an abort
+			 * record is invalid. To ensure cleanup the serialized changes of
+			 * such transaction we set the LSN of the last change action to
+			 * final_lsn.
+			 */
+			if (txn->serialized && txn->final_lsn == 0)
+			{
+				ReorderBufferChange *last_change =
+					dlist_tail_element(ReorderBufferChange, node, &txn->changes);
+				txn->final_lsn = last_change->lsn;
+			}
+
 			elog(DEBUG2, "aborting old transaction %u", txn->xid);
 
 			/* remove potential on-disk data, and deallocate this tx */
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 86effe1..7931757 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -168,6 +168,8 @@ typedef struct ReorderBufferTXN
 	 * * plain abort record
 	 * * prepared transaction abort
 	 * * error during decoding
+	 * Note that this can also be a LSN of the last change action of this xact
+	 * if it is an implicitly aborted transaction.
 	 * ----
 	 */
 	XLogRecPtr	final_lsn;


Re: [HACKERS] SERIALIZABLE with parallel query

2017-11-23 Thread Haribabu Kommi
On Tue, Sep 26, 2017 at 4:41 PM, Haribabu Kommi 
wrote:

>
>
> On Mon, Sep 25, 2017 at 6:57 PM, Thomas Munro <
> thomas.mu...@enterprisedb.com> wrote:
>
>> On Mon, Sep 25, 2017 at 8:37 PM, Haribabu Kommi
>>  wrote:
>> > After I tune the GUC to go with sequence scan, still I am not getting
>> the
>> > error
>> > in the session-2 for update operation like it used to generate an error
>> for
>> > parallel
>> > sequential scan, and also it even takes some many commands until unless
>> the
>> > S1
>> > commits.
>>
>> Hmm.  Then this requires more explanation because I don't expect a
>> difference.  I did some digging and realised that the error detail
>> message "Reason code: Canceled on identification as a pivot, during
>> write." was reached in a code path that requires
>> SxactIsPrepared(writer) and also MySerializableXact == writer, which
>> means that the process believes it is committing.  Clearly something
>> is wrong.  After some more digging I realised that
>> ParallelWorkerMain() calls EndParallelWorkerTransaction() which calls
>> CommitTransaction() which calls
>> PreCommit_CheckForSerializationFailure().  Since the worker is
>> connected to the leader's SERIALIZABLEXACT, that finishes up being
>> marked as preparing to commit (not true!), and then the leader get
>> confused during that write, causing a serialization failure to be
>> raised sooner (though I can't explain why it should be raised then
>> anyway, but that's another topic).  Oops.  I think the fix here is
>> just not to do that in a worker (the worker's CommitTransaction()
>> doesn't really mean what it says).
>>
>> Here's a version with a change that makes that conditional.  This way
>> your test case behaves the same as non-parallel mode.
>>
>
> The patch looks good, and I don't have any comments for the code.
> The test that is going to add by the patch is not generating a true
> parallelism scenario, I feel it is better to change the test that can
> generate a parallel sequence/index/bitmap scan.
>


The latest patch is good. It lacks a test that verifies the serializable
support with actual parallel workers, so if it is broken, it is difficult
to know.
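
For reference, the conditional Thomas describes upthread boils down to
something like this (a sketch of the idea only, not the literal patch text):

	/*
	 * A parallel worker shares the leader's SERIALIZABLEXACT and is not
	 * really committing, so it must not run the pre-commit check that
	 * marks the SERIALIZABLEXACT as preparing to commit.
	 */
	if (!IsParallelWorker())
		PreCommit_CheckForSerializationFailure();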

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] logical decoding of two-phase transactions

2017-11-23 Thread Craig Ringer
On 23 November 2017 at 20:27, Nikhil Sontakke 
wrote:


>
> Is there any other locking scenario that we need to consider?
> Otherwise, are we all ok on this point being a non-issue for 2PC
> logical decoding?
>

Yeah.

I didn't find any sort of sensible situation where locking would pose
issues. Unless you're taking explicit LOCKs on catalog tables, you should
be fine.

There may be issues with CLUSTER or VACUUM FULL of non-relmapped catalog
relations I guess. Personally I put that in the "don't do that" box, but if
we really have to guard against it, we could slightly expand the limits on
which txns you can PREPARE to cover any txn that holds a strong lock on a
catalog relation.


> The issue is with a concurrent rollback of the prepared transaction.
> We need a way to ensure that
> the 2PC does not abort when we are in the midst of a change record
> apply activity.
>

The *reason* we care about this is that tuples created by aborted txns are
not considered "recently dead" by vacuum. They can be marked invalid and
removed immediately due to hint-bit setting and HOT pruning, vacuum runs,
etc.

This could create an inconsistent view of the catalogs if our prepared txn
did any DDL. For example, we might've updated a pg_class row, so we created
a new row and set xmax on the old one. Vacuum may merrily remove our new
row so there's no way we can find the correct data anymore, we'd have to
find the outdated row or no row. By my reading of HeapTupleSatisfiesMVCC
we'll see the old pg_class tuple.

Similar issues apply for pg_attribute etc etc. We might try to decode a
record according to the wrong table structure because relcache lookups
performed by the plugin will report outdated information.

The sanest option here seems to be to stop the txn from aborting while
we're decoding it, hence Nikhil's suggestions.


> * We introduce two new booleans in the TwoPhaseState
> GlobalTransactionData structure.
>   bool beingdecoded;
>   bool abortpending;
>

I think it's premature to rule out the simple option of doing a LockGXact
when we start decoding. Improve the error "prepared transaction with
identifier \"%s\" is busy" to report the locking pid too. It means you
cannot rollback or commit a prepared xact while it's being decoded, but for
the intended use of this patch, I think that's absolutely fine anyway.

But I like your suggestion much more than the idea of taking a LWLock on
TwoPhaseStateLock while decoding a record. Lots of LWLock churn, and
LWLocks held over arbitrary user plugin code. Not viable.

With your way we just have to take a LWLock once on TwoPhaseState when we
test abortpending and set beingdecoded. After that, during decoding, we can
do unlocked tests of abortpending, since a stale read will do nothing worse
than delay our response a little. The existing 2PC ops already take the
LWLock and can examine beingdecoded then. I expect they'd need to WaitLatch
in a loop until beingdecoded was cleared, re-acquiring the LWLock and
re-checking each time it's woken. We should probably add a field there for
a waiter proc that wants its latch set, so 2pc ops don't usually have to
poll for decoding to finish. (Unless condition variables will help us here?)

However, let me make an entirely alternative suggestion. Should we add a
heavyweight lock class for 2PC xacts instead, and leverage the existing
infrastructure? We already use transaction locks widely after all. That
way, we just take some kind of share lock on the 2PC xact by xid when we
start logical decoding of the 2pc xact. ROLLBACK PREPARED and COMMIT
PREPARED would acquire the same heavyweight lock in an exclusive mode
before grabbing TwoPhaseStateLock and doing their work.

That way we get automatic cleanup when decoding procs exit, we get wakeups
for waiters, etc, all for "free".

How practical is adding a lock class?

(Frankly I've often wished I could add new heavyweight lock classes when
working on complex extensions like BDR, too, and in an ideal world we'd be
able to register lock classes for use by extensions...)
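
To sketch what leveraging lmgr here could look like (the SET_LOCKTAG_TWOPHASE
macro is hypothetical and the lock modes are only plausible choices;
LockAcquire/LockRelease are the existing lock-manager entry points):

	LOCKTAG		tag;

	SET_LOCKTAG_TWOPHASE(tag, MyDatabaseId, xid);	/* hypothetical tag */

	/* decoding holds a share lock for as long as it decodes the 2PC xact */
	(void) LockAcquire(&tag, ShareLock, false, false);
	/* ... decode and send the prepared transaction ... */
	LockRelease(&tag, ShareLock, false);

	/*
	 * COMMIT PREPARED / ROLLBACK PREPARED would take the same tag in
	 * AccessExclusiveLock mode before grabbing TwoPhaseStateLock, so they
	 * simply block until decoding is done and get woken automatically.
	 */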


> 3) Before starting decode of the next change record, we re-check if
> "abortpending" is set. If "abortpending"
> is set, we do not decode the next change record. Thus the abort is
> delay-bounded to a maximum of one change record decoding/apply cycle
> after we signal our intent to abort it. Then, we need to send ABORT
> (regular, not rollback prepared, since we have not sent "PREPARE" yet.
>

Just to be explicit, this means "tell the downstream the xact has aborted".
Currently logical decoding does not ever start decoding an xact until it's
committed, so it has never needed an abort callback on the output plugin
interface.

But we'll need one when we start doing speculative logical decoding of big
txns before they commit, and we'll need it for this. It's relatively
trivial.


> This abort_prepared callback will abort the dummy PREPARED query from
> step (3) above. Instead of doing this, we could actually check if the

Re: [HACKERS] INSERT ON CONFLICT and partitioned tables

2017-11-23 Thread Amit Langote
Thank you Simon for the review.

On 2017/11/20 7:33, Simon Riggs wrote:
> On 2 August 2017 at 00:56, Amit Langote  wrote:
> 
>> The patch's job is simple:
> 
> Patch no longer applies and has some strange behaviours.
> 
>> - Remove the check in the parser that causes an error the moment the
>>   ON CONFLICT clause is found.
> 
> Where is the check and test that blocks ON CONFLICT UPDATE?

That check is in transformInsertStmt() and following is the diff from the
patch that removes it:

diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 757a4a8fd1..d680d2285c 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -847,16 +847,8 @@ transformInsertStmt(ParseState *pstate, InsertStmt *stmt)
 
 	/* Process ON CONFLICT, if any. */
 	if (stmt->onConflictClause)
-	{
-		/* Bail out if target relation is partitioned table */
-		if (pstate->p_target_rangetblentry->relkind == RELKIND_PARTITIONED_TABLE)
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("ON CONFLICT clause is not supported with partitioned tables")));
-

ON CONFLICT UPDATE will result in an error because it requires specifying
a conflict_target.  Specifying conflict_target will always result in an
error as things stand today, because planner looks for a constraint/index
covering the same and partitioned tables cannot have one.

> My understanding was the patch was intended to only remove the error
> in case of DO NOTHING.

To be precise, the patch is intended to remove the error for the cases
which don't require specifying conflict_target.  In INSERT's
documentation, conflict_target is described as follows:

"Specifies which conflicts ON CONFLICT takes the alternative action on by
choosing arbiter indexes. Either performs unique index inference, or names
a constraint explicitly. For ON CONFLICT DO NOTHING, it is optional to
specify a conflict_target; when omitted, conflicts with all usable
constraints (and unique indexes) are handled. For ON CONFLICT DO UPDATE, a
conflict_target must be provided."

Note the "For ON CONFLICT DO NOTHING, it is optional to specify a
conflict_target".  That's the case that this patch intends to prevent
error for, that is, the error won't occur if no conflict_target is specified.

>> - Fix leaf partition ResultRelInfo initialization code so that the call
>>   ExecOpenIndices() specifies 'true' for speculative, so that the
>>   information necessary for conflict checking will be initialized in the
>>   leaf partition's ResultRelInfo
> 
> Why? There is no caller that needs information.

It is to be used if and when ExecInsert() calls
ExecCheckIndexConstraints() in the code path to handle ON CONFLICT DO
NOTHING that we're intending to support in some cases.  Note that it will
only check conflicts for the individual leaf partitions using whatever
constraint-enforcing indexes they might have.

Example:

create table foo (a int) partition by list (a);
create table foo123 partition of foo (a unique) for values in (1, 2, 3);
\d foo123
   Table "public.foo123"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 a      | integer |           |          |
Partition of: foo FOR VALUES IN (1, 2, 3)
Indexes:
"foo123_a_key" UNIQUE CONSTRAINT, btree (a)


Without the patch, parser itself will throw an error:

insert into foo values (1)
on conflict do nothing returning tableoid::regclass, *;
ERROR:  ON CONFLICT clause is not supported with partitioned tables


After applying the patch:

insert into foo values (1)
on conflict do nothing returning tableoid::regclass, *;
 tableoid | a
----------+---
 foo123   | 1
(1 row)

INSERT 0 1

insert into foo values (1) on conflict do nothing returning
tableoid::regclass, *;
 tableoid | a
----------+---
(0 rows)

INSERT 0 0

That worked because no explicit conflict target was specified.  If it is
specified, planner (plancat.c: infer_arbiter_indexes()) will throw an
error, because it cannot find a constraint on foo (which is a partitioned
table).

insert into foo values (1)
on conflict (a) do nothing returning tableoid::regclass, *;
ERROR:  there is no unique or exclusion constraint matching the ON
CONFLICT specification

We'll hopefully able to support specifying conflict_target after the
Alvaro's work at [1] is finished.

Meanwhile, rebased patch is attached.

Thanks,
Amit

[1] https://commitfest.postgresql.org/15/1365/




Re: How is the PostgreSQL debuginfo file generated

2017-11-23 Thread Craig Ringer
On 23 November 2017 at 18:38, Rui Hai Jiang  wrote:

> Hello hackers,
>
>
>
> I’m wondering how to build the debuginfo package from  the PostgreSQL
> source.
>
>
>
> For example to generate something like this one :
> postgresql91-debuginfo.x86_64.
>
>
>
> Is there existing support in the current PostgreSQL Makefiles to generate
> such debuginfo? I have searched in the source code and couldn't find anything.
>   How were the existing debuginfo packages created?
>
>
>
When you're building from source, no separate debuginfo is produced. If you
build with --enable-debug, the debuginfo is embedded in the binaries. The
resulting binaries are very large but not generally any slower.

Packaging systems have helper programs that invoke 'strip' (or sometimes
objcopy) to split out the debuginfo into a separate file and remove it from
the original executable.  See
https://sourceware.org/gdb/onlinedocs/gdb/Separate-Debug-Files.html  and
https://fedoraproject.org/wiki/Packaging:Debuginfo?rd=Packaging/Debuginfo .

You cannot generally re-create debuginfo to match a given binary package.
You'd have to rebuild in an absolutely identical environment: exact same
library versions, compiler version, configure flags, etc etc etc. Otherwise
you'll get broken/misleading debuginfo.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Failed to delete old ReorderBuffer spilled files

2017-11-23 Thread Craig Ringer
On 24 November 2017 at 09:20, atorikoshi  wrote:

>
> Thanks for letting me know.
> I think I only tested running "make check" at top directory, sorry
> for my insufficient test.
>
> The test failure happened at the beginning of replication(creating
> slot), so there are no changes yet and getting the tail of changes
> failed.
>
> Because the bug we are fixing only happens after creating files,
> I've added "txn->serialized" to the if statement condition.


Thanks.

Re-reading the patch I see

 * The final_lsn of which transaction that hasn't produced an abort
 * record is invalid.

which I find very hard to parse. I suggest:

 We set final_lsn when we decode the commit or abort record for a
transaction,
 but transactions implicitly aborted by a crash never write such a record.

then continue from there with the same text as in the patch.

Otherwise I'm happy. It passes make check, test decoding and the recovery
TAP tests too.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] HASH_CHUNK_SIZE vs malloc rounding

2017-11-23 Thread Thomas Munro
On Tue, Nov 29, 2016 at 6:27 AM, Tom Lane  wrote:
> Thomas Munro  writes:
>> I bet other allocators also do badly with "32KB plus a smidgen".  To
>> minimise overhead we'd probably need to try to arrange for exactly
>> 32KB (or some other power of 2 or at least factor of common page/chunk
>> size?) to arrive into malloc, which means accounting for both
>> nodeHash.c's header and aset.c's headers in nodeHash.c, which seems a
>> bit horrible.  It may not be worth doing anything about.
>
> Yeah, the other problem is that without a lot more knowledge of the
> specific allocator, we shouldn't really assume that it's good or bad with
> an exact-power-of-2 request --- it might well have its own overhead.
> It is an issue though, and not only in nodeHash.c.  I'm pretty sure that
> StringInfo also makes exact-power-of-2 requests for no essential reason,
> and there are probably many other places.
>
> We could imagine providing an mmgr API function along the lines of "adjust
> this request size to the nearest thing that can be allocated efficiently".
> That would avoid the need for callers to know about aset.c overhead
> explicitly.  I'm not sure how it could deal with platform-specific malloc
> vagaries though :-(

Someone pointed out to me off-list that jemalloc's next size class
after 32KB is in fact 40KB by default[1].  So PostgreSQL uses 25% more
memory for hash joins than it thinks it does on some platforms.  Ouch.

It doesn't seem that crazy to expose aset.c's overhead size to client
code does it?  Most client code wouldn't care, but things that are
doing something closer to memory allocator work themselves like
dense_alloc could care.  It could deal with its own overhead itself,
and subtract aset.c's overhead using a macro.

[1] https://www.freebsd.org/cgi/man.cgi?jemalloc(3)

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Failed to delete old ReorderBuffer spilled files

2017-11-23 Thread atorikoshi

On 2017/11/22 16:49, Masahiko Sawada wrote:

On Wed, Nov 22, 2017 at 2:56 PM, Craig Ringer  wrote:

On 22 November 2017 at 12:15, Kyotaro HORIGUCHI
 wrote:


At Wed, 22 Nov 2017 12:57:34 +0900, Michael Paquier
 wrote in

Re: Combine function returning NULL unhandled?

2017-11-23 Thread Andres Freund
On 2017-11-21 15:51:59 -0500, Robert Haas wrote:
> On Mon, Nov 20, 2017 at 10:36 PM, Andres Freund  wrote:
> > The plain transition case contains:
> > if (pergroupstate->transValueIsNull)
> > {
> > /*
> >  * Don't call a strict function with NULL inputs.  
> > Note it is
> >  * possible to get here despite the above tests, if 
> > the transfn is
> >  * strict *and* returned a NULL on a prior cycle. 
> > If that happens
> >  * we will propagate the NULL all the way to the 
> > end.
> >  */
> > return;
> > }
> >
> > how come similar logic is not present for combine functions? I don't see
> > any checks preventing a combinefunc from returning NULL, nor do I see
> > https://www.postgresql.org/docs/devel/static/sql-createaggregate.html
> > spell out a requirement that that not be the case.
> 
> I don't know of a reason why that logic shouldn't be present for the
> combine-function case as well.  It seems like it should be pretty
> straightforward to write a test that hits that case and watch it blow
> up ... assuming it does, then I guess we should back-patch the
> addition of that logic.

Found it surprisingly not that straightforward ;)

Pushed a fix to the relevant branches, including tests of the cases where
the trans/combine functions return NULL.
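
For anyone following along, the guard that was missing on the combine side is
essentially the mirror image of the transition-function check quoted above; a
sketch of its shape (not the literal committed diff):

	if (pergroupstate->transValueIsNull)
	{
		/*
		 * Don't call a strict combine function with NULL inputs.  As in
		 * the plain transition case, a strict combinefn that returned
		 * NULL on a prior cycle propagates the NULL to the end.
		 */
		return;
	}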

Greetings,

Andres Freund



Re: [HACKERS] Supporting huge pages on Windows

2017-11-23 Thread Thomas Munro
On Thu, Sep 14, 2017 at 12:32 AM, Magnus Hagander  wrote:
> It's my plan to get to this patch during this commitfest. I've been
> travelling for open and some 24/7 work so far, but hope to get CFing soon.

I hope Tsunakawa-san doesn't mind me posting another rebased version
of his patch.  The last version conflicted with the change from SGML
to XML that just landed in commit 3c49c6fa.

-- 
Thomas Munro
http://www.enterprisedb.com


win_large_pages_v15.patch
Description: Binary data


Re: documentation is now XML

2017-11-23 Thread Michael Paquier
On Fri, Nov 24, 2017 at 12:21 AM, Oleg Bartunov  wrote:
> On Thu, Nov 23, 2017 at 6:01 PM, Peter Eisentraut
>  wrote:
>> The documentation sources are now DocBook XML, not SGML.  (The files are
>> still named *.sgml.  That's something to think about separately.)
>
> Congratulations to you and Alexander !

+1.
-- 
Michael



Re: [HACKERS] Custom compression methods

2017-11-23 Thread Tomas Vondra
Hi,

On 11/23/2017 10:38 AM, Ildus Kurbangaliev wrote:
> On Tue, 21 Nov 2017 18:47:49 +0100
> Tomas Vondra  wrote:
> 
>>>   
>>
>> Hmmm, it still doesn't work for me. See this:
>>
>> test=# create extension pg_lz4 ;
>> CREATE EXTENSION
>> test=# create table t_lz4 (v text compressed lz4);
>> CREATE TABLE
>> test=# create table t_pglz (v text);
>> CREATE TABLE
>> test=# insert into t_lz4 select repeat(md5(1::text),300);
>> INSERT 0 1
>> test=# insert into t_pglz select * from t_lz4;
>> INSERT 0 1
>> test=# drop extension pg_lz4 cascade;
>> NOTICE:  drop cascades to 2 other objects
>> DETAIL:  drop cascades to compression options for lz4
>> drop cascades to table t_lz4 column v
>> DROP EXTENSION
>> test=# \c test
>> You are now connected to database "test" as user "user".
>> test=# insert into t_lz4 select repeat(md5(1::text),300);^C
>> test=# select * from t_pglz ;
>> ERROR:  cache lookup failed for compression options 16419
>>
>> That suggests no recompression happened.
> 
> Should be fixed in the attached patch. I've changed your extension a
> little bit according to the changes in the new patch (also in attachments).
> 

Hmm, this seems to have fixed it, but only in one direction. Consider this:

create table t_pglz (v text);
create table t_lz4 (v text compressed lz4);

insert into t_pglz select repeat(md5(i::text),300)
from generate_series(1,10) s(i);

insert into t_lz4 select repeat(md5(i::text),300)
from generate_series(1,10) s(i);

\d+

 Schema |  Name  | Type  | Owner | Size  | Description
--------+--------+-------+-------+-------+-------------
 public | t_lz4  | table | user  | 12 MB |
 public | t_pglz | table | user  | 18 MB |
(2 rows)

truncate t_pglz;
insert into t_pglz select * from t_lz4;

\d+

 Schema |  Name  | Type  | Owner | Size  | Description
--------+--------+-------+-------+-------+-------------
 public | t_lz4  | table | user  | 12 MB |
 public | t_pglz | table | user  | 18 MB |
(2 rows)

which is fine. But in the other direction, this happens

truncate t_lz4;
insert into t_lz4 select * from t_pglz;

 \d+
   List of relations
 Schema |  Name  | Type  | Owner | Size  | Description
--------+--------+-------+-------+-------+-------------
 public | t_lz4  | table | user  | 18 MB |
 public | t_pglz | table | user  | 18 MB |
(2 rows)

which means the data is still pglz-compressed. That's rather strange, I
guess, and it should compress the data using the compression method set
for the target table instead.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: How is the PostgreSQL debuginfo file generated

2017-11-23 Thread Tom Lane
Rui Hai Jiang  writes:
> I’m wondering how to build the debuginfo package from  the PostgreSQL source.
> For example to generate something like this one :   
> postgresql91-debuginfo.x86_64.

On platforms where such things exist, that's handled by the packaging
system, not by PG proper.  You should proceed by making a new SRPM
and building RPMs from that.

regards, tom lane



Re: has_sequence_privilege() never got the memo

2017-11-23 Thread Peter Eisentraut
On 11/22/17 22:58, Tom Lane wrote:
> Joe Conway  writes:
>> I just noticed that has_sequence_privilege() never got the memo about
>> "WITH GRANT OPTION". Any objections to the attached going back to all
>> supported versions?
> 
> That looks odd.  Patch certainly makes this case consistent with the
> rest of acl.c, but maybe there's some deeper reason?  Peter?

No I think it was just forgotten.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



RE: PostgreSQL v10.1 and xlC compiler on AIX

2017-11-23 Thread REIX, Tony
Hi Michael,

Thanks for the information! I'm not a PostgreSQL expert.

I've found the file:  ./32bit/src/test/regress/regression.diffs

Since I'm rebuilding right now, I do not currently have the file for v10.1 with
the issue.
However, I have it for 10.0 and 9.6.6, which show the same issues. I've attached
them to this email.

About the suggestion of adding an AIX machine with xlc v13.1 to the buildfarm,
I'll see how this could be done.
Also, I'll try to get a more recent version of xlC v13.


Regards,

Tony Reix

Bull - ATOS
IBM Coop Architect & Technical Leader
Office : +33 (0) 4 76 29 72 67
1 rue de Provence - 38432 Échirolles - France
www.atos.net


From: Michael Paquier [michael.paqu...@gmail.com]
Sent: Thursday, 23 November 2017 14:08
To: REIX, Tony
Cc: PostgreSQL Hackers; OLIVA, PASCAL
Subject: Re: PostgreSQL v10.1 and xlC compiler on AIX

On Thu, Nov 23, 2017 at 7:57 PM, REIX, Tony  wrote:
> We are porting PostgreSQL v10.1 on AIX (7.2 for now).
> And we have several tests failures, in 32bit and 64bit.
> We are using xlc 13.01.0003.0003 with -O2.
> Tests were 100% OK with version 9.6.2 .
>
> About 32 bit failures within v10.1, we have 3 failures including:
>  create_aggregate ... FAILED
>  aggregates   ... FAILED
>
> I've found that these 2 failures disappear when building :
>./src/backend/parser/gram.c
> without -O2 !
> However, this is a 44,498 lines file and it may take a while for finding the
> root issue.

When running regression tests and those fail, there is a file called
regression.diffs which gets generated in src/test/regress showing the
difference between the results generated and the results expected when
running the SQL queries which are part of the regression tests.
Attaching this file to this thread would help in determining what's
wrong with the regression tests.

> I invite anyone involved in porting/using PostgreSQL 10.1 on AIX to take
> part in a discussion about how to make v10.1 work perfectly on AIX.

Note that xlc 12.1 is tested with AIX machines on the buildfarm, but
apparently there is no coverage for 13.1. It would be nice if you could
set up a buildfarm machine to catch problems way earlier.
--
Michael


regression.diffs-10.0.xlcV13
Description: regression.diffs-10.0.xlcV13


regression.diffs-9.6.6.xlcV13
Description: regression.diffs-9.6.6.xlcV13


Re: [HACKERS] Challenges preventing us moving to 64 bit transaction id (XID)?

2017-11-23 Thread Amit Kapila
On Thu, Jun 22, 2017 at 9:06 PM, Alexander Korotkov
 wrote:
> On Wed, Jun 7, 2017 at 11:33 AM, Alexander Korotkov
>  wrote:
>>
>> On Tue, Jun 6, 2017 at 4:05 PM, Peter Eisentraut
>>  wrote:
>>>
>>> On 6/6/17 08:29, Bruce Momjian wrote:
>>> > On Tue, Jun  6, 2017 at 06:00:54PM +0800, Craig Ringer wrote:
>>> >> Tom's point is, I think, that we'll want to stay pg_upgrade
>>> >> compatible. So when we see a pg10 tuple and want to add a new page
>>> >> with a new page header that has an epoch, but the whole page is full
>>> >> so there isn't 32 bits left to move tuples "down" the page, what do we
>>> >> do?
>>> >
>>> > I guess I am missing something.  If you see an old page version number,
>>> > you know none of the tuples are from running transactions so you can
>>> > just freeze them all, after consulting the pg_clog.  What am I missing?
>>> > If the page is full, why are you trying to add to the page?
>>>
>>> The problem is if you want to delete from such a page.  Then you need to
>>> update the tuple's xmax and stick the new xid epoch somewhere.
>>>
>>> We had an unconference session at PGCon about this.  These issues were
>>> all discussed and some ideas were thrown around.  We can expect a patch
>>> to appear soon, I think.
>>
>>
>> Right.  I'm now working on splitting my large patch for 64-bit xids into
>> patchset.
>> I'm planning to post patchset in the beginning of next week.
>
>
> Work on this patch took longer than I expected.  It is still in not so good
> shape, but I decided to publish it anyway in order to not stop progress in
> this area.
> I also tried to split this patch into several.  But actually I manage to
> separate few small pieces, while most of changes are remaining in the single
> big diff.
> Long story short, patchset is attached.
>
> 0001-64bit-guc-relopt-1.patch
> This patch implements 64 bit GUCs and relation options which are used in
> further patches.
>
> 0002-heap-page-special-1.patch
> Putting xid and multixact bases into PageHeaderData would take extra 16
> bytes on index pages too.  That would be waste of space for indexes.  This
> is why I decided to put bases into special area of heap pages.
> This patch adds special area for heap pages contaning prune xid and magic
> number.  Magic number is different for regular heap page and sequence page.
>

uint16 pd_pagesize_version;
- TransactionId pd_prune_xid; /* oldest prunable XID, or zero if none */
  ItemIdData pd_linp[FLEXIBLE_ARRAY_MEMBER]; /* line pointer array */
  } PageHeaderData;

Why have you moved pd_prune_xid from page header?

> 0003-64bit-xid-1.patch
> It's the major patch.  It redefines TransactionId as a 64-bit integer and
> defines 32-bit ShortTransactionID which is used for t_xmin and t_xmax.
> Transaction id comparison becomes straight instead of circular. Base values
> for xids and multixact ids are stored in heap page special.  SLRUs also
> became 64-bit and non-circular.   To be able to calculate xmin/xmax without
> accessing heap page, base values are copied into HeapTuple.  Correspondingly
> HeapTupleHeader(Get|Set)(Xmin|Xmax) becomes just
> HeapTuple(Get|Set)(Xmin|Xmax) whose require HeapTuple not just
> HeapTupleHeader.  heap_page_prepare_for_xid() is used to ensure that given
> xid fits particular page base.  If it doesn't fit then base of page is
> shifted, that could require single-page freeze.  Format for wal is changed
> in order to prevent unaligned access to TransactionId.  *_age GUCs and
> relation options are changed to 64-bit.  Forced "autovacuum to prevent
> wraparound" is removed, but there is still freeze to truncate SLRUs.
>

It seems there is no README or other detailed explanation of how all
this works, for example how the value of pd_xid_base is maintained.  I don't
think there are enough comments in the patch to explain things.  I
think it will be easier to understand and review the patch if you
provide some more details either in email or in the patch.

> 0004-base-values-for-testing-1.patch
> This patch is used for testing that calculations using 64-bit bases and
> short 32-bit xid values are correct.  It provides initdb options for initial
> xid, multixact id and multixact offset values.  Regression tests initialize
> cluster with large (more than 2^32) values.
>
> There are a lot of open items, but I would like to notice some of them:
>  * WAL becomes significantly larger due to storage 8 byte xids instead of 4
> byte xids.  Probably, its needed to use base approach in WAL too.
>

Yeah and I think it can impact performance as well.  By any chance
have you run pgbench read-write to see the performance impact of this
patch?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: PostgreSQL v10.1 and xlC compiler on AIX

2017-11-23 Thread Michael Paquier
On Thu, Nov 23, 2017 at 7:57 PM, REIX, Tony  wrote:
> We are porting PostgreSQL v10.1 on AIX (7.2 for now).
> And we have several tests failures, in 32bit and 64bit.
> We are using xlc 13.01.0003.0003 with -O2.
> Tests were 100% OK with version 9.6.2 .
>
> About 32 bit failures within v10.1, we have 3 failures including:
>  create_aggregate ... FAILED
>  aggregates   ... FAILED
>
> I've found that these 2 failures disappear when building :
>./src/backend/parser/gram.c
> without -O2 !
> However, this is a 44,498 lines file and it may take a while for finding the
> root issue.

When running regression tests and those fail, there is a file called
regression.diffs which gets generated in src/test/regress showing the
difference between the results generated and the results expected when
running the SQL queries which are part of the regression tests.
Attaching this file to this thread would help in determining what's
wrong with the regression tests.

> I invite anyone involved in porting/using PostgreSQL 10.1 on AIX to take
> part in a discussion about how to make v10.1 work perfectly on AIX.

Note that xlc 12.1 is tested with AIX machines on the buildfarm, but
apparently there is no coverage for 13.1. It would be nice if you could
set up a buildfarm machine to catch problems way earlier.
-- 
Michael



Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key

2017-11-23 Thread amul sul
On Sat, Nov 11, 2017 at 1:05 AM, Robert Haas  wrote:
> On Wed, Sep 27, 2017 at 7:07 AM, amul sul  wrote:
>> Attaching POC patch that throws an error in the case of a concurrent update
>> to an already deleted tuple due to UPDATE of partition key[1].
>>
>> In a normal update new tuple is linked to the old one via ctid forming
>> a chain of tuple versions but UPDATE of partition key[1] move tuple
>> from one partition to an another partition table which breaks that chain.
>
> This patch needs a rebase.  It has one whitespace-only hunk that
> should possibly be excluded.
>
Thanks for looking at the patch.

> I think the basic idea of this is sound.  Either you or Amit need to
> document the behavior in the user-facing documentation, and it needs
> tests that hit every single one of the new error checks you've added
> (obviously, the tests will only work in combination with Amit's
> patch).  The isolation should be sufficient to write such tests.
>
> It needs some more extensive comments as well.  The fact that we're
> assigning a meaning to ip_blkid -> InvalidBlockNumber is a big deal,
> and should at least be documented in itemptr.h in the comments for the
> ItemPointerData structure.
>
> I suspect ExecOnConflictUpdate needs an update for the
> HeapTupleUpdated case similar to what you've done elsewhere.
>

UPDATE of partition key v25[1] has documented this concurrent scenario;
I need to rework that documentation paragraph to include this behaviour and
will discuss it with Amit.

The attached 0001 patch adds the error check to 8 functions; out of those 8, I
was able to build isolation tests for 4 functions -- ExecUpdate, ExecDelete,
GetTupleForTrigger & ExecLockRows.

The remaining ones are EvalPlanQualFetch, ExecOnConflictUpdate,
RelationFindReplTupleByIndex & RelationFindReplTupleSeq.  Note that the check in
RelationFindReplTupleByIndex & RelationFindReplTupleSeq will emit a LOG, not an
ERROR.
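
For context, the check being added in each of those places is along these
lines (a sketch of the idea, not the patch text; the errcode and message
wording are assumptions):

	/*
	 * If the row was moved to another partition by an UPDATE of the
	 * partition key, the old version's ctid block number was set to
	 * InvalidBlockNumber, so there is no newer version to chase.
	 */
	if (!BlockNumberIsValid(ItemPointerGetBlockNumber(&hufd.ctid)))
		ereport(ERROR,
				(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
				 errmsg("tuple to be updated was moved to another partition due to concurrent update")));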

Any help or suggestions on building tests for these functions would be much appreciated.


1] 
http://postgr.es/m/CAJ3gD9f4Um99sOJmVSEPj783VWw5GbXkgU3OWcYZJv=ipjt...@mail.gmail.com


Regards,
Amul


0001-POC-Invalidate-ip_blkid-v2.patch
Description: Binary data


0002-isolation-tests-v1.patch
Description: Binary data


PostgreSQL v10.1 and xlC compiler on AIX

2017-11-23 Thread REIX, Tony
Hi,

We are porting PostgreSQL v10.1 on AIX (7.2 for now).
And we have several tests failures, in 32bit and 64bit.
We are using xlc 13.01.0003.0003 with -O2.
Tests were 100% OK with version 9.6.2 .

About 32 bit failures within v10.1, we have 3 failures including:
 create_aggregate ... FAILED
 aggregates   ... FAILED

I've found that these 2 failures disappear when building:
   ./src/backend/parser/gram.c
without -O2!
However, this is a 44,498-line file and it may take a while to find the
root issue.

I invite anyone involved in porting/using PostgreSQL 10.1 on AIX to take part 
in a discussion about how to make v10.1 work perfectly on AIX.

Regards,

Tony Reix

Bull - ATOS
IBM Coop Architect & Technical Leader
Office : +33 (0) 4 76 29 72 67
1 rue de Provence - 38432 Échirolles - France
www.atos.net


Re: With commit 4e5fe9ad19, range partition missing handling for the NULL partition key

2017-11-23 Thread Rushabh Lathia
On Wed, Nov 22, 2017 at 2:36 PM, Amit Langote  wrote:

> On 2017/11/22 17:42, amul sul wrote:
> > On Wed, Nov 22, 2017 at 11:38 AM, Amit Langote wrote:
> >> On 2017/11/22 13:45, Rushabh Lathia wrote:
> >>> Attaching patch to fix as well as regression test.
> >>
> >> Thanks for the patch.  About the code, how about do it like the attached
> >> instead?
> >>
> >
> > Looks good to me; we can even skip the following change in the v2 patch:
> >
> > 19 @@ -2560,6 +2559,8 @@ get_partition_for_tuple(Relation relation,
> > Datum *values, bool *isnull)
> >  20  */
> >  21 part_index =
> > partdesc->boundinfo->indexes[bound_offset + 1];
> >  22 }
> >  23 +   else
> >  24 +   part_index = partdesc->boundinfo->default_index;
> >  25 }
> >  26 break;
> >  27
> >
> > default_index will get assign by following code in
> get_partition_for_tuple() :
> >
> >/*
> >  * part_index < 0 means we failed to find a partition of this parent.
> >  * Use the default partition, if there is one.
> >  */
> > if (part_index < 0)
> > part_index = partdesc->boundinfo->default_index;
>
> Good point.  Updated patch attached.
>

Thanks Amit.

Patch looks good to me.


-- 
Rushabh Lathia
www.EnterpriseDB.com


Re: [HACKERS] [POC] Faster processing at Gather node

2017-11-23 Thread Rafia Sabih
On Thu, Nov 16, 2017 at 12:24 AM, Andres Freund  wrote:
> Hi,
>
> On 2017-11-15 13:48:18 -0500, Robert Haas wrote:
>> I think that we need a little bit deeper analysis here to draw any
>> firm conclusions.
>
> Indeed.
>
>
>> I suspect that one factor is that many of the queries actually send
>> very few rows through the Gather.
>
> Yep. I kinda wonder if the same result would present if the benchmarks
> were run with parallel_leader_participation. The theory being that what we're
> seeing is just that the leader doesn't accept any tuples, and the large
> queue size just helps because workers can run for longer.
>
I ran Q12 with parallel_leader_participation = off and could not get
any performance improvement with the patches given by Robert. The
result was the same for head as well. The query plan also remains
unaffected by the value of this parameter.

Here are the details of the experiment,
TPC-H scale factor = 20,
work_mem = 1GB
random_page_cost = seq_page_cost = 0.1
max_parallel_workers_per_gather = 4

PG commit: 745948422c799c1b9f976ee30f21a7aac050e0f3

Please find the attached file for the explain analyse output for
either values of parallel_leader_participation and patches.
-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/
with parallel_leader_participation = 1;

                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1001.19..504469.79 rows=1 width=27) (actual 
time=21833.206..21833.207 rows=1 loops=1)
   ->  GroupAggregate  (cost=1001.19..3525281.42 rows=7 width=27) (actual 
time=21833.203..21833.203 rows=1 loops=1)
 Group Key: lineitem.l_shipmode
 ->  Gather Merge  (cost=1001.19..3515185.81 rows=576888 width=27) 
(actual time=4.388..21590.757 rows=311095 loops=1)
   Workers Planned: 4
   Workers Launched: 4
   ->  Nested Loop  (cost=1.13..3445472.82 rows=144222 width=27) 
(actual time=0.150..4399.384 rows=62247 loops=5)
 ->  Parallel Index Scan using l_shipmode on lineitem  
(cost=0.57..3337659.69 rows=144222 width=19) (actual time=0.111..3772.865 
rows=62247 loops=5)
   Index Cond: (l_shipmode = ANY ('{"REG 
AIR",RAIL}'::bpchar[]))
   Filter: ((l_commitdate < l_receiptdate) AND 
(l_shipdate < l_commitdate) AND (l_receiptdate >= '1995-01-01'::date) AND 
(l_receiptdate < '1996-01-01 00:00:00'::timestamp without
 time zone))
   Rows Removed by Filter: 3367603
 ->  Index Scan using orders_pkey on orders  
(cost=0.56..0.75 rows=1 width=20) (actual time=0.009..0.009 rows=1 loops=311236)
   Index Cond: (o_orderkey = lineitem.l_orderkey)
 Planning time: 0.526 ms
 Execution time: 21835.922 ms
(15 rows)

postgres=# set parallel_leader_participation =0;
SET
postgres=# \i /data/rafia.sabih/dss/queries/12.sql 

                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1001.19..504469.79 rows=1 width=27) (actual 
time=21179.065..21179.066 rows=1 loops=1)
   ->  GroupAggregate  (cost=1001.19..3525281.42 rows=7 width=27) (actual 
time=21179.064..21179.064 rows=1 loops=1)
 Group Key: lineitem.l_shipmode
 ->  Gather Merge  (cost=1001.19..3515185.81 rows=576888 width=27) 
(actual time=4.201..20941.385 rows=311095 loops=1)
   Workers Planned: 4
   Workers Launched: 4
   ->  Nested Loop  (cost=1.13..3445472.82 rows=144222 width=27) 
(actual time=0.187..5105.780 rows=77797 loops=4)
 ->  Parallel Index Scan using l_shipmode on lineitem  
(cost=0.57..3337659.69 rows=144222 width=19) (actual time=0.149..4362.235 
rows=77797 loops=4)
   Index Cond: (l_shipmode = ANY ('{"REG 
AIR",RAIL}'::bpchar[]))
   Filter: ((l_commitdate < l_receiptdate) AND 
(l_shipdate < l_commitdate) AND (l_receiptdate >= '1995-01-01'::date) AND 
(l_receiptdate < '1996-01-01 00:00:00'::timestamp without
 time zone))
   Rows Removed by Filter: 4208802
 ->  Index Scan using orders_pkey on orders  
(cost=0.56..0.75 rows=1 width=20) (actual time=0.008..0.008 rows=1 loops=311187)
   Index