Re: [HACKERS] relkind check in DefineIndex

2017-10-15 Thread Amit Langote
On 2017/10/14 4:32, Robert Haas wrote:
> On Fri, Oct 13, 2017 at 12:38 PM, Alvaro Herrera
>  wrote:
>> The relkind check in DefineIndex has grown into an ugly rats nest of
>> 'if' statements.  I propose to change it into a switch, as per the
>> attached.
> 
> wfm

+1

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How does postgres store the join predicate for a relation in a given query

2017-10-15 Thread Ashutosh Bapat
On Sat, Oct 14, 2017 at 3:15 AM, Gourav Kumar  wrote:
> Why does have_relevant_joinclause() and have_relevant_eclass_joinclause()
> return true for all possible joins for the query given below.
> Even when they have no join predicate between them.
> e.g. join between ss1 & ws3, ss2 & ws3 etc.
>

The prologues of those functions and comments within those explain that.

/*
 * have_relevant_joinclause
 *  Detect whether there is a joinclause that involves
 *  the two given relations.
 *
 * Note: the joinclause does not have to be evaluable with only these two
 * relations.  This is intentional.  For example consider
 *  SELECT * FROM a, b, c WHERE a.x = (b.y + c.z)
 * If a is much larger than the other tables, it may be worthwhile to
 * cross-join b and c and then use an inner indexscan on a.x.  Therefore
 * we should consider this joinclause as reason to join b to c, even though
 * it can't be applied at that join step.
 */

/*
 * have_relevant_eclass_joinclause
 *  Detect whether there is an EquivalenceClass that could produce
 *  a joinclause involving the two given relations.
 *
 * This is essentially a very cut-down version of
 * generate_join_implied_equalities().  Note it's OK to occasionally say "yes"
 * incorrectly.  Hence we don't bother with details like whether the lack of a
 * cross-type operator might prevent the clause from actually being generated.
 */
May be you want to see whether those comments are applicable in your
case and also see how the callers handle the return values.


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPDATE of partition key

2017-10-15 Thread Amit Langote
Hi Amit.

On 2017/10/04 22:51, Amit Khandekar wrote:
> Main patch :
> update-partition-key_v20.patch

Guess you're already working on it but the patch needs a rebase.  A couple
of hunks in the patch to execMain.c and nodeModifyTable.c fail.

Meanwhile a few comments:

+void
+pull_child_partition_columns(Bitmapset **bitmapset,
+Relation rel,
+Relation parent)

Nitpick: don't we normally list the output argument(s) at the end?  Also,
"bitmapset" could be renamed to something that conveys what it contains?

+   if (partattno != 0)
+   child_keycols =
+   bms_add_member(child_keycols,
+  partattno -
FirstLowInvalidHeapAttributeNumber);
+   }
+   foreach(lc, partexprs)
+   {

Elsewhere (in quite a few places), we don't iterate over partexprs
separately like this, although I'm not saying it is bad, just different
from other places.

+ * the transition tuplestores can be built. Furthermore, if the transition
+ *  capture is happening for UPDATEd rows being moved to another
partition due
+ *  partition-key change, then this function is called once when the row is
+ *  deleted (to capture OLD row), and once when the row is inserted to
another
+ *  partition (to capture NEW row). This is done separately because
DELETE and
+ *  INSERT happen on different tables.

Extra space at the beginning from the 2nd line onwards.

+   (event == TRIGGER_EVENT_UPDATE && ((oldtup == NULL) ^ (newtup
== NULL

Is there some reason why a bitwise operator is used here?

+ * 'update_rri' has the UPDATE per-subplan result rels.

Could you explain why they are being received as input here?

+ * 'perleaf_parentchild_maps' receives an array of TupleConversionMap objects
+ * with on entry for every leaf partition (required to convert input
tuple
+ * based on the root table's rowtype to a leaf partition's rowtype after
+ * tuple routing is done)

Could this be named leaf_tupconv_maps, maybe?  It perhaps makes clear that
they are maps needed for "tuple conversion".  And the other field holding
the reverse map as leaf_rev_tupconv_maps.  Either that or use underscores
to separate words, but then it gets too long I guess.


+   tuple = ConvertPartitionTupleSlot(mtstate,
+
mtstate->mt_perleaf_parentchild_maps[leaf_part_index],
+

The 2nd line here seems to have gone over 80 characters.

ISTM, ConvertPartitionTupleSlot has a very specific job and a bit complex
interface.  I guess it could simply have the following interface:

static HeapTuple ConvertPartitionTuple(ModifyTabelState *mtstate,
   HeapTuple tuple, bool is_update);

And figure out, based on the value of is_update, which map to use and
which slot to set *p_new_slot to (what is now "new_slot" argument).
You're getting mtstate here anyway, which contains all the information you
need here.  It seems better to make that (selecting which map and which
slot) part of the function's implementation if we're having this function
at all, imho.  Maybe I'm missing some details there, but my point still
remains that we should try to put more logic in that function instead of
it just do the mechanical tuple conversion.

+ * We have already checked partition constraints above, so skip them
+ * below.

How about: ", so skip checking here."?


ISTM, the newly introduced logic in ExecSetupTransitionCaptureState() to
try to reuse the per-subplan child-to-parent map as per-leaf
child-to-parent map could be simplified a bit.  I mean the following code:

+/*
+ * But for Updates, we can share the per-subplan maps with the per-leaf
+ * maps.
+ */
+update_rri_index = 0;
+update_rri = mtstate->resultRelInfo;
+if (mtstate->mt_nplans > 0)
+cur_reloid = RelationGetRelid(update_rri[0].ri_RelationDesc);

-/* Choose the right set of partitions */
-if (mtstate->mt_partition_dispatch_info != NULL)
+for (i = 0; i < numResultRelInfos; ++i)
+{


How about (pseudo-code):

 j = 0;
 for (i = 0; i < n_leaf_parts; i++)
 {
 if (j < n_subplans && leaf_rri[i]->oid == subplan_rri[j]->oid)
 {
 leaf_childparent_map[i] = subplan_childparent_map[j];
 j++;
 }
 else
 {
 leaf_childparent_map[i] = new map
 }
 }

I think the above would also be useful in ExecSetupPartitionTupleRouting()
where you've added similar code to try to reuse per-subplan ResultRelInfos.


In ExecInitModifyTable(), can we try to minimize the number of places
where update_tuple_routing_needed is being set.  Currently, it's being set
in 3 places:

+boolupdate_tuple_routing_needed = node->part_cols_updated;

&

+/*
+ * If this is an UPDATE and a BEFORE UPDATE trigger is present,
we may
+ * need to do update tuple routing.
+ */
+if (resultRelInfo->ri_TrigDesc &&
+resultRelInfo->ri_TrigDesc->trig_update_before_row &&
+   

Re: [HACKERS] Determine state of cluster (HA)

2017-10-15 Thread Craig Ringer
On 13 October 2017 at 08:50, Joshua D. Drake  wrote:
> -Hackers,
>
> I had a long call with a firm developing front end proxy/cache/HA for
> Postgres today. Essentially the software is a replacement for PGPool in
> entirety but also supports analytics etc... When I was asking them about
> pain points they talked about the below and I was wondering if this is a
> problem we would like to solve.

IMO: no one node knows the full state of the system, or can know it.

I'd love PostgreSQL to help users more with scaling, HA, etc. But I
think it's a big job. We'd need:

- a node topology of some kind, communicated between nodes
- heartbeat and monitoring
- failover coordination
- pooling/proxying
- STONITH/fencing
- etc.

That said, I do think it'd be very desirable for us to introduce a
greater link from a standby to master:

- Get info about master. We should finish merging recovery.conf into
postgresql.conf.

-

> b. Attempt to connect to the host directly, if not...
> c. use the slave and use the hostname via dblink to connect to the master,
> as the hostname , i.e. select * from dblink('" + connInfo + "
> dbname=postgres', 'select inet_server_addr()') AS t(inet_server_addr inet).
> This is necessary in the event the hostname used in the recovery.conf file
> is not resolvable from the outside.

OK, so "connect directly" here means from some 3rd party, the one
doing the querying of the replica.

> 1.  The dblink call doesn't have a way to specify a timeout, so we have to
> use Java futures to control how long this may take to a reasonable amount of
> time;

statement_timeout doesn't work?

If not, that sounds like a sensible, separate feature to add. Patches welcome!

> 2.  NAT mapping may result in us detecting IP ranges that are not accessible
> to the application nodes.

PostgreSQL can't do anything about this one.

> 3.  there is no easy way to monitor for state changes as they happen,
> allowing faster failovers, everything has to be polled based on events;

It'd be pretty simple to write a function that sleeps in the backend
until it's promoted. I don't know off the top of my head if we set all
proc latches when we promote, but if we don't it's probably harmless
and somewhat useful to do so.

Either way, you'd do long-polling. Call the function and let the
connection block until something interesting happens. Use TCP
keepalives to make sure you notice if it dies. Have the function
return when the state changes.

> 4.  It doesn't support cascading replication very well, although we could
> augment the logic to allow us to map the relationship between nodes.
> 5.  There is no way to connect to a db node with something akin to
> SQL-Server's "application intent" flags, to allow a connection to be
> rejected if we wish it to be a read/write connection.  This helps detect the
> state of the node directly without having to ask any further questions of
> the node, and makes it easier to "stall" during connection until a proper
> connection can be made.

That sounds desirable, and a good step toward eventually being able to
transparently re-route read/write queries from replica to master.
Which is where I'd like to land up eventually.

Again, that'd be a sensible patch to submit, quite separately to the
other topics.

> 6.  The master, on shutdown, will not actually close and disable connections
> as it shuts down, instead, it will issue an error that it is shutting down
> as it does so.

Er, yes? I don't understand what you are getting at here.

Can you describe expected vs actual behaviour in more detail?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Lockable views

2017-10-15 Thread Tatsuo Ishii
>> >> test=# CREATE VIEW v3 AS SELECT count(*) FROM v1;
>> >> CREATE VIEW
>> >> test=# BEGIN;
>> >> BEGIN
>> >> test=# LOCK TABLE v3;
>> >> ERROR:  cannot lock view "v3"
>> >> DETAIL:  Views that return aggregate functions are not automatically 
>> >> updatable.
>> > 
>> > It would be nice if the message would be something like:
>> > 
>> > DETAIL:  Views that return aggregate functions are not lockable
> 
> This uses messages from view_query_is_auto_updatable() of the rewrite system 
> directly. 
> Although we can modify the messages, I think it is not necessary for now
> since we can lock only automatically updatable views.

You could add a flag to view_query_is_auto_updatable() to switch the
message between

 DETAIL:  Views that return aggregate functions are not automatically updatable.

and

 DETAIL:  Views that return aggregate functions are not lockable

>> > I wonder if we should lock tables in a subquery as well. For example,
>> > 
>> > create view v1 as select * from t1 where i in (select i from t2);
>> > 
>> > In this case should we lock t2 as well?
>> 
>> Current the patch ignores t2 in the case above.
>> 
>> So we have options below:
>> 
>> - Leave as it is (ignore tables appearing in a subquery)
>> 
>> - Lock all tables including in a subquery
>> 
>> - Check subquery in the view definition. If there are some tables
>>   involved, emit an error and abort.
>> 
>> The first one might be different from what users expect. There may be
>> a risk that the second one could cause deadlock. So it seems the third
>> one seems to be the safest IMO.
> 
> Make sense. Even if the view is locked, when tables in a subquery is
> modified, the contents of view can change. To avoid it, we have to
> lock tables, or give up to lock such views. 
> 
> We can say the same thing for functions in a subquery. If the definition
> of the functions are changed, the result of the view can change.
> We cannot lock functions, but should we abtain row-level lock on pg_proc
> in such cases? (of cause, or give up to lock such views)

I think we don't need to care about function definition changes used
in where clauses in views. None view queries using functions do not
care about the definition changes of functions while executing the
query. So why updatable views need to care them?

> BTW, though you mentioned the risk of deadlocks, even when there
> are no subquery, deadlock can occur in the current patch.
> 
> For example, lock a table T in Session1, and then lock a view V
> whose base relation is T in Session2. Session2 will wait for 
> Session1 to release the lock on T. After this, when Session1 try to
> lock view V, the deadlock occurs and the query is canceled.

You are right. Dealocks could occur in any case.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] v10 bottom-listed

2017-10-15 Thread Amit Langote
On 2017/10/13 22:58, Magnus Hagander wrote:
> On Fri, Oct 6, 2017 at 3:59 AM, Amit Langote 
> wrote:
> 
>> On 2017/10/05 22:28, Erik Rijkers wrote:
>>> In the 'ftp' listing, v10 appears at the bottom:
>>>   https://www.postgresql.org/ftp/source/
>>>
>>> With all the other v10* directories at the top, we could get a lot of
>>> people installing wrong binaries...
>>>
>>> Maybe it can be fixed so that it appears at the top.
>>
>> Still see it at the bottom.  Maybe ping pgsql-www?
>>
> 
> Turns out there was already quite an ugly hack to deal with this, so I just
> made it a bit more ugly. Once the caches expire I believe sorting should be
> correct.

Saw that it's now listed all the way up, thanks! :)

Regards,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-15 Thread Justin Pryzby
On Sat, Oct 14, 2017 at 08:56:56PM -0500, Justin Pryzby wrote:
> On Fri, Oct 13, 2017 at 10:57:32PM -0500, Justin Pryzby wrote:
> > > Also notice the vacuum process was interrupted, same as yesterday (think
> > > goodness for full logs).  Our INSERT script is using python
> > > multiprocessing.pool() with "maxtasksperchild=1", which I think means we 
> > > load
> > > one file and then exit the subprocess, and pool() creates a new subproc, 
> > > which
> > > starts a new PG session and transaction.  Which explains why autovacuum 
> > > starts
> > > processing the table only to be immediately interrupted.
> 
> On Sun, Oct 15, 2017 at 01:57:14AM +0200, Tomas Vondra wrote:
> > I don't follow. Why does it explain that autovacuum gets canceled? I
> > mean, merely opening a new connection/session should not cancel
> > autovacuum. That requires a command that requires table-level lock
> > conflicting with autovacuum (so e.g. explicit LOCK command, DDL, ...).
> 
> I was thinking that INSERT would do it, but I gather you're right about
> autovacuum.  Let me get back to you about this..

I confirmed that we're taking an explicit lock before creating new child tables
(as I recall, to avoid errors in the logs shortly after midnight when multiple
subprocesses see data for the new date for the first time):

 2017-10-15 12:52:50.499-04 | 59e3925e.6951 | statement: LOCK TABLE 
cdrs_huawei_sgsnPDPRecord IN SHARE UPDATE EXCLUSIVE MODE

Probably we can improve that with LOCK TABLE ONLY.

Justin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] possible encoding issues with libxml2 functions

2017-10-15 Thread Pavel Stehule
2017-08-21 6:25 GMT+02:00 Pavel Stehule :

>
>
>> xpath-bugfix.patch affected only xml values containing an xml declaration
>> with
>> "encoding" attribute.  In UTF8 databases, this latest proposal
>> (xpath-parsing-error-fix.patch) is equivalent to xpath-bugfix.patch.  In
>> non-UTF8 databases, xpath-parsing-error-fix.patch affects all xml values
>> containing non-ASCII data.  In a LATIN1 database, the following works
>> today
>> but breaks under your latest proposal:
>>
>>   SELECT xpath('text()', ('' || convert_from('\xc2b0', 'LATIN1') ||
>> '')::xml);
>>
>> It's acceptable to break that, since the documentation explicitly
>> disclaims
>> support for it.  xpath-bugfix.patch breaks different use cases, which are
>> likewise acceptable to break.  See my 2017-08-08 review for details.
>>
>
> The fact so this code is working shows so a universe is pretty dangerous
> place :)
>
>
ping?

will we continue in this topic?



>


Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2017-10-15 Thread Vik Fearing
On 10/14/2017 11:47 PM, Peter Geoghegan wrote:
> On Sat, Oct 14, 2017 at 10:58 AM, Robert Haas  wrote:
>> I think it's perfectly sensible to view those 2 bits as making up a
>> 2-bit field with 4 states rather than displaying each bit
>> individually, but you obviously disagree.  Fair enough.>
> I guess it is that simple.

FWIW, my opinion falls in line with Robert's.

Also, whichever way it goes, this is a patch I've been wanting for a
long time.
-- 
Vik Fearing  +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-15 Thread Tomas Vondra
Hi,

On 10/15/2017 03:56 AM, Justin Pryzby wrote:
> On Fri, Oct 13, 2017 at 10:57:32PM -0500, Justin Pryzby wrote:
...
>> It's a bit difficult to guess what went wrong from this backtrace. For
>> me gdb typically prints a bunch of lines immediately before the frames,
>> explaining what went wrong - not sure why it's missing here.
> 
> Do you mean this ?
> 
> ...
> Loaded symbols for /lib64/libnss_files-2.12.so
> Core was generated by `postgres: autovacuum worker process   gtt 
> '.
> Program terminated with signal 11, Segmentation fault.
> #0  pfree (pointer=0x298c740) at mcxt.c:954
> 954 (*context->methods->free_p) (context, pointer);
> 

Yes. So either 'context' is bogus. Or 'context->methods' is bogus. Or
'context->methods->free_p' is bogus.

>> Perhaps some of those pointers are bogus, the memory was already pfree-d
>> or something like that. You'll have to poke around and try dereferencing
>> the pointers to find what works and what does not.
>>
>> For example what do these gdb commands do in the #0 frame?
>>
>> (gdb) p *(MemoryContext)context
> 
> (gdb) p *(MemoryContext)context
> Cannot access memory at address 0x7474617261763a20
> 

OK, this means the memory context pointer (tracked in the header of a
chunk) is bogus. There are multiple common ways how that could happen:

* Something corrupts memory (typically out-of-bounds write).

* The pointer got allocated in an incorrect memory context (which then
was released, and the memory was reused for new stuff).

* It's a use-after-free.

* ... various other possibilities ...

> 
> I uploaded the corefile:
> http://telsasoft.com/tmp/coredump-postgres-autovacuum-brin-summarize.gz
> 

Thanks, but I'm not sure that'll help, at this point. We already know
what happened (corrupted memory), we don't know "how". And core files
are mostly just "snapshots" so are not very useful in answering that :-(

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] oversight in EphemeralNamedRelation support

2017-10-15 Thread Thomas Munro
On Fri, Oct 13, 2017 at 10:01 AM, Tom Lane  wrote:
> But I see very
> little case for allowing CTEs to capture such references, because surely
> we are never going to allow that to do anything useful, and we have
> several years of precedent now that they don't capture.

For what it's worth, SQL Server allows DML in CTEs like us but went
the other way on this.  Not only are its CTEs in scope as DML targets,
it actually lets you update them in cases where a view would be
updatable, rewriting as base table updates.  I'm not suggesting that
we should do that too (unless of course it shows up in a future
standard), just pointing it out as a curiosity.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On markers of changed data

2017-10-15 Thread Andrey Borodin
Hello!


> 9 окт. 2017 г., в 10:23, Andrey Borodin  написал(а):
> 
> Thanks, Stephen, this actually pointed what to look for
> VM is WAL-logged [0]
> FSM is not [1]
> 
> [0] 
> https://github.com/postgres/postgres/blob/113b0045e20d40f726a0a30e33214455e4f1385e/src/backend/access/heap/visibilitymap.c#L315
> [1] 
> https://github.com/postgres/postgres/blob/1d25779284fe1ba08ecd57e647292a9deb241376/src/backend/storage/freespace/freespace.c#L593

After tests of binary equivalence before and after backup I've come to 
conclusion, that Visibility Map cannot be backuped incrementally: it's bits are 
unset without page LSN bump. This can lead to wrong results of Index Only Scans 
executed on freshly restored backups.

In my implementation of incremental backup in WAL-G I will disable any FSM, VM 
and XACT\CLOG incrementation.

Posting this for the record, so that if someone goes this way info will be 
available. Thank you for your attention.

Best regards, Andrey Borodin.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SAP Application deployment on PostgreSQL

2017-10-15 Thread legrand legrand
There is still some activities regarding this question like

https://blogs.sap.com/2017/05/08/replicate-abap-database-table-definition-to-postgresql/

Maybe PG or EDB (with the Oracle compatibility layer) would have been better
strategic choice for SAP to conter ORACLE ;o)



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal - Default namespaces for XPath expressions (PostgreSQL 11)

2017-10-15 Thread Pavel Stehule
2017-10-02 12:22 GMT+02:00 Kyotaro HORIGUCHI <
horiguchi.kyot...@lab.ntt.co.jp>:

> Hi, thanks for the new patch.
>
> # The patch is missing xpath_parser.h. That of the first patch was usable.
>
> At Thu, 28 Sep 2017 07:59:41 +0200, Pavel Stehule 
> wrote in