Re: [HACKERS] Proposal: global index

2017-08-20 Thread Simon Riggs
On 18 August 2017 at 15:40, Alvaro Herrera  wrote:
> Ildar Musin wrote:
>
>> While we've been developing pg_pathman extension one of the most frequent
>> questions we got from our users was about global index support. We cannot
>> provide it within an extension. And I couldn't find any recent discussion
>> about someone implementing it. So I'm thinking about giving it a shot and
>> start working on a patch for postgres.
>>
>> One possible solution is to create an extended version of item pointer which
>> would store relation oid along with block number and position:
>
> I've been playing with the index code in order to allow indirect tuples,
> which are stored in a format different from IndexTupleData.
>
> I've been adding an "InMemoryIndexTuple" (maybe there's a better name)
> which internally has pointers to both IndexTupleData and
> IndirectIndexTupleData, which makes it easier to pass around the index
> tuple in either format.

> It's very easy to add an OID to that struct,
> which then allows to include the OID in either an indirect index tuple
> or a regular one.

If there is a unique index then there is no need for that. Additional
data to the index makes it even bigger and even less useful, so we
need to count that as a further disadvantage of global indexes.

I have a very clear statement from a customer recently that "We will
never use global indexes", based upon their absolute uselessness in
Oracle.

> Then, wherever we're using IndexTupleData in the index AM code, we would
> replace it with InMemoryIndexTuple.  This should satisfy both your use
> case and mine.

Global indexes are one use case for indirect indexes, but luckily
not the only one.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] coverage analysis improvements

2017-08-20 Thread Michael Paquier
On Fri, Aug 11, 2017 at 1:05 PM, Peter Eisentraut
 wrote:
> Here is a patch series with some significant reworking and adjusting of
> how the coverage analysis tools are run.  The result should be that the
> "make coverage" runs are faster and more robust, the results are more
> accurate, and the code is simpler.  Please see the individual patches
> for details.
>
> I have replaced "make coverage-html" with "make coverage" (but kept the
> previous name for compatibility), since the old meaning of the latter
> has gone away and was never very useful.  Other than that, the usage of
> everything stays the same.
>
> I will add it to the commit fest.  Testing with different versions of
> tools would be especially valuable.

Patch 0001 fails to apply as of c629324. Which versions of lcov and
gcov did you use for your tests?
-- 
Michael




Re: [HACKERS] assorted code cleanup

2017-08-20 Thread Michael Paquier
On Fri, Aug 18, 2017 at 1:56 AM, Peter Eisentraut
 wrote:
> Here are a few assorted patches I made while working on the stdbool set,
> cleaning up various pieces of dead code and weird styles.
>
> - Drop excessive dereferencing of function pointers

-   (*next_ProcessUtility_hook) (pstmt, queryString,
+   next_ProcessUtility_hook(pstmt, queryString,
 context, params, queryEnv,
 dest, completionTag);
But this... Personally I like the current style, which lets one
distinguish a call to something declared locally from a call that may be
going through a custom code path. So I think you had better not update
the system hooks that external modules can use via
shared_preload_libraries.

> - Remove endof macro

Its last use is 1aa58d3a from 2009.

> - Remove unnecessary casts

Those are also quite old things:
src/include/access/attnum.h:((bool) ((attributeNumber) !=
InvalidAttrNumber))
src/include/access/attnum.h:((bool) ((attributeNumber) > 0))
src/backend/utils/hash/dynahash.c:  *foundPtr = (bool) (currBucket != NULL);
[... etc ...]

> - Remove unnecessary parentheses in return statements

So you would still keep parentheses, as here, for simple expressions:
contrib/bloom/blutils.c:return (x - 1);
No objections.

Here are some more:
contrib/intarray/_int_bool.c:   return (calcnot) ?
contrib/ltree/ltxtquery_op.c:   return (calcnot) ?

And there are many "(0)" "S_ANYTHING" in src/interfaces/ecpg/test/ and
src/interfaces/ecpg/preproc/.

src/port/ stuff is better left alone; good that you did not touch it.

> - Remove our own definition of NULL

Fine. c.h uses NULL once before enforcing its definition.

> - fuzzystrmatch: Remove dead code

Those are remnants of a323ede, which missed removing everything.
Looks good to me.
-- 
Michael




Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2017-08-20 Thread Haribabu Kommi
On Tue, Aug 15, 2017 at 7:29 AM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 4/4/17 01:06, Haribabu Kommi wrote:
> > Both pg_dump and pg_upgrade tests are passed. Updated patch attached
> > I will add this patch to the next commitfest.
>
> This patch needs to be rebased for the upcoming commit fest.
>

Thanks for checking. Rebased patch is attached.

Regards,
Hari Babu
Fujitsu Australia


0001-pg_dump-and-pg_dumpall-database-handling-refactoring_v6.patch
Description: Binary data



Re: [HACKERS] possible encoding issues with libxml2 functions

2017-08-20 Thread Pavel Stehule
>
> xpath-bugfix.patch affected only xml values containing an xml declaration
> with
> "encoding" attribute.  In UTF8 databases, this latest proposal
> (xpath-parsing-error-fix.patch) is equivalent to xpath-bugfix.patch.  In
> non-UTF8 databases, xpath-parsing-error-fix.patch affects all xml values
> containing non-ASCII data.  In a LATIN1 database, the following works today
> but breaks under your latest proposal:
>
>   SELECT xpath('text()', ('' || convert_from('\xc2b0', 'LATIN1') ||
> '')::xml);
>
> It's acceptable to break that, since the documentation explicitly disclaims
> support for it.  xpath-bugfix.patch breaks different use cases, which are
> likewise acceptable to break.  See my 2017-08-08 review for details.
>

The fact that this code works shows that the universe is a pretty
dangerous place :)


Re: [HACKERS] expanding inheritance in partition bound order

2017-08-20 Thread Ashutosh Bapat
On Sat, Aug 19, 2017 at 1:21 AM, Robert Haas  wrote:
> On Fri, Aug 18, 2017 at 1:17 AM, Ashutosh Bapat
>  wrote:
>> 0004 patch in partition-wise join patchset has code to expand
>> partition hierarchy. That patch is expanding inheritance hierarchy in
>> depth first manner. Robert commented that instead of depth first
>> manner, it will be better if we expand it in partitioned tables first
>> manner. With the latest changes in your patch-set I don't see the
>> reason for expanding in partitioned tables first order. Can you please
>> elaborate if we still need to expand in partitioned table first
>> manner? May be we should just address the expansion issue in 0004
>> instead of dividing it in two patches.
>
> Let me see if I can clarify.  I think there are three requirements here:
>
> A. Amit wants to be able to prune leaf partitions before opening and
> locking those relations, so that pruning can be done earlier and,
> therefore, more cheaply.

We could actually prune partitioned tables too, thus pruning a whole
partitioned subtree. Do we want to then lock those partitioned tables
but not the leaves in that tree?

If there's already some discussion answering this question, please
point me to it. Sorry for not paying attention to it.

>
> B. Partition-wise join wants to expand the inheritance hierarchy a
> level at a time instead of all at once, ending up with rte->inh = true
> entries for intermediate partitioned tables.

And create AppendRelInfos which pair children with their partitioned
parent rather than the root.

>
> C. Partition-wise join (and lots of other things; see numerous
> mentions of EIBO in
> http://rhaas.blogspot.com/2017/08/plans-for-partitioning-in-v11.html)
> want to expand in bound order.
>
> Obviously, bound-order and partitioned-tables-first are incompatible
> orderings, but there's no actual conflict because the first one has to
> do with the order of *expansion* and the second one has to do with the
> order of *locking*.

right. Thanks for making it clear.

> So in the end game I think
> expand_inherited_rtentry looks approximately like this:
>
> 1. Calling find_all_inheritors with a new only-lock-the-partitions
> flag.  This should result in locking all partitioned tables in the
> inheritance hierarchy in breadth-first, low-OID-first order.  (When
> the only-lock-the-partitions isn't specified, all partitioned tables
> should still be locked before any unpartitioned tables, so that the
> locking order in that case is consistent with what we do here.)
>

I am confused. When "only-lock-the-partitions" is true, do we expect
intermediate partitioned tables to be locked? Why then "only" in the
flag?

> 2. Iterate over the partitioned tables identified in step 1 in the
> order in which they were returned.  For each one:
> - Decide which children can be pruned.
> - Lock the unpruned, non-partitioned children in low-OID-first order.
>
> 3. Make another pass over the inheritance hierarchy, starting at the
> root.  Traverse the whole hierarchy in breadth-first in *bound* order.
> Add RTEs and AppendRelInfos as we go -- these will have rte->inh =
> true for partitioned tables and rte->inh = false for leaf partitions.

These two seem to be based on the assumption that we have to lock all
the partitioned tables even if they can be pruned.

For regular inheritance there is only a single parent, so traversing
the list returned by find_all_inheritors suffices. For a partitioned
hierarchy, we need to know the parent of every child, which is not
part of the find_all_inheritors() output list. Even if it returns only
the partitioned children, they themselves may have a parent different
from the root partition. So, we have to discard the output of
find_all_inheritors() for a partitioned hierarchy and traverse the
children in their order in the oids array in PartitionDesc. Maybe
it's better to separate the guts of expand_inherited_rtentry() that
create AppendRelInfos, RTEs and rowmarks for the children into a
separate routine. That routine could then be used by two different
functions, expand_inherited_rtentry() and expand_partitioned_rtentry(),
for regular inheritance and partitioned inheritance respectively. The
two functions would use the traversal method appropriate for
the children in either case.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] shm_mq_wait_internal gets stuck forever on fast shutdown

2017-08-20 Thread Craig Ringer
On 21 August 2017 at 10:57, Craig Ringer  wrote:

> Hi all
>
> I've noticed a possible bug / design limitation where shm_mq_wait_internal
> sleeps in a latch wait forever, and the postmaster gets stuck waiting for
> the bgworker in which the wait is running to exit.
>
> This happens when the shm_mq does not have an associated bgworker handle
> registered because the other end is not known at mq creation time or is a
> normal backend not a bgworker. So a BGW handle cannot be passed.
>
> shm_mq_wait_internal() will CHECK_FOR_INTERRUPTS() when its latch wait is
> interrupted by a SIGTERM. But it doesn't actually respond to SIGTERM in any
> way; it just merrily resets its latch and keeps looping.
>
> It will bail out correctly on SIGQUIT.
>
> If the proc waiting to attach was known at queue creation time and was a
> bgworker, we'd pass a bgworker handle and the mq would notice it failed to
> start and stop waiting. There's only a problem if no bgworker handle can be
> supplied.
>
> The underlying problem is that CHECK_FOR_INTERRUPTS() doesn't care about
> SIGTERM or have any way to test for it. And we don't have any global
> management of SIGTERM like we do SIGQUIT so the shm_mq_wait_internal loop
> can't test for it.
>
> The only ways I can see to fix this are:
>
> * Generalize SIGTERM handling across postgres, so there's a global
> "got_SIGTERM" global that shm_mq_wait_internal can test to break out of its
> loop, and every backend's signal handler must set it. Lots of churn.
>
> * In a proc's signal handler, use globals set before entry and after exit
> from shm_mq operations to detect if we're currently in shm_mq and promote
> SIGTERM to SIGQUIT by sending a new signal to ourselves. Or set up state so
> CHECK_FOR_INTERRUPTS() will notice when the handler returns.
>
> * Allow passing of a *bool that tests for SIGTERM, or a function pointer
> called on each iteration to test whether looping should continue, to be
> passed to shm_mq_attach. So if you can't supply a bgw handle, you supply
> that instead. Provide a shm_mq_set_handle equivalent for it too.
>
> Any objections to the last approach?
>

BTW, you can work around it in extension code for existing versions with
something like this in your bgworker:

volatile bool in_shm_mq = false;

void
my_handle_sigterm(SIGNAL_ARGS)
{
    ...

    if (in_shm_mq)
    {
        /*
         * shm_mq can get stuck in shm_mq_wait_internal on SIGTERM; see
         * https://www.postgresql.org/message-id/CAMsr+YHmm=01lsueyr6ydz8clgfnk_fgdgi+qxujf+jelpv...@mail.gmail.com
         *
         * To work around this we keep track of whether we're in shm_mq
         * and generate a fake interrupt for CHECK_FOR_INTERRUPTS() to
         * process if so.
         *
         * The guard around in_shm_mq may not be necessary, but without
         * it any SIGTERM will likely cause ereport(FATAL) with
         * "terminating connection due to administrator command",
         * which isn't ideal.
         */
        InterruptPending = true;
        ProcDiePending = true;
    }
}

inline static shm_mq_handle *
myext_shm_mq_attach(shm_mq *mq, dsm_segment *seg, BackgroundWorkerHandle *handle)
{
    shm_mq_handle *ret;

    in_shm_mq = true;
    ret = shm_mq_attach(mq, seg, handle);
    in_shm_mq = false;
    return ret;
}

/* repeated for shm_mq_receive, shm_mq_send, shm_mq_sendv,
 * shm_mq_wait_for_attach */


Alternatively, you can use non-blocking sends and sleep on your own
latch, doing the same work as shm_mq_wait_internal yourself.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


[HACKERS] Re: Stats for triggers on partitioned tables not shown in EXPLAIN ANALYZE

2017-08-20 Thread Etsuro Fujita

On 2017/08/21 9:18, Amit Langote wrote:

On 2017/08/19 2:09, Robert Haas wrote:

On Fri, Aug 18, 2017 at 1:15 AM, Noah Misch  wrote:

The above-described topic is currently a PostgreSQL 10 open item.  Robert,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com


Committed and back-patched the proposed patch.


Thanks.


Thank you!

Best regards,
Etsuro Fujita





Re: [HACKERS] Add support for tuple routing to foreign partitions

2017-08-20 Thread Etsuro Fujita

On 2017/08/19 2:12, Robert Haas wrote:

On Thu, Aug 17, 2017 at 4:27 AM, Etsuro Fujita
 wrote:

I think that would be much more efficient than INSERTing tuples into the
remote server one by one.  What do you think about that?


I agree, but I wonder if we ought to make it work first using the
existing APIs and then add these new APIs as an optimization.


I'm not sure that's a good idea, because once we support INSERT 
tuple-routing for foreign partitions, we would already have a workaround: 
INSERT INTO partitioned_table SELECT * FROM data_table, where data_table 
is a foreign table defined over an input file using file_fdw.



postgres_fdw isn't the only FDW in the world, and it's better if
getting a working implementation doesn't require too many interfaces.


Agreed.  My vote would be to leave the COPY part as-is until we have 
these new APIs.


Best regards,
Etsuro Fujita





Re: [HACKERS] Add support for tuple routing to foreign partitions

2017-08-20 Thread Etsuro Fujita

On 2017/08/18 22:41, David Fetter wrote:

On Fri, Aug 18, 2017 at 05:10:29PM +0900, Etsuro Fujita wrote:

On 2017/08/17 23:48, David Fetter wrote:

On Thu, Aug 17, 2017 at 05:27:05PM +0900, Etsuro Fujita wrote:

On 2017/07/11 6:56, Robert Haas wrote:

On Thu, Jun 29, 2017 at 6:20 AM, Etsuro Fujita
 wrote:

So, I dropped the COPY part.


Ouch.  I think we should try to figure out how the COPY part will be
handled before we commit to a design.


I spent some time on this.  To handle that, I'd like to propose doing
something similar to \copy (frontend copy): submit a COPY query "COPY ...
FROM STDIN" to the remote server and route data from a file to the remote
server.  For that, I'd like to add new FDW APIs called during CopyFrom that
allow us to copy to foreign tables:

* BeginForeignCopyIn: this would be called after creating a ResultRelInfo
for the target table (or each leaf partition of the target partitioned
table) if it's a foreign table, and perform any initialization needed before
the remote copy can start.  In the postgres_fdw case, I think this function
would be a good place to send "COPY ... FROM STDIN" to the remote server.

* ExecForeignCopyInOneRow: this would be called instead of heap_insert if
the target is a foreign table, and route the tuple read from the file by
NextCopyFrom to the remote server.  In the postgres_fdw case, I think this
function would convert the tuple to text format for portability, and then
send the data to the remote server using PQputCopyData.

* EndForeignCopyIn: this would be called at the bottom of CopyFrom, and
release resources such as connections to the remote server.  In the
postgres_fdw case, this function would do PQputCopyEnd to terminate data
transfer.


These primitives look good.  I know it seems unlikely at first
blush, but do we know of bulk load APIs for non-PostgreSQL data
stores that this would be unable to serve?


Maybe I'm missing something, but I think these would allow the FDW
to do the remote copy the way it would like; in
ExecForeignCopyInOneRow, for example, the FDW could buffer tuples in
a memory buffer and transmit the buffered data to the remote server
if the data size exceeds a threshold.  Is the naming not good enough?
Suggestions are welcome.


The naming seems reasonable.

I was trying to figure out whether this fits well enough with the bulk
load APIs for databases other than PostgreSQL.  I'm guessing it's
"well enough" based on checking MySQL, Oracle, and MS SQL Server.


Good to know.

Best regards,
Etsuro Fujita





[HACKERS] shm_mq_wait_internal gets stuck forever on fast shutdown

2017-08-20 Thread Craig Ringer
Hi all

I've noticed a possible bug / design limitation where shm_mq_wait_internal
sleeps in a latch wait forever, and the postmaster gets stuck waiting for
the bgworker in which the wait is running to exit.

This happens when the shm_mq does not have an associated bgworker handle
registered because the other end is not known at mq creation time or is a
normal backend not a bgworker. So a BGW handle cannot be passed.

shm_mq_wait_internal() will CHECK_FOR_INTERRUPTS() when its latch wait is
interrupted by a SIGTERM. But it doesn't actually respond to SIGTERM in any
way; it just merrily resets its latch and keeps looping.

It will bail out correctly on SIGQUIT.

If the proc waiting to attach was known at queue creation time and was a
bgworker, we'd pass a bgworker handle and the mq would notice it failed to
start and stop waiting. There's only a problem if no bgworker handle can be
supplied.

The underlying problem is that CHECK_FOR_INTERRUPTS() doesn't care about
SIGTERM or have any way to test for it. And we don't have any global
management of SIGTERM like we do SIGQUIT so the shm_mq_wait_internal loop
can't test for it.

The only ways I can see to fix this are:

* Generalize SIGTERM handling across postgres, so there's a global
"got_SIGTERM" global that shm_mq_wait_internal can test to break out of its
loop, and every backend's signal handler must set it. Lots of churn.

* In a proc's signal handler, use globals set before entry and after exit
from shm_mq operations to detect if we're currently in shm_mq and promote
SIGTERM to SIGQUIT by sending a new signal to ourselves. Or set up state so
CHECK_FOR_INTERRUPTS() will notice when the handler returns.

* Allow passing of a *bool that tests for SIGTERM, or a function pointer
called on each iteration to test whether looping should continue, to be
passed to shm_mq_attach. So if you can't supply a bgw handle, you supply
that instead. Provide a shm_mq_set_handle equivalent for it too.

Any objections to the last approach?

--
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Updating line length guidelines

2017-08-20 Thread Craig Ringer
On 21 August 2017 at 10:36, Michael Paquier 
wrote:

> On Mon, Aug 21, 2017 at 11:30 AM, Robert Haas 
> wrote:
> > On Sun, Aug 20, 2017 at 10:49 AM, Andres Freund 
> wrote:
> >> We currently still have the guideline that code should fit into an 80
> >> character window. But an increasing amount of the code, and code
> >> submissions, don't adhere to that (e.g. copy.c, which triggered me to
> >> write this email). And I mean outside of accepted "exceptions" like
> >> error messages.  And there's less need for such a relatively tight limit
> >> these days.  Perhaps we should up the guideline to 90 or 100 chars?
> >
> > Or maybe we should go the other way and get a little more rigorous
> > about enforcing that limit.  I realize 80 has nothing on its side but
> > tradition, but I'm a traditionalist -- and I still do use 80 character
> > windows a lot of the time.
>
> +1. FWIW, I find the non-truncation of some error messages a bit
> annoying when reading them. And having an 80-character limit makes things
> readable. For long URLs this enforcement makes little sense, as those
> strings cannot be split, but more effort could be made.

The main argument for not wrapping error messages is grep-ableness.

Personally I'd be fine with 100 or so, but when I'm using buffers side by
side, or when I'm working in poor conditions where I've set my terminal to
"giant old people text" sizes, I remember the advantages of a width limit.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Updating line length guidelines

2017-08-20 Thread Robert Haas
On Sun, Aug 20, 2017 at 10:43 PM, Andres Freund  wrote:
> Hm. Because you're used to it, or because more of them fit on the
> screen?

Both.  It's also the default window size on my MacBook.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Updating line length guidelines

2017-08-20 Thread Andres Freund
On 2017-08-21 11:36:43 +0900, Michael Paquier wrote:
> On Mon, Aug 21, 2017 at 11:30 AM, Robert Haas  wrote:
> > On Sun, Aug 20, 2017 at 10:49 AM, Andres Freund  wrote:
> >> We currently still have the guideline that code should fit into an 80
> >> character window. But an increasing amount of the code, and code
> >> submissions, don't adhere to that (e.g. copy.c, which triggered me to
> >> write this email). And I mean outside of accepted "exceptions" like
> >> error messages.  And there's less need for such a relatively tight limit
> >> these days.  Perhaps we should up the guideline to 90 or 100 chars?
> >
> > Or maybe we should go the other way and get a little more rigorous
> > about enforcing that limit.  I realize 80 has nothing on its side but
> > tradition, but I'm a traditionalist -- and I still do use 80 character
> > windows a lot of the time.

Hm. Because you're used to it, or because more of them fit on the
screen?


> +1. FWIW, I find the non-truncation of some error messages a bit
> annoying when reading them. And having an 80-character limit makes things
> readable. For long URLs this enforcement makes little sense, as those
> strings cannot be split, but more effort could be made.

I'm going to be very very strongly against splitting error
messages. Citus does split them and it makes grepping for them a
nuisance.

- Andres




Re: [HACKERS] Updating line length guidelines

2017-08-20 Thread Michael Paquier
On Mon, Aug 21, 2017 at 11:30 AM, Robert Haas  wrote:
> On Sun, Aug 20, 2017 at 10:49 AM, Andres Freund  wrote:
>> We currently still have the guideline that code should fit into an 80
>> character window. But an increasing amount of the code, and code
>> submissions, don't adhere to that (e.g. copy.c, which triggered me to
>> write this email). And I mean outside of accepted "exceptions" like
>> error messages.  And there's less need for such a relatively tight limit
>> these days.  Perhaps we should up the guideline to 90 or 100 chars?
>
> Or maybe we should go the other way and get a little more rigorous
> about enforcing that limit.  I realize 80 has nothing on its side but
> tradition, but I'm a traditionalist -- and I still do use 80 character
> windows a lot of the time.

+1. FWIW, I find the non-truncation of some error messages a bit
annoying when reading them. And having an 80-character limit makes things
readable. For long URLs this enforcement makes little sense, as those
strings cannot be split, but more effort could be made.
-- 
Michael




Re: [HACKERS] Updating line length guidelines

2017-08-20 Thread Robert Haas
On Sun, Aug 20, 2017 at 10:49 AM, Andres Freund  wrote:
> We currently still have the guideline that code should fit into an 80
> character window. But an increasing amount of the code, and code
> submissions, don't adhere to that (e.g. copy.c, which triggered me to
> write this email). And I mean outside of accepted "exceptions" like
> error messages.  And there's less need for such a relatively tight limit
> these days.  Perhaps we should up the guideline to 90 or 100 chars?

Or maybe we should go the other way and get a little more rigorous
about enforcing that limit.  I realize 80 has nothing on its side but
tradition, but I'm a traditionalist -- and I still do use 80 character
windows a lot of the time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Update low-level backup documentation to match actual behavior

2017-08-20 Thread Michael Paquier
On Fri, Aug 18, 2017 at 3:35 AM, David Steele  wrote:
> This patch should be sufficient for 10/11 but will need some minor
> changes for 9.6 to remove the reference to wait_for_archive.  Note that
> this patch ignores Michael's patch [2] to create WAL history files on a
> standby as this will likely only be applied to master.

I'll just rebase the other patch as needed; this documentation update
is very important in itself, so let's not worry about that.

On Sat, Aug 19, 2017 at 7:46 AM, David Steele  wrote:
> On 8/18/17 3:00 PM, Robert Haas wrote:
>>
>> If you update the patch I'll apply it to 11 and 10.
>
> Attached is the updated patch.
>
> I didn't like the vague "there can be some issues on the server if it
> crashes during the backup" so I added a new paragraph at the appropriate
> step to give a more detailed explanation of the problem.

Thanks for the patch.

- This terminates the backup mode and performs an automatic switch to
- the next WAL segment.  The reason for the switch is to arrange for
+ This terminates backup mode. On a primary, it also performs an automatic
+ switch to the next WAL segment.  On a standby, it is not possible to
+ automatically switch WAL segments, so you may wish to run
+ pg_switch_wal on the primary to perform a manual
+ switch. The reason for the switch is to arrange for
  the last WAL segment file written during the backup interval to be
  ready to archive.
[...]
- backup files.  If archive_mode is enabled,
+ backup files.  On a primary, if archive_mode is
enabled and the
+ wait_for_archive parameter is true,
  pg_stop_backup does not return until the last segment has
  been archived.
+ On a standby, archive_mode must always be enabled in order
+ for pg_stop_backup to wait.

This level of details is good to have. Thanks.

+ Prior to PostgreSQL 9.6, this
Markup ?

+  Note well that if the server crashes during the backup it may not be
+  possible to restart until the backup_label file has been
+  manually deleted from the PGDATA directory.
Missing a markup  here for PGDATA.
s/Note well/Note as well/, no?

- This function, when called on a primary, terminates the backup mode and
+ This function terminates backup mode and
  performs an automatic switch to the next WAL segment. The reason for the
  switch is to arrange for the last WAL segment written during the backup
- interval to be ready to archive.  When called on a standby, this function
- only terminates backup mode.  A subsequent WAL segment switch will be
- needed in order to ensure that all WAL files needed to restore the backup
- can be archived; if the primary does not have sufficient write activity
- to trigger one, pg_switch_wal should be executed on
- the primary.
+ interval to be ready to archive.
pg_stop_backup() still waits for the last WAL segment to be archived
on the primary. Perhaps we'd want to mention that?

Documentation does not state yet that the use of low-level APIs for
exclusive backups are not supported on standbys.

Now in the docs:
 If the backup process monitors and ensures that all WAL segment files
 required for the backup are successfully archived then the second
 parameter (which defaults to true) can be set to false to have
I would recommend adding some details here and mention
"wait_for_archive" instead of "second parameter". I am wondering as
well if this paragraph should be put in red with a warning or
something like that. This is really, really important to ensure
consistent backups!
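To make the wait_for_archive point above concrete, here is roughly how the
non-exclusive API is driven (PostgreSQL 10 signatures; illustrative only, and
skipping the archival verification a real backup tool must perform):

```sql
-- Start a non-exclusive base backup.
SELECT pg_start_backup('label', false, false);

-- ... copy the data directory ...

-- Default: block until the last WAL segment has been archived.
SELECT lsn, labelfile, spcmapfile FROM pg_stop_backup(false);

-- Only if the backup tool itself monitors and verifies WAL archival
-- is it safe to pass wait_for_archive => false and return immediately.
SELECT lsn, labelfile, spcmapfile FROM pg_stop_backup(false, false);
```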
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?

2017-08-20 Thread Craig Ringer
On 20 August 2017 at 10:10, MauMau  wrote:

> From: Chris Travers
> > Why cannot you do all this in a language handler and treat as a user
> defined function?
> > ...
> > If you have a language handler for cypher, why do you need in_region
> or cast_region?  Why not just have a graph_search() function which
> takes in a cypher query and returns a set of records?
>
> The language handler is for *stored* functions.  The user-defined
> function (UDF) doesn't participate in the planning of the outer
> (top-level) query.  And they both assume that they are executed in SQL
> commands.
>

While I generally agree with Tom on this, I think there are some useful
ideas to examine.

Allow a UDF to emit multiple result sets that can then be incorporated into
an outer query. IMO it'd be fine to support this by returning a wide row of
REFCURSORs and then allow FETCH to be used in a subquery.

The UDF would need to be invoked before the rest of the query was planned,
so the planner could learn the structure of the cursor's result sets.

Or some higher level concept could be introduced, like it was for
aggregates and window functions, where one call can be made to get the
output structure and some stats estimates, and another call (or series) to
get the rows.

I guess you're going two steps further than that, seeking a more integrated
model where the plugin can generate paths and participate more actively in
planning, and where you can optionally make it the default so you don't
need a SQL function call to access it.

If you want to pursue that, I suggest you start small and go step-by-step.
Things like:

* Allow FETCH ...  to be used in subqueries with explicitly
listed output relation structure, like calling a function that returns
record

* Allow pre-execution of parts of a query that produce refcursors used in
subqueries, then finish planning the outer query once the cursor output
types are known

* A construct that can inject arbitrary virtual relations into the
namespace at parse-time, so you don't have to do the dance with refcursors.
(Like WITH).

* Construct that can supply stats estimates for the virtual relations

So try to build it in stages.
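A rough sketch of what the first bullet could look like at the SQL level
(hypothetical syntax -- none of this exists today; `my_cursor` and the column
definition list are invented for illustration):

```sql
-- FETCH in a subquery with an explicitly listed output relation
-- structure, analogous to calling a function that returns record:
SELECT t.id, t.label
FROM (FETCH ALL FROM my_cursor) AS t(id int, label text);
```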

You could also potentially use the FDW interface.


> I want the data models to meet these:
>
> 1) The query language can be used as a top-level session language.
> For example, if an app specifies "region=cypher_graph" at database
> connection, it can use the database as a graph database and submit
> Cypher queries without embedding them in SQL.
>

Why? What does this offer over the app or client tool wrapping its queries
in "SELECT cypher_graph('')" ?


> 2) When a query contains multiple query fragments of different data
> models, all those fragments are parsed and planned before execution.
> The planner comes up with the best plan, crossing the data model
> boundary.  To take the query example in my first mail, which joins a
> relational table and the result of a graph query.  The relational
> planner considers how to scan the table, the graph planner considers
> how to search the graph, and the relational planner considers how to
> join the two fragments.
>

Here, what you need is a way to define a set of virtual relations on a
per-query basis, where you can get stats estimates for the relations during
planning.

I guess what you're imagining is something more sophisticated where you're
generating some kind of sub-plan candidates, like the path model. With some
kind of interaction so the sub-planner for the other model could know to
generate a different sub-plan based on the context of the outer plan. I
have no idea how that could work. But I think you have about zero chance of
achieving what you want by going straight there. Focus on small incremental
steps, preferably ones you can find other uses for too.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] POC: Sharing record typmods between backends

2017-08-20 Thread Michael Paquier
On Mon, Aug 21, 2017 at 10:18 AM, Thomas Munro
 wrote:
> On Mon, Aug 21, 2017 at 6:17 AM, Andres Freund  wrote:
>> I think it'd be a good idea to backpatch the addition of
>> TupleDescAttr(tupledesc, n) to make future backpatching easier. What do
>> others think?
>
> +1
>
> That would also provide a way for extension developers to be able to
> write code that compiles against PG11 and also earlier releases
> without having to do ugly conditional macros stuff.

Updating only tupdesc.h is harmless, so no real objection to your argument.
-- 
Michael




Re: [HACKERS] POC: Sharing record typmods between backends

2017-08-20 Thread Thomas Munro
On Mon, Aug 21, 2017 at 6:17 AM, Andres Freund  wrote:
> I think it'd be a good idea to backpatch the addition of
> TupleDescAttr(tupledesc, n) to make future backpatching easier. What do
> others think?

+1

That would also provide a way for extension developers to be able to
write code that compiles against PG11 and also earlier releases
without having to do ugly conditional macros stuff.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Support for Secure Transport SSL library on macOS as OpenSSL alternative

2017-08-20 Thread Michael Paquier
On Mon, Aug 21, 2017 at 6:21 AM, Daniel Gustafsson  wrote:
>> On 19 Aug 2017, at 23:13, Thomas Munro  wrote:
>>> I guess it should have a fallback definition, though I don't know what
>>> it should be.
>>
>> Or maybe the guc should only exist if SSL_LIBRARY is defined?
>
> I think the intended use case of the GUC should drive the decision on fallback.
> If the GUC isn’t supposed to be a way to figure out if the server was built
> with SSL support, then not existing in non-SSL backends is fine.  If, however,
> we want to allow using the GUC to see if the server has SSL support, then there
> needs to be a “None” or similar value for that case.

Only GUCs related to debugging have their existence defined based on a
#define, so it seems to me that if Postgres is compiled without any
SSL support, this parameter should still be visible, but set to
"none".
-- 
Michael




Re: [HACKERS] Updating line length guidelines

2017-08-20 Thread Joshua D. Drake

On 08/20/2017 09:32 AM, Andres Freund wrote:

On 2017-08-20 09:29:39 -0700, Joshua D. Drake wrote:

On 08/20/2017 07:49 AM, Andres Freund wrote:

Hi,

We currently still have the guideline that code should fit into an 80
character window. But an increasing amount of the code, and code
submissions, don't adhere to that (e.g. copy.c, which triggered me to
write this email). And I mean outside of accepted "exceptions" like
error messages.  And there's less need for such a relatively tight limit
these days.  Perhaps we should up the guideline to 90 or 100 chars?


Considering the prominence of high resolution monitors and the fact that we
don't really review patches (outside of commentary) in email much anymore,
it seems that may even be a bit archaic. On my standard FHD screen using a
standard size font I have a line length of 210 before I wrap.


People commonly display multiple buffers side-by-side...  But leaving
that aside, longer and longer lines actually become just hard to read
and hint that statements should be broken across lines / indentation
reduced by splitting inner blocks into their own functions.


Good point. I think ~ 100 is probably a good idea.

JD



- Andres




--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
* Unless otherwise stated, opinions are my own.   *




[HACKERS] Re: Stats for triggers on partitioned tables not shown in EXPLAIN ANALYZE

2017-08-20 Thread Amit Langote
On 2017/08/19 2:09, Robert Haas wrote:
> On Fri, Aug 18, 2017 at 1:15 AM, Noah Misch  wrote:
>> The above-described topic is currently a PostgreSQL 10 open item.  Robert,
>> since you committed the patch believed to have created it, you own this open
>> item.  If some other commit is more relevant or if this does not belong as a
>> v10 open item, please let us know.  Otherwise, please observe the policy on
>> open item ownership[1] and send a status update within three calendar days of
>> this message.  Include a date for your subsequent status update.  Testers may
>> discover new open items at any time, and I want to plan to get them all fixed
>> well in advance of shipping v10.  Consequently, I will appreciate your efforts
>> toward speedy resolution.  Thanks.
>>
>> [1] 
>> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
> 
> Committed and back-patched the proposed patch.

Thanks.

Regards,
Amit





Re: [HACKERS] POC: Sharing record typmods between backends

2017-08-20 Thread Thomas Munro
On Mon, Aug 21, 2017 at 6:17 AM, Andres Freund  wrote:
> Pushing 0001, 0002 now.
>
> - rebased after conflicts
> - fixed a significant number of too long lines
> - removed a number of now superfluous linebreaks

Thanks!  Please find attached a rebased version of the rest of the patch set.

> Thomas, prepare yourself for some hate from extension and fork authors /
> maintainers ;)

/me hides

The attached version also fixes a couple of small details you
complained about last week:

On Wed, Aug 16, 2017 at 10:06 AM, Andres Freund  wrote:
>> > +   size_t key_size;/* Size of the key (initial bytes of entry) */
>> > +   size_t entry_size;  /* Total size of entry */
>> >
>> > Wonder if it'd make sense to say that key/entry sizes to be only
>> > minimums? That means we could increase them to be the proper aligned
>> > size?
>>
>> I don't understand.  You mean explicitly saying that there are
>> overheads?  Doesn't that go without saying?
>
> I was thinking that we could do the MAXALIGN style calculations once
> instead of repeatedly, by including them in the key and entry sizes.

I must be missing something -- where do we do it repeatedly?  The only
place we use MAXALIGN is in a compile-time constant expression (see the
expansion of macros ENTRY_FROM_ITEM and ITEM_FROM_ENTRY, and also in
one place MAXALIGN(sizeof(dshash_table_item))).

> shared-record-typmods-v5.patchset/0004-Refactor-typcache.c-s-record-typmod-hash-table.patch
>
> + * hashTupleDesc
> + * Compute a hash value for a tuple descriptor.
> + *
> + * If two tuple descriptors would be considered equal by equalTupleDescs()
> + * then their hash value will be equal according to this function.
> + */
> +uint32
> +hashTupleDesc(TupleDesc desc)
> +{
> +   uint32  s = 0;
> +   int i;
> +
> +   for (i = 0; i < desc->natts; ++i)
> +   s = hash_combine(s, hash_uint32(TupleDescAttr(desc, i)->atttypid));
> +
> +   return s;
> +}
>
> Hm, is it right not to include tdtypeid, tdtypmod, tdhasoid here?
> equalTupleDescs() does compare them...

OK, now adding natts (just for consistency), tdtypeid and tdhasoid to
be exactly like equalTupleDescs().  Note that tdtypmod is deliberately
*not* included.

> +   return hashTupleDesc(((RecordCacheEntry *) data)->tupdesc);
> ...
> +   return equalTupleDescs(((RecordCacheEntry *) a)->tupdesc,
> +  ((RecordCacheEntry *) b)->tupdesc);
>
> I'd rather have local vars for the casted params, but it's not
> important.

Done.

> MemSet(&ctl, 0, sizeof(ctl));
> -   ctl.keysize = REC_HASH_KEYS * sizeof(Oid);
> +   ctl.keysize = 0;/* unused */
> ctl.entrysize = sizeof(RecordCacheEntry);
>
> Hm, keysize 0? Is that right? Wouldn't it be more correct to have both
> of the same size, given dynahash includes the key size in the entry, and
> the pointer really is the key?

Done.

> shared-record-typmods-v5.patchset/0006-Introduce-a-shared-memory-record-typmod-registry.patch
>
> Hm, name & comment don't quite describe this accurately anymore.

Updated commit message.

> +extern void EnsureCurrentSession(void);
> +extern void EnsureCurrentSession(void);
>
> duplicated.

Fixed.

> +/*
> + * We want to create a DSA area to store shared state that has the same 
> extent
> + * as a session.  So far, it's only used to hold the shared record type
> + * registry.  We don't want it to have to create any DSM segments just yet in
> + * common cases, so we'll give it enough space to hold a very small
> + * SharedRecordTypmodRegistry.
> + */
> +#define SESSION_DSA_SIZE   0x3
>
> Same "extent"? Maybe lifetime?

Done.

> +
> +/*
> + * Make sure that there is a CurrentSession.
> + */
> +void EnsureCurrentSession(void)
> +{
>
> linebreak.

Fixed.

> +{
> +   if (CurrentSession == NULL)
> +   {
> +   MemoryContext old_context = MemoryContextSwitchTo(TopMemoryContext);
> +
> +   CurrentSession = palloc0(sizeof(Session));
> +   MemoryContextSwitchTo(old_context);
> +   }
> +}
>
> Isn't MemoryContextAllocZero easier?

Done.

I also stopped saying "const TupleDesc" in a few places, which was a
thinko (I wanted pointer to const tupldeDesc, not const pointer to
tupleDesc...), and made sure that the shmem TupleDescs always have
tdtypmod actually set.

So as I understand it the remaining issues (aside from any
undiscovered bugs...) are:

1.  Do we like "Session", "CurrentSession" etc?  Robert seems to be
suggesting that this is likely to get in the way when we try to tackle
this area more thoroughly.  Andres is suggesting that this is a good
time to take steps in this direction.

2.  Andres didn't like what I did to DecrTupleDescRefCount, namely
allowing to run when there is no ResourceOwner.  I now see that this
is 

Re: [HACKERS] Support for Secure Transport SSL library on macOS as OpenSSL alternative

2017-08-20 Thread Daniel Gustafsson
> On 19 Aug 2017, at 23:13, Thomas Munro  wrote:
> 
> On Sun, Aug 20, 2017 at 8:10 AM, Thomas Munro
> > wrote:
>> On Fri, Aug 18, 2017 at 2:14 AM, Daniel Gustafsson  wrote:
>>> Attached is an updated set of patches, rebased on top of master, with bug fixes
>>> and additional features missing in the first set.  While not complete (yet), in
>>> case anyone is testing this I’d rather send a fresh batch rather than sitting
>>> on them too long while I keep hacking at the docs.  While not every part of
>>> this rather large changeset has been touched, this includes all the patches for
>>> completeness sake.
>> 
>> Hi,
>> 
>> +#if defined(USE_OPENSSL) || defined(USE_SECURETRANSPORT)
>> #define USE_SSL
>> +#if defined(USE_OPENSSL)
>> +#define SSL_LIBRARY "OpenSSL"
>> +#elif defined(USE_SECURETRANSPORT)
>> +#define SSL_LIBRARY "Secure Transport"
>> +#endif
>> #endif
>> 
>> If you configure with neither --with-securetransport nor
>> --with-openssl then SSL_LIBRARY finishes up undefined, and then guc.c
>> doesn't compile:
>> 
>> ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith
>> -Wdeclaration-after-statement -Wendif-labels
>> -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
>> -fwrapv -fexcess-precision=standard -g -O2 -I. -I.
>> -I../../../../src/include  -D_GNU_SOURCE   -c -o guc.o guc.c
>> guc.c:3309:3: error: ‘SSL_LIBRARY’ undeclared here (not in a function)
>>   SSL_LIBRARY,
>>   ^~~
>> 
>> I guess it should have a fallback definition, though I don't know what
>> it should be.
> 
> Or maybe the guc should only exist if SSL_LIBRARY is defined?

I think the intended use case of the GUC should drive the decision on fallback.
If the GUC isn’t supposed to be a way to figure out if the server was built
with SSL support, then not existing in non-SSL backends is fine.  If, however,
we want to allow using the GUC to see if the server has SSL support, then there
needs to be a “None” or similar value for that case.

Personally I think there is risk of regrets down the line if this GUC is used
for two things, but thats more of a gut feeling than scientifically studied.

Clearly there shouldn’t be a compilation error in either case, sorry about
missing that in the submission.

cheers ./daniel



Re: [HACKERS] possible encoding issues with libxml2 functions

2017-08-20 Thread Pavel Stehule
Hi


> xpath-bugfix.patch affected only xml values containing an xml declaration
> with
> "encoding" attribute.  In UTF8 databases, this latest proposal
> (xpath-parsing-error-fix.patch) is equivalent to xpath-bugfix.patch.  In
> non-UTF8 databases, xpath-parsing-error-fix.patch affects all xml values
> containing non-ASCII data.  In a LATIN1 database, the following works today
> but breaks under your latest proposal:
>
>   SELECT xpath('text()', ('' || convert_from('\xc2b0', 'LATIN1') ||
> '')::xml);
>

I don't understand, why it should not work?


>
> It's acceptable to break that, since the documentation explicitly disclaims
> support for it.  xpath-bugfix.patch breaks different use cases, which are
> likewise acceptable to break.  See my 2017-08-08 review for details.
>
> We have xpath-bugfix.patch and xpath-parsing-error-fix.patch.  Both are
> equivalent under supported use cases (xpath in UTF8 databases).  Among
> non-supported use cases, they each make different things better and
> different
> things worse.  We should prefer to back-patch the version harming fewer
> applications.  I expect non-ASCII data is more common than xml declarations
> with "encoding" attribute, so xpath-bugfix.patch will harm fewer
> applications.
>
> Having said that, I now see a third option.  Condition this thread's
> patch's
> effects on GetDatabaseEncoding()==PG_UTF8.  That way, we fix supported
> cases,
> and we remain bug-compatible in unsupported cases.  I think that's better
> than
> the other options discussed so far.  If you agree, please send a patch
> based
> on xpath-bugfix.patch with the GetDatabaseEncoding()==PG_UTF8 change and
> the
> two edits I described earlier.
>

I am sorry -  too long day today. Do you think some like

diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 24229c2dff..9fd6f3509f 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -3914,7 +3914,14 @@ xpath_internal(text *xpath_expr_text, xmltype *data,
ArrayType *namespaces,
if (ctxt == NULL || xmlerrcxt->err_occurred)
xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
"could not allocate parser context");
-   doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, NULL, 0);
+
+   /*
+* Passed XML is always in server encoding. When server encoding
+* is UTF8, we can pass this information to libxml2 to ignore
+* possible invalid encoding declaration in XML document.
+*/
+   doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL,
+   GetDatabaseEncoding() == PG_UTF8 ? "UTF-8" : NULL, 0);
if (doc == NULL || xmlerrcxt->err_occurred)
xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_XML_DOCUMENT,
"could not parse XML document");

This fix only UTF8 servers and has no any effect on other cases

?

Regards

Pavel



> Thanks,
> nm
>


Re: [HACKERS] pgbench tap tests & minor fixes

2017-08-20 Thread Fabien COELHO


Hello Nikolay,


I've applied the patch to the current master, and it seems that one test now
fails:


Indeed, Tom removed the -M option order constraint, so the test which
checks for that does not apply anymore.


Here is a v10 without this test.

--
Fabien.diff --git a/src/bin/pgbench/exprscan.l b/src/bin/pgbench/exprscan.l
index dc1367b..8255eff 100644
--- a/src/bin/pgbench/exprscan.l
+++ b/src/bin/pgbench/exprscan.l
@@ -360,6 +360,23 @@ expr_scanner_get_substring(PsqlScanState state,
 }
 
 /*
+ * get current expression line without one ending newline
+ */
+char *
+expr_scanner_chomp_substring(PsqlScanState state, int start_offset, int end_offset)
+{
+	const char *p = state->scanbuf;
+	/* chomp eol */
+	if (end_offset > start_offset && p[end_offset] == '\n')
+	{
+		end_offset--;
+		if (end_offset > start_offset && p[end_offset] == '\r')
+			end_offset--;
+	}
+	return expr_scanner_get_substring(state, start_offset, end_offset);
+}
+
+/*
  * Get the line number associated with the given string offset
  * (which must not be past the end of where we've lexed to).
  */
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index ae78c7b..b72fe04 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -229,7 +229,7 @@ typedef struct SimpleStats
 typedef struct StatsData
 {
 	time_t		start_time;		/* interval start time, for aggregates */
-	int64		cnt;			/* number of transactions */
+	int64		cnt;			/* number of transactions, including skipped */
 	int64		skipped;		/* number of transactions skipped under --rate
  * and --latency-limit */
 	SimpleStats latency;
@@ -329,7 +329,7 @@ typedef struct
 	bool		prepared[MAX_SCRIPTS];	/* whether client prepared the script */
 
 	/* per client collected stats */
-	int64		cnt;			/* transaction count */
+	int64		cnt;			/* client transaction count, for -t */
 	int			ecnt;			/* error count */
 } CState;
 
@@ -2045,7 +2045,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
 	if (INSTR_TIME_IS_ZERO(now))
 		INSTR_TIME_SET_CURRENT(now);
 	now_us = INSTR_TIME_GET_MICROSEC(now);
-	while (thread->throttle_trigger < now_us - latency_limit)
+	while (thread->throttle_trigger < now_us - latency_limit &&
+		   (nxacts <= 0 || st->cnt < nxacts)) /* with -t, do not overshoot */
 	{
 		processXactStats(thread, st, , true, agg);
 		/* next rendez-vous */
@@ -2053,6 +2054,12 @@ doCustom(TState *thread, CState *st, StatsData *agg)
 		thread->throttle_trigger += wait;
 		st->txn_scheduled = thread->throttle_trigger;
 	}
+
+	if (nxacts > 0 && st->cnt >= nxacts)
+	{
+		st->state = CSTATE_FINISHED;
+		break;
+	}
 }
 
 st->state = CSTATE_THROTTLE;
@@ -2364,15 +2371,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
  */
 			case CSTATE_END_TX:
 
-/*
- * transaction finished: calculate latency and log the
- * transaction
- */
-if (progress || throttle_delay || latency_limit ||
-	per_script_stats || use_log)
-	processXactStats(thread, st, , false, agg);
-else
-	thread->stats.cnt++;
+/* transaction finished: calculate latency and do log */
+processXactStats(thread, st, , false, agg);
 
 if (is_connect)
 {
@@ -2381,7 +2381,6 @@ doCustom(TState *thread, CState *st, StatsData *agg)
 	INSTR_TIME_SET_ZERO(now);
 }
 
-++st->cnt;
 if ((st->cnt >= nxacts && duration <= 0) || timer_exceeded)
 {
 	/* exit success */
@@ -2519,17 +2518,20 @@ processXactStats(TState *thread, CState *st, instr_time *now,
 {
 	double		latency = 0.0,
 lag = 0.0;
+	bool		detailed = progress || throttle_delay || latency_limit ||
+		per_script_stats || use_log;
 
-	if ((!skipped) && INSTR_TIME_IS_ZERO(*now))
-		INSTR_TIME_SET_CURRENT(*now);
-
-	if (!skipped)
+	if (detailed && !skipped)
 	{
+		if (INSTR_TIME_IS_ZERO(*now))
+			INSTR_TIME_SET_CURRENT(*now);
+
 		/* compute latency & lag */
 		latency = INSTR_TIME_GET_MICROSEC(*now) - st->txn_scheduled;
 		lag = INSTR_TIME_GET_MICROSEC(st->txn_begin) - st->txn_scheduled;
 	}
 
+	/* detailed thread stats */
 	if (progress || throttle_delay || latency_limit)
 	{
 		accumStats(>stats, skipped, latency, lag);
@@ -2539,7 +2541,13 @@ processXactStats(TState *thread, CState *st, instr_time *now,
 			thread->latency_late++;
 	}
 	else
+	{
+		/* no detailed stats, just count */
 		thread->stats.cnt++;
+	}
+
+	/* client stat is just counting */
+	st->cnt ++;
 
 	if (use_log)
 		doLog(thread, st, agg, skipped, latency, lag);
@@ -3026,8 +3034,7 @@ process_backslash_command(PsqlScanState sstate, const char *source)
 	PQExpBufferData word_buf;
 	int			word_offset;
 	int			offsets[MAX_ARGS];	/* offsets of argument words */
-	int			start_offset,
-end_offset;
+	int			start_offset;
 	int			lineno;
 	int			j;
 
@@ -3081,13 +3088,10 @@ process_backslash_command(PsqlScanState sstate, const char *source)
 
 		my_command->expr = expr_parse_result;
 
-		/* Get location of the ending newline */

Re: [HACKERS] possible encoding issues with libxml2 functions

2017-08-20 Thread Noah Misch
On Sun, Aug 20, 2017 at 10:54:57AM +0200, Pavel Stehule wrote:
> 2017-08-20 9:21 GMT+02:00 Noah Misch :
> > On Mon, Aug 07, 2017 at 05:10:14PM -0700, Noah Misch wrote:
> > > On Wed, Apr 05, 2017 at 10:53:39PM +0200, Pavel Stehule wrote:
> > > > One possible fix - and similar technique is used more times in xml.c 
> > > > code
> > > > .. xmlroot
> > >
> > > > +   /* remove header */
> > > > +   parse_xml_decl(string, _len, NULL, NULL, NULL);
> > >
> > > > -   doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, NULL, 
> > > > 0);
> > > > +   doc = xmlCtxtReadMemory(ctxt, (char *) string + header_len, len 
> > > > -
> >
> > > I like this parse_xml_decl() approach better.  Would you update
> > > your patch to use it and to add the test case I give above?
> >
> > Again, would you make those two edits?
> 
> Now, I am not sure so it is correct fix. We will fix case, when server is
> in UTF8, but maybe we will break some cases when server is not in UTF8
> (although it is broken already).

That's right.

> I am think so correct solution is encoding to UTF8 and passing encoding
> parameter. It will works safely in all cases - and we don't need cut
> header. We should not to cut header if server encoding is not in UTF8 and
> we don't pass encoding as parameter. When we pass encoding as parameter,
> then cutting header is not necessary.

xpath-bugfix.patch affected only xml values containing an xml declaration with
"encoding" attribute.  In UTF8 databases, this latest proposal
(xpath-parsing-error-fix.patch) is equivalent to xpath-bugfix.patch.  In
non-UTF8 databases, xpath-parsing-error-fix.patch affects all xml values
containing non-ASCII data.  In a LATIN1 database, the following works today
but breaks under your latest proposal:

  SELECT xpath('text()', ('' || convert_from('\xc2b0', 'LATIN1') || 
'')::xml);

It's acceptable to break that, since the documentation explicitly disclaims
support for it.  xpath-bugfix.patch breaks different use cases, which are
likewise acceptable to break.  See my 2017-08-08 review for details.

We have xpath-bugfix.patch and xpath-parsing-error-fix.patch.  Both are
equivalent under supported use cases (xpath in UTF8 databases).  Among
non-supported use cases, they each make different things better and different
things worse.  We should prefer to back-patch the version harming fewer
applications.  I expect non-ASCII data is more common than xml declarations
with "encoding" attribute, so xpath-bugfix.patch will harm fewer applications.

Having said that, I now see a third option.  Condition this thread's patch's
effects on GetDatabaseEncoding()==PG_UTF8.  That way, we fix supported cases,
and we remain bug-compatible in unsupported cases.  I think that's better than
the other options discussed so far.  If you agree, please send a patch based
on xpath-bugfix.patch with the GetDatabaseEncoding()==PG_UTF8 change and the
two edits I described earlier.

Thanks,
nm




Re: [HACKERS] POC: Sharing record typmods between backends

2017-08-20 Thread Andres Freund
Hi,

Pushing 0001, 0002 now.

- rebased after conflicts
- fixed a significant number of too long lines
- removed a number of now superfluous linebreaks

I think it'd be a good idea to backpatch the addition of
TupleDescAttr(tupledesc, n) to make future backpatching easier. What do
others think?

Thomas, prepare yourself for some hate from extension and fork authors /
maintainers ;)

Regards,

Andres




Re: [HACKERS] Updating line length guidelines

2017-08-20 Thread Andres Freund
On 2017-08-20 09:29:39 -0700, Joshua D. Drake wrote:
> On 08/20/2017 07:49 AM, Andres Freund wrote:
> > Hi,
> > 
> > We currently still have the guideline that code should fit into an 80
> > character window. But an increasing amount of the code, and code
> > submissions, don't adhere to that (e.g. copy.c, which triggered me to
> > write this email). And I mean outside of accepted "exceptions" like
> > error messages.  And there's less need for such a relatively tight limit
> > these days.  Perhaps we should up the guideline to 90 or 100 chars?
> 
> Considering the prominence of high resolution monitors and the fact that we
> don't really review patches (outside of commentary) in email much anymore,
> it seems that may even be a bit archaic. On my standard FHD screen using a
> standard size font I have a line length of 210 before I wrap.

People commonly display multiple buffers side-by-side...  But leaving
that aside, longer and longer lines actually become just hard to read
and hint that statements should be broken across lines / indentation
reduced by splitting inner blocks into their own functions.

- Andres




Re: [HACKERS] Updating line length guidelines

2017-08-20 Thread Joshua D. Drake

On 08/20/2017 07:49 AM, Andres Freund wrote:

Hi,

We currently still have the guideline that code should fit into an 80
character window. But an increasing amount of the code, and code
submissions, don't adhere to that (e.g. copy.c, which triggered me to
write this email). And I mean outside of accepted "exceptions" like
error messages.  And there's less need for such a relatively tight limit
these days.  Perhaps we should up the guideline to 90 or 100 chars?


Considering the prominence of high resolution monitors and the fact that 
we don't really review patches (outside of commentary) in email much 
anymore, it seems that may even be a bit archaic. On my standard FHD 
screen using a standard size font I have a line length of 210 before I wrap.


Sincerely,

JD



Greetings,

Andres Freund





--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
* Unless otherwise stated, opinions are my own.   *




[HACKERS] Updating line length guidelines

2017-08-20 Thread Andres Freund
Hi,

We currently still have the guideline that code should fit into an 80
character window. But an increasing amount of the code, and code
submissions, don't adhere to that (e.g. copy.c, which triggered me to
write this email). And I mean outside of accepted "exceptions" like
error messages.  And there's less need for such a relatively tight limit
these days.  Perhaps we should up the guideline to 90 or 100 chars?

Greetings,

Andres Freund




Re: [HACKERS] pgbench tap tests & minor fixes

2017-08-20 Thread Nikolay Shaplov
In a message of 25 June 2017 20:42:58, Fabien COELHO wrote:
> > may be this it because of "end_offset + 1" in expr_scanner_chomp_substring
> > ? Why there is + 1 there?
> 
> Thanks for the debug! Here is a v9 which does a rebase after the
> reindentation and fixes the bad offset.
Sorry, I am back here after a long pause (I hope)

I've applied the patch to the current master, and it seems that one test now 
fails:

regress_log_002_pgbench_no_server:

not ok 70 - pgbench option error: prepare after script err /(?^:query mode .* before any)/

#   Failed test 'pgbench option error: prepare after script err /(?^:query mode .* before any)/'
#   at /home/nataraj/dev/review-pgbench/ccc/postgresql/src/bin/pgbench/../../../src/test/perl/TestLib.pm line 369.
#   'connection to database "" failed:
# could not connect to server: No such file or directory
#   Is the server running locally and accepting
#   connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
# '
# doesn't match '(?^:query mode .* before any)'
opts=-i -S, stat=1, out=(?^:^$), err=(?^:cannot be used in initialization), name=pgbench option error: init vs run
# Running: pgbench -i -S


-- 
Do code for fun. Can do it for money (Perl & C/C++ ~10h/week)




Re: [HACKERS] possible encoding issues with libxml2 functions

2017-08-20 Thread Pavel Stehule
2017-08-20 9:21 GMT+02:00 Noah Misch :

> On Sun, Aug 20, 2017 at 08:46:03AM +0200, Pavel Stehule wrote:
> > 2017-08-20 4:17 GMT+02:00 Noah Misch :
> > > On Sat, Aug 19, 2017 at 10:53:19PM +0200, Pavel Stehule wrote:
> > > > I am sending some POC  - it does support XPATH and XMLTABLE for not
> UTF8
> > > > server encoding.
> > > >
> > > > In this case, all strings should be converted to UTF8 before call
> libXML2
> > > > functions, and result should be converted back from UTF8.
> > >
> > > Adding support for xpath in non-UTF8 databases is a v11 feature
> proposal.
> > > Please start a new thread for this, and add it to the open CommitFest.
> > >
> > > In this thread, would you provide the version of your patch that I
> > > described
> > > in my 2017-08-08 post to this thread?  That's a back-patchable bug fix.
> >
> >
> > There are three issues:
> >
> > 1. processing 1byte encoding XMLs documents with encoding declaration -
> > should be fixed by ecoding_for_xmlCtxtReadMemory.patch. This patch is
> very
> > short and safe - can be apply immediately (there is regress tests)
>
> We should fix that problem, yes.  encoding_for_xmlCtxtReadMemory.patch is
> not
> the right fix; see below.
>
> > 2 encoding issues in XPath specification (and  namespaces) - because
> > multibytes chars are not usually used in tag names, this issue hit
> minimum
> > users.
> >
> > 3. encoding issues in XPath and XMLTABLE results - this is bad issue -
> the
> > function XMLTABLE will not be functional on non UTF8 databases.
> Fortunately
> > - there are less users with this encoding, but probably should be apply
> as
> > fix in 10/11 Postgres.
>
> (2) and (3) are long-documented (since commit 14180f9, 2009-06)
> limitations in
> xpath functions.  That's why I would treat an improvement as a new feature
> and
> not back-patch it.  It is tempting to fix v10 so XMLTABLE is born without
> this
> limitation, but it is too late in the release cycle.
>

I agree

>
>
> Reviewing encoding_for_xmlCtxtReadMemory.patch:
>
> On Sat, Aug 19, 2017 at 09:13:50AM +0200, Pavel Stehule wrote:
> > - doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, NULL, 0);
> > + doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL,
> > + pg_encoding_to_char(GetDatabaseEncoding()), 0);
>
> This assumes libxml2 understands every encoding name that
> pg_encoding_to_char() can return, but it does not.  xpath-bugfix.patch was
> better.  Here are the relevant parts of my review of that patch.
>

I understand this argument.

>
> On Mon, Aug 07, 2017 at 05:10:14PM -0700, Noah Misch wrote:
> > On Wed, Apr 05, 2017 at 10:53:39PM +0200, Pavel Stehule wrote:
> > > One possible fix - and similar technique is used more times in xml.c
> code
> > > .. xmlroot
> >
> > > +   /* remove header */
> > > +   parse_xml_decl(string, &header_len, NULL, NULL, NULL);
> >
> > > -   doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, NULL, 0);
> > > +   doc = xmlCtxtReadMemory(ctxt, (char *) string + header_len, len - header_len, NULL, NULL, 0);
>
> > I like this parse_xml_decl() approach better.  Would you update
> > your patch to use it and to add the test case I give above?
>
> Again, would you make those two edits?
>

Now I am not sure that is the correct fix. It would fix the case where the
server encoding is UTF8, but it might break some cases where the server
encoding is not UTF8 (although those are broken already).

I think the correct solution is to convert to UTF8 and pass the encoding
parameter. That works safely in all cases, and we don't need to cut the
header. We should not cut the header if the server encoding is not UTF8 and
we don't pass the encoding as a parameter; when we do pass the encoding
parameter, cutting the header is not necessary.

What do you think about last patch?

Regards

Pavel



>
> Thanks,
> nm
>
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index c47624eff6..5ceda8034d 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -3895,6 +3895,13 @@ xpath_internal(text *xpath_expr_text, xmltype *data, ArrayType *namespaces,
  errmsg("empty XPath expression")));
 
 	string = pg_xmlCharStrndup(datastr, len);
+
+	/* It does nothing when string is in UTF8 already */
+	string = pg_do_encoding_conversion(string,
+	   len,
+	   GetDatabaseEncoding(),
+	   PG_UTF8);
+
 	xpath_expr = pg_xmlCharStrndup(VARDATA_ANY(xpath_expr_text), xpath_len);
 
 	xmlerrcxt = pg_xml_init(PG_XML_STRICTNESS_ALL);
@@ -3911,7 +3918,7 @@ xpath_internal(text *xpath_expr_text, xmltype *data, ArrayType *namespaces,
 		if (ctxt == NULL || xmlerrcxt->err_occurred)
 			xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
 		"could not allocate parser context");
-		doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, NULL, 0);
+		doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, "UTF-8", 0);
 		if (doc == NULL || xmlerrcxt->err_occurred)
 			xml_ereport(xmlerrcxt, ERROR, 

Re: [HACKERS] possible encoding issues with libxml2 functions

2017-08-20 Thread Noah Misch
On Sun, Aug 20, 2017 at 08:46:03AM +0200, Pavel Stehule wrote:
> 2017-08-20 4:17 GMT+02:00 Noah Misch :
> > On Sat, Aug 19, 2017 at 10:53:19PM +0200, Pavel Stehule wrote:
> > > I am sending some POC  - it does support XPATH and XMLTABLE for not UTF8
> > > server encoding.
> > >
> > > In this case, all strings should be converted to UTF8 before call libXML2
> > > functions, and result should be converted back from UTF8.
> >
> > Adding support for xpath in non-UTF8 databases is a v11 feature proposal.
> > Please start a new thread for this, and add it to the open CommitFest.
> >
> > In this thread, would you provide the version of your patch that I
> > described
> > in my 2017-08-08 post to this thread?  That's a back-patchable bug fix.
> 
> 
> There are three issues:
> 
> 1. processing 1byte encoding XMLs documents with encoding declaration -
> should be fixed by ecoding_for_xmlCtxtReadMemory.patch. This patch is very
> short and safe - can be apply immediately (there is regress tests)

We should fix that problem, yes.  encoding_for_xmlCtxtReadMemory.patch is not
the right fix; see below.

> 2 encoding issues in XPath specification (and  namespaces) - because
> multibytes chars are not usually used in tag names, this issue hit minimum
> users.
> 
> 3. encoding issues in XPath and XMLTABLE results - this is bad issue - the
> function XMLTABLE will not be functional on non UTF8 databases. Fortunately
> - there are less users with this encoding, but probably should be apply as
> fix in 10/11 Postgres.

(2) and (3) are long-documented (since commit 14180f9, 2009-06) limitations in
xpath functions.  That's why I would treat an improvement as a new feature and
not back-patch it.  It is tempting to fix v10 so XMLTABLE is born without this
limitation, but it is too late in the release cycle.


Reviewing encoding_for_xmlCtxtReadMemory.patch:

On Sat, Aug 19, 2017 at 09:13:50AM +0200, Pavel Stehule wrote:
> - doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, NULL, 0);
> + doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL,
> + pg_encoding_to_char(GetDatabaseEncoding()), 0);

This assumes libxml2 understands every encoding name that
pg_encoding_to_char() can return, but it does not.  xpath-bugfix.patch was
better.  Here are the relevant parts of my review of that patch.

On Mon, Aug 07, 2017 at 05:10:14PM -0700, Noah Misch wrote:
> On Wed, Apr 05, 2017 at 10:53:39PM +0200, Pavel Stehule wrote:
> > One possible fix - and similar technique is used more times in xml.c code
> > .. xmlroot
> 
> > +   /* remove header */
> > +   parse_xml_decl(string, &header_len, NULL, NULL, NULL);
> 
> > -   doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, NULL, 0);
> > +   doc = xmlCtxtReadMemory(ctxt, (char *) string + header_len, len - header_len, NULL, NULL, 0);

> I like this parse_xml_decl() approach better.  Would you update
> your patch to use it and to add the test case I give above?

Again, would you make those two edits?

Thanks,
nm




Re: [HACKERS] possible encoding issues with libxml2 functions

2017-08-20 Thread Pavel Stehule
2017-08-20 4:17 GMT+02:00 Noah Misch :

> On Fri, Aug 18, 2017 at 11:43:19PM +0200, Pavel Stehule wrote:
> > yes, probably libXML2 try to do check from utf8 encoding to header
> > specified encoding.
>
> Yes.  That has been the topic of this thread.
>
> > a) all values created by  xml_in iterface are in database encoding -
> input
> > string is stored without any change. xml_parse is called only due
> > validation.
> >
> > b) inside xml_parse, the input is converted to UTF8, and document is read
> > by xmlCtxtReadDoc with explicitly specified "UTF-8" encoding or
> > by xmlParseBalancedChunkMemory with explicitly specified encoding "UTF8"
> > and removed decl section.
> >
> > So for "xml_parse" based functions (xml_in, texttoxml, xml_is_document,
> > wellformated_xml) the database encoding is not important
> >
> > c) xml_recv function does validation by xml_parse and translation to
> > database encoding.
> >
> > Now I don't see a difference between @b and @c - so my hypotheses about
> > necessity to use recv interface was wrong.
>
> Yes.  You posted, on 2017-04-05, a test case not requiring the recv
> interface.
>
> On Sat, Aug 19, 2017 at 09:13:50AM +0200, Pavel Stehule wrote:
> > I didn't find any info how to enable libXML2 XPath functions for other
> > encoding than UTF8 :( ??
>
> http://xmlsoft.org/encoding.html is the relevant authority.  To
> summarize, we
> should send only UTF8 to libxml2.
>

libxml2 converts XML to UTF8 by itself; everything else passed to it should
be in UTF8. I found some references to the xmlSwitchEncoding function, but
no examples of its usage - probably nobody uses it. The result is always in
UTF8.


>
> On Sat, Aug 19, 2017 at 10:53:19PM +0200, Pavel Stehule wrote:
> > I am sending some POC  - it does support XPATH and XMLTABLE for not UTF8
> > server encoding.
> >
> > In this case, all strings should be converted to UTF8 before call libXML2
> > functions, and result should be converted back from UTF8.
>
> Adding support for xpath in non-UTF8 databases is a v11 feature proposal.
> Please start a new thread for this, and add it to the open CommitFest.
>
> In this thread, would you provide the version of your patch that I
> described
> in my 2017-08-08 post to this thread?  That's a back-patchable bug fix.


There are three issues:

1. Processing single-byte-encoded XML documents with an encoding
declaration - should be fixed by ecoding_for_xmlCtxtReadMemory.patch. This
patch is very short and safe and can be applied immediately (there are
regression tests).

2. Encoding issues in the XPath expression (and namespaces) - because
multibyte chars are not usually used in tag names, this issue hits very
few users.

3. Encoding issues in XPath and XMLTABLE results - this is a bad issue:
the XMLTABLE function will not work on non-UTF8 databases. Fortunately
there are fewer users with such encodings, but it should probably be
applied as a fix in Postgres 10/11.



> I found some previous experiments https://marc.info/?l=pgsql-bugs=123407176408688
>
> https://wiki.postgresql.org/wiki/Todo#XML links to other background on
> this
> feature proposal.  See Tom Lane's review of a previous patch.  Ensure your
> patch does not have the problems he found during that review.  Do that
> before
> starting a thread for this feature.
>

Good information - thank you. I'll start a new thread for issues @2 and @3.
I am not sure I can prepare a good enough patch for the next commitfest,
and later a committer can decide whether to back-patch it.

Regards

Pavel



>
> Thanks,
> nm
>