Re: Compressed TOAST Slicing

2019-03-17 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2019-03-12 14:42:14 +0900, Michael Paquier wrote:
> > On Mon, Mar 11, 2019 at 08:38:56PM +, Regina Obe wrote:
> > > I tested on Windows mingw64 (as of a week ago) and confirmed that the
> > > patch applies cleanly and is significantly faster than head for the
> > > left and substr tests.
> > 
> > int32
> > pglz_decompress(const char *source, int32 slen, char *dest,
> > -   int32 rawsize)
> > +   int32 rawsize, bool is_slice)
> 
> > The performance improvements are nice, but breaking a published API is
> > less nice, particularly since some work has been done to make pglz more
> > pluggable (see 60838df9, guess who wrote that).
> 
> I don't think that should stop us from breaking the API. You've got to
> do quite low level stuff to need pglz directly, in which case such an
> API change should be the least of your problems between major versions.

Agreed, this is across a major version and I don't think it's an issue
to break the API.
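
For anyone who does call pglz directly, adapting to the change is mechanical.
A hedged sketch of what a call site might look like with the signature shown
in the quoted diff (the variable names and the convention that a negative
return value means corrupt input are assumptions, not code from the patch):

    /*
     * Hypothetical caller: is_slice tells the decompressor that only the
     * first slice_len bytes of the original datum are wanted, so reaching
     * the end of the output buffer early is not treated as an error.
     */
    char   *buf = palloc(slice_len);

    if (pglz_decompress(compressed_data, compressed_len,
                        buf, slice_len, true /* is_slice */) < 0)
        elog(ERROR, "compressed data is corrupted");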

Thanks!

Stephen




Re: Online verification of checksums

2019-03-17 Thread Stephen Frost
Greetings,

* Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:
> If we want to run it from the server itself, then I guess a background
> worker would be a better solution. Incidentally, that's something I've
> been toying with some time ago, see [1].

So, I'm a big fan of this idea of having a background worker that's
running and (slowly, maybe configurably) scanning through the data
directory checking for corrupted pages.  I'd certainly prefer it if that
background worker didn't fault those pages into shared buffers though,
and I don't really think it should even need to check whether a given page is
currently being written out or is presently in shared buffers.
Basically, I'd think it would work just fine to have it do essentially
what I imagine pg_checksums doing, but as a background worker.
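
A very rough sketch of what the core of such a worker's scan could look like,
reading a relation segment file directly (so nothing is faulted into shared
buffers) and verifying each page with the existing checksum code.
pg_checksum_page() and the page-header fields are real; the function itself,
its error handling, and the throttling knob are purely illustrative:

    #include "postgres.h"

    #include <fcntl.h>
    #include <unistd.h>

    #include "storage/bufpage.h"
    #include "storage/checksum.h"
    #include "storage/fd.h"

    /*
     * Illustrative only: verify checksums of one segment file by reading it
     * straight from disk.  Skips new (all-zero) pages, warns on mismatches,
     * and sleeps between blocks as a stand-in for a configurable scan rate.
     */
    static void
    verify_segment_file(const char *path, int delay_ms)
    {
        PGAlignedBlock buf;
        BlockNumber blkno = 0;
        int         fd;
        int         nread;

        fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
        if (fd < 0)
            ereport(ERROR,
                    (errcode_for_file_access(),
                     errmsg("could not open file \"%s\": %m", path)));

        while ((nread = read(fd, buf.data, BLCKSZ)) == BLCKSZ)
        {
            PageHeader  phdr = (PageHeader) buf.data;

            if (!PageIsNew(buf.data) &&
                pg_checksum_page(buf.data, blkno) != phdr->pd_checksum)
                ereport(WARNING,
                        (errmsg("checksum mismatch in \"%s\", block %u",
                                path, blkno)));

            blkno++;
            pg_usleep(delay_ms * 1000L);    /* throttle the scan */
        }

        CloseTransientFile(fd);
    }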

Thanks!

Stephen




Re: Online verification of checksums

2019-03-17 Thread Stephen Frost
Greetings,

* Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:
> On 3/2/19 12:03 AM, Robert Haas wrote:
> > On Tue, Sep 18, 2018 at 10:37 AM Michael Banck
> >  wrote:
> >> I have now added a retry for this as well, without a pg_sleep().
> >> This catches around 80% of the half-reads, but a few slip through. At
> >> that point we bail out with exit(1), and the user can try again, which I
> >> think is fine?
> > 
> > Maybe I'm confused here, but catching 80% of torn pages doesn't sound
> > robust at all.
> 
> FWIW I don't think this qualifies as a torn page - i.e. it's not a full
> read with a mix of old and new data. This is a partial write, most likely
> because we read the blocks one by one, and when we hit the last page
> while the table is being extended, we may only see the first 4kB. And if
> we retry very fast, we may still see only the first 4kB.

I really still am not following why this is such an issue: we do a read,
get back 4KB, do another read, check if it's zero, and if so then we
should be able to conclude that we're at the end of the file, no?  If
we're at the end of the file and we don't have a final complete block to
run a checksum check on then it seems clear to me that the file was
being extended and it's ok to skip that block.  We could also stat the
file and keep track of where we are, to detect such an extension of the
file happening, if we wanted an additional cross-check, couldn't we?  If
we do a read and get 4KB back and then do another and get 4KB back, then
we just treat it like we would an 8KB block.  Really, as long as a
subsequent read is returning bytes then we keep going, and if it returns
zero then it's EOF.  I could maybe see a "one final read" option, but I
don't think it makes sense to have some kind of time-based delay around
this where we keep trying to read.
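
To make that concrete, here is a minimal sketch of such a read loop as a
frontend tool might write it; BLCKSZ comes from the server headers via
postgres_fe.h, and the function itself and how its result is used are
hypothetical:

    #include "postgres_fe.h"

    #include <errno.h>
    #include <unistd.h>

    /*
     * Sketch only: read one BLCKSZ block, accumulating across short reads.
     * A zero-byte read means EOF; a trailing partial block is taken to mean
     * the relation was being extended and is simply skipped.
     */
    static bool
    read_block(int fd, char *buf)
    {
        int     total = 0;

        while (total < BLCKSZ)
        {
            ssize_t nread = read(fd, buf + total, BLCKSZ - total);

            if (nread < 0)
            {
                fprintf(stderr, "read failed: %s\n", strerror(errno));
                exit(1);
            }
            if (nread == 0)
                break;          /* EOF */
            total += (int) nread;
        }

        if (total == 0)
            return false;       /* clean end of file, no more blocks */
        if (total < BLCKSZ)
            return false;       /* partial trailing block: being extended, skip it */

        return true;            /* full block: caller verifies its checksum */
    }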

All of this about hacking up a way to connect to PG and lock pages in
shared buffers so that we can perform a checksum check seems really
rather ridiculous for either the extension case or the regular mid-file
torn-page case.

To be clear, I agree completely that we don't want to be reporting false
positives or "this might mean corruption!" to users running the tool,
but I haven't seen a good explanation of why this needs to involve the
server to avoid that happening.  If someone would like to point that out
to me, I'd be happy to go read about it and try to understand.

Thanks!

Stephen




Re: Unaccent extension python script Issue in Windows

2019-03-17 Thread Kyotaro HORIGUCHI
Hello.

At Sun, 17 Mar 2019 20:23:05 -0400, Hugh Ranalli  wrote in 

> Hi Ram,
> Thanks for doing this; I've been overestimating my ability to get to things
> over the last couple of weeks.
> 
> I've looked at the patch and have made one minor change. I had moved all
> the imports up to the top, to keep them in one place (and I think some had
> originally been used only by the Python 2 code). You added them there, but
> didn't remove them from their original positions, so I've incorporated that
> into your patch, attached as v2. I've tested this under Python 2 and 3 on
> Linux, not Windows.

Though I'm not sure of the necessity of running the script on
Windows, the problem is not specific to Windows, but a general one
that just hadn't happened to surface on non-Windows environments.

On CentOS7:
> export LANG="ja_JP.EUCJP"
> python <..snipped..>
..
> UnicodeEncodeError: 'euc_jp' codec can't encode character '\xab' in position 
> 0: illegal multibyte sequence

So this is not an issue with Windows but with python3.

The script generates identical files with both versions of
Python with the patch on Linux and Windows 7. Python 3 on Windows
emits CRLF as the newline, but that doesn't seem to do any harm. (I
couldn't confirm that due to the extreme slowness of the build, for
reasons that are unclear right now.)

This patch contains irrelevant changes. The minimal required
change would be the attached. If you want to refactor the
UnicodeData reader or rearrange the import stuff, that should be
done in separate patches.

It would be better to use IOBase for Python 3, especially for the
stdout replacement, but I didn't do that since it *is* working.

> Everything else looks correct. I apologise for not having replied to your
> question in the original bug report. I had intended to, but as I said,
> there's been an increase in the things I need to juggle at the moment.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/contrib/unaccent/generate_unaccent_rules.py b/contrib/unaccent/generate_unaccent_rules.py
index 58b6e7deb7..0d645567b7 100644
--- a/contrib/unaccent/generate_unaccent_rules.py
+++ b/contrib/unaccent/generate_unaccent_rules.py
@@ -45,7 +45,9 @@ if sys.version_info[0] <= 2:
 # Python 2 and 3 compatible bytes call
 def bytes(source, encoding='ascii', errors='strict'):
 return source.encode(encoding=encoding, errors=errors)
+else:
 # END: Python 2/3 compatibility - remove when Python 2 compatibility dropped
+sys.stdout = codecs.getwriter('utf8')(sys.stdout.buffer)
 
 import re
 import argparse
@@ -233,7 +235,8 @@ def main(args):
 charactersSet = set()
 
 # read file UnicodeData.txt
-unicodeDataFile = open(args.unicodeDataFilePath, 'r')
+unicodeDataFile = codecs.open(
+args.unicodeDataFilePath, mode='r', encoding='UTF-8')
 
 # read everything we need into memory
 for line in unicodeDataFile:


Re: Determine if FOR UPDATE or FOR SHARE was used?

2019-03-17 Thread Tom Lane
Chapman Flack  writes:
> Given a Portal, or an _SPI_plan, is there a practical way to tell whether
> it came from a query with FOR UPDATE or FOR SHARE?

In principle, you could do something like drilling down into the plan
tree to see if there's a LockRows node, but this wouldn't necessarily
be a great idea from a modularity or maintainability standpoint.
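
For what it's worth, that kind of drilling-down might amount to something like
the sketch below; a hedged example only, which walks just lefttree/righttree
(so it misses subplans and init plans) and doesn't distinguish FOR UPDATE from
FOR SHARE, which would require inspecting the node's row marks:

    /* Hedged sketch: search a finished plan tree for a LockRows node. */
    static bool
    plan_has_lockrows(Plan *plan)
    {
        if (plan == NULL)
            return false;
        if (IsA(plan, LockRows))
            return true;
        return plan_has_lockrows(plan->lefttree) ||
            plan_has_lockrows(plan->righttree);
    }

    /* hypothetical use, e.g. on a Portal's primary PlannedStmt: */
    /*     if (plan_has_lockrows(stmt->planTree)) ...            */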

I think it would help to take two steps back and ask why you want
to know this, and what exactly is it that you want to know, anyhow.
What does it matter if there's FOR SHARE in the query?  Does it
matter if the FOR SHARE targets only some tables (and do you
care which ones?)  How would your answer change if the FOR SHARE
were buried down in a CTE subquery?  Why are you only interested
in these cases, and not INSERT/UPDATE/DELETE?

regards, tom lane



Re: Performance issue in foreign-key-aware join estimation

2019-03-17 Thread David Rowley
On Mon, 18 Mar 2019 at 14:06, David Rowley  wrote:
>
> On Mon, 18 Mar 2019 at 10:08, Tom Lane  wrote:
> > If that doesn't work (because we need the index info sooner), maybe we
> > could consider never removing ECs from eq_classes, so that their indices
> > never change, then teaching most/all of the loops over eq_classes to
> > ignore entries with non-null ec_merged.  But we probably want the indexing
> > bitmapsets to reflect canonical EC numbers, so we'd still have to do some
> > updating I fear -- or else be prepared to chase up the ec_merged links
> > when using the index bitmaps.
>
> That's probably a better solution. Perhaps we can continue to nullify
> the ec_members, then just skip eclasses with a NIL ec_members.  I
> avoided that in the patch because I was worried about what extensions
> might be doing, but if you think it's okay, then I can change the
> patch.

I've modified the patch to do it this way.

The only loop over eq_classes I saw outside of equivclass.c was in
postgres_fdw.c.  This just calls eclass_useful_for_merging() on each
EC in the list. Instead of having that loop skip deleted ECs, I
changed eclass_useful_for_merging() so that it just returns false for
that case.

The only other thing I changed was to create a new function named
get_common_eclass_indexes() which removes some duplicate code where we
were getting ECs common to two relations. I also made it so this
function does not allocate unnecessary Bitmapsets when the inputs are
simple relations.
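
A minimal sketch of what such a helper might boil down to, assuming
eclass_indexes is the per-RelOptInfo bitmapset this patch adds (the real
function, as noted above, also avoids building throwaway Bitmapsets for
simple relations, which this ignores):

    /* Sketch only, not the patch's code. */
    static Bitmapset *
    get_common_eclass_indexes(PlannerInfo *root, RelOptInfo *rel1,
                              RelOptInfo *rel2)
    {
        /* ECs mentioning both relations = intersection of the index sets */
        return bms_intersect(rel1->eclass_indexes, rel2->eclass_indexes);
    }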

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


eclass_indexes_v4.patch
Description: Binary data


Re: Data-only pg_rewind, take 2

2019-03-17 Thread Michael Paquier
On Sun, Mar 17, 2019 at 09:00:57PM +0800, Chris Travers wrote:
> I also added test cases and some docs.  I don't know if the docs are
> sufficient.  Feedback is appreciated.

To be honest, I don't think that this approach is a good idea, for the
same reasons as mentioned last time: this can cause pg_rewind
to break if any newly-added folder in the data directory has
non-replaceable data which is needed at the beginning of recovery and
cannot be automatically rebuilt.  So that's one extra maintenance
burden to worry about.

Here is the reference of the last thread about the same topic:
https://www.postgresql.org/message-id/can-rpxd8y7hmojzd93hoqv6n8kpeo5cmw9gym+8jirtpifn...@mail.gmail.com
--
Michael




Re: Offline enabling/disabling of data checksums

2019-03-17 Thread Michael Paquier
On Sun, Mar 17, 2019 at 12:44:39PM +0100, Fabien COELHO wrote:
> I kept the initial no-parameter function which calls the new one with 4
> parameters, though, because it looks more homogeneous this way in the
> backend code. This is debatable.

From a compatibility point of view, your position actually makes
sense, at least to me after sleeping on it, as UpdateControlFile is
not static and there are interactions with the other local
routines that read and write the control file because of the
ControlFile variable at the top of xlog.c.  So I have kept the original
interface, which is now only a wrapper around the new routine.

> Attached is an update.

Thanks, I have committed the patch after fixing a couple of things.
After considering the interface, I have switched to a single boolean,
as I could not actually imagine what kind of fancy features this
could be extended with later on.  If I am wrong, let's adjust it later.
Here are my notes about the fixes:
- pg_resetwal got broken because the path to the control file was
incorrect.  Running tests of pg_upgrade or the TAP tests of
pg_resetwal showed the failure.
- The previously-mentioned problem with close() in the new routine is
fixed.
- Header comments at the top of update_controlfile were a bit messed
up (s/Not/Note/, missed an "up" as well).
- pg_rewind was issuing a flush of the control file even if --no-sync
was used.
- Nit: incorrect header order in controldata_utils.c.  I have kept the
backend-only includes grouped though.
--
Michael




Re: Parallel query vs smart shutdown and Postmaster death

2019-03-17 Thread Thomas Munro
On Sun, Mar 17, 2019 at 5:53 PM Arseny Sher  wrote:
> Thomas Munro  writes:
>
> > 1.  Why does pmdie()'s SIGTERM case terminate parallel workers
> > immediately?  That aborts running parallel queries, so they
> > don't get to end their work normally.
> > 2.  Why are new parallel workers not allowed to be started while in
> > this state?  That hangs future parallel queries forever, so they don't
> > get to end their work normally.
> > 3.  Suppose we fix the above cases; should we do it for parallel
> > workers only (somehow), or for all bgworkers?  It's hard to say since
> > I don't know what all bgworkers do.
>
The attached patch fixes 1 and 2. As for 3, the only other internal
bgworkers currently are the logical replication launcher and workers, which
> arguably should be killed immediately.

Hi Arseny,

Thanks for working on this!  Yes, it seems better to move forwards
rather than backwards, and fix this properly as you have done in this
patch.

Just a thought: instead of the new hand-coded loop you added in
pmdie(), do you think it would make sense to have a new argument
"exclude_class_mask" for SignalSomeChildren()?  If we did that, I
would consider renaming the existing parameter "target" to
"include_type_mask" to make it super clear what's going on.

> > Here's an initial sketch of a
> > patch like that: it gives you "ERROR:  parallel worker failed to
> > initialize" and "HINT:  More details may be available in the server
> > log." if you try to run a parallel query.
>
> Reporting bgworkers that the postmaster is never going to start is probably
> worthwhile doing anyway to prevent potential further deadlocks like
> this. However, I think there is a problem in your patch: we might be in
> post PM_RUN states due to FatalError, not because of shutdown. In this
> case, we shouldn't refuse to run bgws in the future. I would also merge
> the check into bgworker_should_start_now.

Hmm, yeah, I haven't tested or double-checked in detail yet but I
think you're probably right about all of that.

-- 
Thomas Munro
https://enterprisedb.com



RE: [PATCH] get rid of StdRdOptions, use individual binary reloptions representation for each relation kind instead

2019-03-17 Thread Iwata, Aya
Hi Nikolay,

This patch does not apply.  Please refer to http://commitfest.cputube.org/ and
update it.
How about separating your patch by feature, or into units that can be committed
in phases?  For example, adding the assert macro first, refactoring
StdRdOptions next, etc.

So I change status to "Waiting for Author".

Regards,
Aya Iwata


Re: [HACKERS] Block level parallel vacuum

2019-03-17 Thread Masahiko Sawada
On Tue, Feb 26, 2019 at 7:20 PM Masahiko Sawada  wrote:
>
> On Tue, Feb 26, 2019 at 1:35 PM Haribabu Kommi  
> wrote:
> >
> > On Thu, Feb 14, 2019 at 9:17 PM Masahiko Sawada  
> > wrote:
> >>
> >> Thank you. Attached the rebased patch.
> >
> >
> > I ran some performance tests to compare the parallelism benefits,
>
> Thank you for testing!
>
> > but I got some strange results showing performance overhead; maybe it is
> > because I tested it on my laptop.
>
> Hmm, I think the parallel vacuum would help for heavy workloads like a
> big table with multiple indexes. In your test result, all executions
> are completed within 1 sec, which seems to be a use case that
> parallel vacuum wouldn't help with. I suspect that the table is small,
> right? Anyway, I'll also do performance tests.
>

Here are the performance test results. I've set up a 500MB table with
several indexes and made 10% of the table dirty before each vacuum.
I compared the execution time of the patched postgres with the current HEAD
(see the 'speed_up' column). In my environment:

 indexes | parallel_degree |  patched   |head| speed_up
-+-+++--
  0 |   0 |   238.2085 |   244.7625 |   1.0275
  0 |   1 |   237.7050 |   244.7625 |   1.0297
  0 |   2 |   238.0390 |   244.7625 |   1.0282
  0 |   4 |   238.1045 |   244.7625 |   1.0280
  0 |   8 |   237.8995 |   244.7625 |   1.0288
  0 |  16 |   237.7775 |   244.7625 |   1.0294
  1 |   0 |  1328.8590 |  1334.9125 |   1.0046
  1 |   1 |  1325.9140 |  1334.9125 |   1.0068
  1 |   2 |  1333.3665 |  1334.9125 |   1.0012
  1 |   4 |  1329.5205 |  1334.9125 |   1.0041
  1 |   8 |  1334.2255 |  1334.9125 |   1.0005
  1 |  16 |  1335.1510 |  1334.9125 |   0.9998
  2 |   0 |  2426.2905 |  2427.5165 |   1.0005
  2 |   1 |  1416.0595 |  2427.5165 |   1.7143
  2 |   2 |  1411.6270 |  2427.5165 |   1.7197
  2 |   4 |  1411.6490 |  2427.5165 |   1.7196
  2 |   8 |  1410.1750 |  2427.5165 |   1.7214
  2 |  16 |  1413.4985 |  2427.5165 |   1.7174
  4 |   0 |  4622.5060 |  4619.0340 |   0.9992
  4 |   1 |  2536.8435 |  4619.0340 |   1.8208
  4 |   2 |  2548.3615 |  4619.0340 |   1.8126
  4 |   4 |  1467.9655 |  4619.0340 |   3.1466
  4 |   8 |  1486.3155 |  4619.0340 |   3.1077
  4 |  16 |  1481.7150 |  4619.0340 |   3.1174
  8 |   0 |  9039.3810 |  8990.4735 |   0.9946
  8 |   1 |  4807.5880 |  8990.4735 |   1.8701
  8 |   2 |  3786.7620 |  8990.4735 |   2.3742
  8 |   4 |  2924.2205 |  8990.4735 |   3.0745
  8 |   8 |  2684.2545 |  8990.4735 |   3.3493
  8 |  16 |  2672.9800 |  8990.4735 |   3.3635
 16 |   0 | 17821.4715 | 17740.1300 |   0.9954
 16 |   1 |  9318.3810 | 17740.1300 |   1.9038
 16 |   2 |  7260.6315 | 17740.1300 |   2.4433
 16 |   4 |  5538.5225 | 17740.1300 |   3.2030
 16 |   8 |  5368.5255 | 17740.1300 |   3.3045
 16 |  16 |  5291.8510 | 17740.1300 |   3.3523
(36 rows)
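
(Here speed_up is the head time divided by the patched time; for example, in
the last row, 17740.1300 / 5291.8510 ~= 3.3523.)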

Attached are the updated version patches. They apply cleanly to the current
HEAD, but the 0001 patch still changes the vacuum options to a
Node since that's still under discussion. After the direction has been
decided, I'll update the patches.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


v17-0001-Make-vacuum-options-a-Node.patch
Description: Binary data


v17-0002-Add-parallel-option-to-VACUUM-command.patch
Description: Binary data


v17-0003-Add-paralell-P-option-to-vacuumdb-command.patch
Description: Binary data


Re: WIP: Avoid creation of the free space map for small tables

2019-03-17 Thread Amit Kapila
On Fri, Mar 15, 2019 at 3:25 PM Amit Kapila  wrote:
>
> I have committed the latest version of this patch.  I think we can
> wait for a day or two to see if there is any complaint from the buildfarm or
> in general, and then we can close this CF entry.
>

Closed this CF entry.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: why doesn't DestroyPartitionDirectory hash_destroy?

2019-03-17 Thread Amit Langote
On 2019/03/15 2:56, Robert Haas wrote:
> On Thu, Mar 14, 2019 at 1:16 PM Tom Lane  wrote:
>> Actually, now that I've absorbed a bit more about 898e5e329,
>> I don't like very much about it at all.  I think having it
>> try to hang onto pointers into the relcache is a completely
>> wrongheaded design decision, and the right way for it to work
>> is to just copy the PartitionDescs out of the relcache so that
>> they're fully owned by the PartitionDirectory.  I don't see
>> a CopyPartitionDesc function anywhere (maybe it's named something
>> else?) but it doesn't look like it'd be hard to build; most
>> of the work is in partition_bounds_copy() which does exist already.
> 
> Yeah, we could do that.  I have to admit that I don't necessarily
> understand why trying to hang onto pointers into the relcache is a bad
> idea.  It is a bit complicated, but the savings in both memory and CPU
> time seem worth pursuing.  There are a lot of users who wish we scaled
> to a million partitions rather than a hundred, and just copying
> everything all over the place all the time won't get us closer to that
> goal.
> 
> More generally, I think get_relation_info() is becoming an
> increasingly nasty piece of work.  It copies more and more stuff
> instead of just pointing to it, which is necessary mostly because it
> closes the table instead of arranging to do that at the end of query
> planning.  If it did the opposite, the refcount held by the planner
> would make it unnecessary for the PartitionDirectory to hold one, and
> I bet we could also just point to a bunch of the other stuff in this
> function rather than copying that stuff, too.  As time goes by,
> relcache entries are getting more and more complex, and the optimizer
> wants to use more and more data from them for planning purposes, but,
> probably partly because of inertia, we're clinging to an old design
> where everything has to be copied.  Every time someone gets that
> wrong, and it's happened a number of times, we yell at them and tell
> them to copy more stuff instead of thinking up a design where stuff
> doesn't need to be copied.  I think that's a mistake.  You have
> previously disagreed with that position so you probably will now, too,
> but I still think it.

+1.

>> Also, at least so far as the planner's usage is concerned, claiming
>> that we're saving something by not copying is completely bogus,
>> because if we look into set_relation_partition_info, what do we
>> find but a partition_bounds_copy call.  That wouldn't be necessary
>> if we could rely on the PartitionDirectory to own the data structure.
>> (Maybe it's not necessary today.  But given what a house of cards
>> this is, I wouldn't propose ripping it out, just moving it into
>> the PartitionDirectory code.)
> 
> Ugh, I didn't notice the partition_bounds_copy() call in that
> function.  Oops.  Given the foregoing griping, you won't be surprised
> to hear that I'd rather just remove the copying step.  However, it
> sounds like we can do that no matter whether we stick with my design
> or switch to yours, because if PartitionDirectory holds a relcache refcnt
> then the pointer will be stable, and if we deep-copy the whole data
> structure then the pointer will also be stable.  Prior to the commit
> at issue, we weren't doing either of those things, so that copy was
> needed until very recently.

One of the patches I've proposed in the "speed up planning with
partitions" thread [1] gets rid of the partition_bounds_copy() call,
because: 1) I too think it's unnecessary as the PartitionBoundInfo won't
change (logically or physically) as long as we've got some lock on the
table, which we do, 2) I've seen it become a significant bottleneck as the
number of partitions crosses thousands.

Thanks,
Amit

[1] https://commitfest.postgresql.org/22/1778/




Re: why doesn't DestroyPartitionDirectory hash_destroy?

2019-03-17 Thread Amit Langote
On 2019/03/15 1:02, Robert Haas wrote:
> On Thu, Mar 14, 2019 at 3:13 AM Amit Langote
>  wrote:
>> I'm curious why DestroyPartitionDirectory doesn't do
>> hash_destroy(pdir->pdir_hash)?
> 
> What would be the point?  It's more efficient to let context teardown
> take care of it.

Yeah, I only noticed that after posting my email.

As I said in another reply, while the executor's partition directory is
set up and torn down under a dedicated memory context used for execution
(es_query_context), the planner's is stuck in MessageContext.  But all of
the other stuff that the planner allocates goes into it too, so maybe it's fine.

Thanks,
Amit




Re: pg_rewind : feature to rewind promoted standby is broken!

2019-03-17 Thread Michael Paquier
On Thu, Mar 14, 2019 at 09:04:41AM +0900, Michael Paquier wrote:
> Thanks for confirming, Mythun.  I'll think about the logic of this
> patch for a couple of days in the background, then I'll try to commit
> it likely at the beginning of next week.

Committed.  I have spent extra time polishing the comments to make the
filtering rules clearer when processing the source and target files,
particularly why they are useful the way they are.
--
Michael




RE: Timeout parameters

2019-03-17 Thread Jamison, Kirk
On Saturday, March 16, 2019 5:40 PM (GMT+9), Fabien COELHO wrote:

> > Fabien, I was wondering whether you can apply TCP_USER_TIMEOUT patch 
> > and continue discussion about 'socket_timeout'?

> I can apply nothing, I'm just a small-time reviewer.

> Committers on the thread are Michaël-san and Robert, however I'm not sure 
> that they are very sensitive to "please apply this patch" requests: they 
> are the lone judges of their own priorities.


Regarding user timeout parameters:

Based on previous reviews of the code (it seems good) and reviewers'
comments, everyone seems to agree that the user timeout parameters are
needed, so we can just wait for the updated patch.
The author, Nagaura-san, has gotten feedback from Robert on the doc part.
So if an updated patch is posted with those comments addressed, then I think
we can review it again for the final round.

---
As for socket_timeout parameters:

The use case for the socket_timeout parameter is that it's a stop-gap approach
to prevent applications from waiting infinitely for the DB server when other
timeout parameters, such as keepalives and tcp_user_timeout, fail to detect
the connection error. (That's why I thought of it as a network problem
detector?)

The main argument here is about the security risk of allowing socket_timeout
to cancel valid connections, right? To address that, I agree with
Tsunakawa-san that we should document that the value should at least be equal
to (or higher than?) the other timeout parameters.
If documenting it is not enough, then we can enforce that in the code by making
sure that the socket_timeout value is greater than the other timeout parameters
(keepalives, tcp_user_timeout, statement_timeout, etc.). Otherwise, the
socket_timeout parameter should not take effect even if it was switched on.
Or is that too much enforcement?
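
As a concrete illustration of that layering, a connection might end up
configured roughly like this; keepalives_* are existing libpq keywords, while
tcp_user_timeout is the patch under review and socket_timeout is only proposed
in this thread, so those two keywords (and the host/dbname values) are
assumptions rather than released libpq options:

    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        /*
         * The last-resort socket_timeout (proposed) is set well above the
         * lower-level detection knobs, so it only fires when keepalives and
         * tcp_user_timeout have failed to notice a dead peer.
         */
        PGconn *conn = PQconnectdb(
            "host=db.example.com dbname=app "
            "keepalives_idle=30 keepalives_interval=10 keepalives_count=3 "
            "tcp_user_timeout=60000 "   /* milliseconds, per the patch */
            "socket_timeout=120");      /* seconds, proposed; larger than the rest */

        if (PQstatus(conn) != CONNECTION_OK)
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 0;
    }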

Regards,
Kirk Jamison


Re: Performance issue in foreign-key-aware join estimation

2019-03-17 Thread David Rowley
Thanks for looking at this.

On Mon, 18 Mar 2019 at 10:08, Tom Lane  wrote:
> I looked at this for a little bit.  I'm on board with the basic idea
> of assigning integer indexes to the ECs and keeping bitmapsets of
> relevant EC indexes in RelOptInfos.  However ... man, is that
> delete_eclass() thing ugly.  And slow, and fragile-feeling.

Yeah. When I discovered we remove eclasses from the list I was a
little annoyed as the code was pretty simple before having to account
for that.  I'll grant you ugly and slow, but I don't quite see the
fragile part.

> I think that maybe we could improve that situation by not trying to
> maintain the RelOptInfo.eclass_indexes sets at all during the initial
> construction of the ECs, but only setting them up after we've completed
> doing that.  We already have an assumption that we know when EC merging
> is done and it's safe to make pathkeys that reference the "canonical"
> (merged) ECs, so it seems like that would be a point where we could
> build the indexing bitmapsets safely.  This hinges on not needing the
> bitmapsets till after that point, but it'd be easy to arrange some
> Asserts verifying that we don't try to use them before that.

I don't think that's really that easily workable.  Consider, for
example, when add_child_rel_equivalences() is called; this is well
after the location I think you have in mind. Probably this function
should be modified to use the indexes, but it must also update the
indexes too.

> If that doesn't work (because we need the index info sooner), maybe we
> could consider never removing ECs from eq_classes, so that their indices
> never change, then teaching most/all of the loops over eq_classes to
> ignore entries with non-null ec_merged.  But we probably want the indexing
> bitmapsets to reflect canonical EC numbers, so we'd still have to do some
> updating I fear -- or else be prepared to chase up the ec_merged links
> when using the index bitmaps.

That's probably a better solution. Perhaps we can continue to nullify
the ec_members, then just skip eclasses with a NIL ec_members.  I
avoided that in the patch because I was worried about what extensions
might be doing, but if you think it's okay, then I can change the
patch.

> Stepping back a little bit, I wonder whether now is the time to rethink
> the overall EC data structure, as foreseen in this old comment:
>
>  * Note: constructing merged EquivalenceClasses is a standard UNION-FIND
>  * problem, for which there exist better data structures than simple lists.
>  * If this code ever proves to be a bottleneck then it could be sped up ---
>  * but for now, simple is beautiful.

I've thought for a while now that it wouldn't be too hard to have
equal(), or some version of it, return -1, 0, 1 to allow us to binary
search or build a binary search tree of nodes.  I imagine it wouldn't
be too hard to create a compact binary search tree inside an array
with indexes to the left and right sub-tree instead of pointers. That
would likely be a bit more cache friendly and allow simple
non-recursive traversals of the entire tree.  However, that does not
sound like something this patch should be doing.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



RE: Timeout parameters

2019-03-17 Thread Tsunakawa, Takayuki
From: mikalaike...@ibagroup.eu [mailto:mikalaike...@ibagroup.eu]
> Based on your comment it seems to me that 'socket_timeout' should be
> connected with statement_timeout. I mean that the end user should wait
> statement_timeout + 'socket_timeout' before control is returned. It looks
> much safer to me.

I'm afraid we cannot enforce that relationship programmatically, so I think the 
documentation should warn that socket_timeout be longer than statement_timeout.


Regards
Takayuki Tsunakawa







Determine if FOR UPDATE or FOR SHARE was used?

2019-03-17 Thread Chapman Flack
Hi,

Given a Portal, or an _SPI_plan, is there a practical way to tell whether
it came from a query with FOR UPDATE or FOR SHARE?

Regards,
-Chap



RE: proposal: pg_restore --convert-to-text

2019-03-17 Thread Imai, Yoshikazu
On Fri, Mar 15, 2019 at 6:20 AM, Imai, Yoshikazu wrote:
> Upon committing this, we have to be careful that this patch breaks backwards
> compatibility, but I haven't seen any complaints so far. If there are
> no objections, I will set this patch to ready for committer.

Jose had set this to ready for committer. Thanks.

--
Yoshikazu Imai



Re: Unaccent extension python script Issue in Windows

2019-03-17 Thread Hugh Ranalli
Hi Ram,
Thanks for doing this; I've been overestimating my ability to get to things
over the last couple of weeks.

I've looked at the patch and have made one minor change. I had moved all
the imports up to the top, to keep them in one place (and I think some had
originally been used only by the Python 2 code). You added them there, but
didn't remove them from their original positions, so I've incorporated that
into your patch, attached as v2. I've tested this under Python 2 and 3 on
Linux, not Windows.

Everything else looks correct. I apologise for not having replied to your
question in the original bug report. I had intended to, but as I said,
there's been an increase in the things I need to juggle at the moment.

Best wishes,
Hugh



On Sat, 16 Mar 2019 at 22:58, Ramanarayana  wrote:

> Hi Hugh,
>
> I have abstracted out the windows compatibility changes from your patch to
> a new patch and tested it. Added the patch to
> https://commitfest.postgresql.org/23/
>
> Please feel free to change it if it requires any changes.
>
> Cheers
> Ram 4.0
>
diff --git a/contrib/unaccent/generate_unaccent_rules.py b/contrib/unaccent/generate_unaccent_rules.py
index 58b6e7d..7a0a96e 100644
--- a/contrib/unaccent/generate_unaccent_rules.py
+++ b/contrib/unaccent/generate_unaccent_rules.py
@@ -32,9 +32,15 @@
 # The approach is to be Python3 compatible with Python2 "backports".
 from __future__ import print_function
 from __future__ import unicode_literals
+# END: Python 2/3 compatibility - remove when Python 2 compatibility dropped
+
+import argparse
 import codecs
+import re
 import sys
+import xml.etree.ElementTree as ET
 
+# BEGIN: Python 2/3 compatibility - remove when Python 2 compatibility dropped
 if sys.version_info[0] <= 2:
 # Encode stdout as UTF-8, so we can just print to it
 sys.stdout = codecs.getwriter('utf8')(sys.stdout)
@@ -45,12 +51,9 @@ if sys.version_info[0] <= 2:
 # Python 2 and 3 compatible bytes call
 def bytes(source, encoding='ascii', errors='strict'):
 return source.encode(encoding=encoding, errors=errors)
+else:
 # END: Python 2/3 compatibility - remove when Python 2 compatibility dropped
-
-import re
-import argparse
-import sys
-import xml.etree.ElementTree as ET
+sys.stdout = codecs.getwriter('utf8')(sys.stdout.buffer)
 
 # The ranges of Unicode characters that we consider to be "plain letters".
 # For now we are being conservative by including only Latin and Greek.  This
@@ -233,21 +236,22 @@ def main(args):
 charactersSet = set()
 
 # read file UnicodeData.txt
-unicodeDataFile = open(args.unicodeDataFilePath, 'r')
-
-# read everything we need into memory
-for line in unicodeDataFile:
-fields = line.split(";")
-if len(fields) > 5:
-# http://www.unicode.org/reports/tr44/tr44-14.html#UnicodeData.txt
-general_category = fields[2]
-decomposition = fields[5]
-decomposition = re.sub(decomposition_type_pattern, ' ', decomposition)
-id = int(fields[0], 16)
-combining_ids = [int(s, 16) for s in decomposition.split(" ") if s != ""]
-codepoint = Codepoint(id, general_category, combining_ids)
-table[id] = codepoint
-all.append(codepoint)
+with codecs.open(
+  args.unicodeDataFilePath, mode='r', encoding='UTF-8',
+  ) as unicodeDataFile:
+# read everything we need into memory
+for line in unicodeDataFile:
+fields = line.split(";")
+if len(fields) > 5:
+# http://www.unicode.org/reports/tr44/tr44-14.html#UnicodeData.txt
+general_category = fields[2]
+decomposition = fields[5]
+decomposition = re.sub(decomposition_type_pattern, ' ', decomposition)
+id = int(fields[0], 16)
+combining_ids = [int(s, 16) for s in decomposition.split(" ") if s != ""]
+codepoint = Codepoint(id, general_category, combining_ids)
+table[id] = codepoint
+all.append(codepoint)
 
 # walk through all the codepoints looking for interesting mappings
 for codepoint in all:


RE: proposal: pg_restore --convert-to-text

2019-03-17 Thread Imai, Yoshikazu
On Sat, Mar 16, 2019 at 10:55 PM, Euler Taveira wrote:
> > Is there no need to rewrite the Description in the docs to state that we
> > should specify either the -d or -f option?
> > (and also it might be better to write that if the -l option is given,
> > neither the -d nor the -f option is needed.)
> >
> I don't think so. The description is already there (see "pg_restore can
> operate in two modes..."). I left -l as is (which means that 'pg_restore
> -l foo.dump' dumps to standard output and 'pg_restore -f - -l foo.dump'
> has the same behavior).

Ah, I understand it.

> > I think the former one looks prettier, but which one is preferred?
> >
> I don't have a style preference but decided to change to your suggestion.
> New version attached.

I checked it. It may be a trivial matter, so thanks for taking it into consideration.

--
Yoshikazu Imai


Re: Make pg_checksums complain if compiled BLCKSZ and data folder's block size differ

2019-03-17 Thread Michael Paquier
On Sun, Mar 17, 2019 at 12:01:31PM +0100, Magnus Hagander wrote:
> LGTM.

Okay, committed.
--
Michael




Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)

2019-03-17 Thread Noah Misch
On Tue, Oct 02, 2018 at 10:53:40AM -0400, Tom Lane wrote:
> Noah Misch  writes:
> > On Mon, Oct 01, 2018 at 09:32:10PM -0400, Tom Lane wrote:
> >> FWIW, my problem with this patch is that I remain unconvinced of the basic
> >> correctness of the transform (specifically the unique-ification approach).
> >> Noah's points would be important to address if we were moving the patch
> >> towards commit, but I don't see much reason to put effort into it until
> >> we can think of a way to prove whether that works.
> 
> > Not even effort to fix the assertion failures I reported?
> 
> If it seemed relevant to the proof-of-correctness problem, I would look
> into it, but ...

I put some hours into theoretical study of the proof, and I didn't find any
holes.  When the planner removes "l0 LEFT JOIN r1", it does that because
there's one output row per l0.ctid regardless of the rows of r1.  Hence,
deduplicating on (l0.ctid) is equivalent to deduplicating on (l0.ctid,
r1.ctid).  In the bad FULL JOIN case, (l0.ctid, r1.ctid) would be sufficient
as a key, but we're out of luck because removing the join makes us have only
l0.ctid for some tuples and only r1.ctid for other tuples.

If PostgreSQL ever gets inner join removal, it would be harder to preserve
this optimization in this form.  At that point, perhaps we'd cost the path
that retains the join for the benefit of $SUBJECT.  Given the performance test
results, $SUBJECT may already need a cost-based decision on whether to use it.



Re: Rare SSL failures on eelpout

2019-03-17 Thread Tom Lane
I wrote:
> Thomas Munro  writes:
>> This was an intentional change in TLS1.3, reducing round trips by
>> verifying the client certificate later.

> Ugh.  So probably we can reproduce it elsewhere if we use cutting-edge
> OpenSSL versions.

I installed OpenSSL 1.1.1a on my Mac laptop.  I got through 100 cycles
of the ssl tests without a problem, which is not too surprising because
longfin has been running on pretty much the exact same software stack
since late November, and it has not shown the problem.  However ...
I threw in the sleep() where you advised in fe-connect.c, and kaboom!

t/001_ssltests.pl .. 67/75 
#   Failed test 'certificate authorization fails with revoked client cert: 
matches'
#   at t/001_ssltests.pl line 375.
#   'psql: server closed the connection unexpectedly
#   This probably means the server terminated abnormally
#   before or while processing the request.
# could not send startup packet: Broken pipe
# '
# doesn't match '(?^:SSL error)'
t/001_ssltests.pl .. 74/75 
#   Failed test 'intermediate client certificate is missing: matches'
#   at t/001_ssltests.pl line 411.
#   'psql: server closed the connection unexpectedly
#   This probably means the server terminated abnormally
#   before or while processing the request.
# could not send startup packet: Broken pipe
# '
# doesn't match '(?^:SSL error)'
# Looks like you failed 2 tests of 75.
t/001_ssltests.pl .. Dubious, test returned 2 (wstat 512, 0x200)
Failed 2/75 subtests 
t/002_scram.pl . ok   

It seems quite repeatable this way.

So that confirms that it's the OpenSSL version that is critical,
and that you need a very new version to make it fail.

I shall now see about fixing it...

regards, tom lane



Re: jsonpath

2019-03-17 Thread Nikita Glukhov


On 17.03.2019 21:29, Jonathan S. Katz wrote:

On 3/17/19 1:14 PM, Alexander Korotkov wrote:

On Sun, Mar 17, 2019 at 8:06 PM Jonathan S. Katz  wrote:

On 3/17/19 1:02 PM, Alexander Korotkov wrote:

Thank you for the explanation.  Is it jsonb_ops or jsonb_path_ops?

I just used "USING gin(col)" so jsonb_ops.

I see.  So, from this query jsonb_ops extracts only the existence of
the .length key.  And I can bet it exists in all (or almost all) the
documents.  Thus, the optimizer thinks the index might be useful, while it's
actually useless.  There is not much that can be done while we don't have
statistics for jsonb (and access to them from GIN extract_query).  So, for
now we can just refrain from extracting lone keys from jsonpath in jsonb_ops.
But I think it would be better to just document this issue.  In the future
we should improve that with statistics.

That seems to make sense, especially given how I've typically stored
JSON documents in PostgreSQL. It sounds like this particular problem
would be solved appropriately with JSONB statistics.


GIN jsonb_ops extracts from query

  data @? '$.length ? (@ < 150)'

the same GIN entries as from queries

  data @? '$.length'
  data ? 'length'


If you don't want to extract entries from unsupported expressions, you can try
using the other jsonpath operator, @@. The queries will also look a bit simpler:

  data @@ '$.length < 150'
  data @@ '$.content like_regex "^Start"'
  data @@ '$.content like_regex "risk" flag "i"'

All these queries emit no GIN entries. But note that

  data @@ '$ ? (@.content == "foo").length < 150'

emits the same entries as

  data @@ '$.content == "foo"'



We already have a POC implementation of jsonb statistics that was written
2 years ago.  I rebased it onto the current master yesterday.  If it is
interesting, you can find it on my GitHub [1].  But note that there is
no support for jsonpath operators yet; only the boolean EXISTS ?, ?|, ?& and
CONTAINS @> operators are supported.  Also there are no docs, and it works
slowly (a more effective storage method for statistics of individual JSON
paths is needed).

Also there is the ability to calculate derived statistics for expressions like

  js -> 'x' -> 0 -> 'y'
  js #> '{x,0,y}'

using the jsonb statistics for column "js". So the selectivity of expressions

  js -> 'x' -> 0 -> 'y' = '123'
  js #> '{x,0,y}' >= '123'

can also be estimated (but these expressions can't be used by an index on "js").


This topic deserves a separate discussion.  I hope we will start the
corresponding thread for PG13 after we find a more effective way of storing
jsonb statistics.


[1] https://github.com/glukhovn/postgres/tree/jsonb_stats

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Rare SSL failures on eelpout

2019-03-17 Thread Tom Lane
Thomas Munro  writes:
> On Sun, Mar 17, 2019 at 2:00 AM Thomas Munro  wrote:
>> I opened a bug report with OpenSSL, let's see what they say:
>> https://github.com/openssl/openssl/issues/8500

> This was an intentional change in TLS1.3, reducing round trips by
> verifying the client certificate later.

Ugh.  So probably we can reproduce it elsewhere if we use cutting-edge
OpenSSL versions.

> I'm pretty sure the fix I mentioned earlier -- namely adding an ad-hoc
> call to pqHandleSendFailure() if we fail to send the start-up packet
> -- would fix eelpout's measles (though obviously wouldn't solve the
> problem for Windows given what we have learned about its TCP
> implementation).  I should probably go and do that, unless you want to
> write the more general handling for send failure you described, and
> are prepared to back-patch it?

Well, I'm not sure about the back-patching aspect of that, but maybe
I should write the patch and then we should see how risky it looks.
Give me a few days ...

regards, tom lane



Re: Rare SSL failures on eelpout

2019-03-17 Thread Thomas Munro
On Sun, Mar 17, 2019 at 2:00 AM Thomas Munro  wrote:
> On Wed, Mar 6, 2019 at 9:21 AM Tom Lane  wrote:
> > Yeah, I've still been unable to reproduce even with the sleep idea,
> > so eelpout is definitely looking like a special snowflake from here.
> > In any case, there seems little doubt that getting past SSL_connect()
> > when the cert check has failed is an OpenSSL bug; I don't feel a need
> > to create a workaround for it.
>
> I was wrong: it breaks on an x86 system for me too (either with the
> sleep, or with clunky scheduling, I was running psql under strace when
> I saw it).  Not sure what I did wrong last time I tried that.  I
> opened a bug report with OpenSSL, let's see what they say:
>
> https://github.com/openssl/openssl/issues/8500

This was an intentional change in TLS1.3, reducing round trips by
verifying the client certificate later.

I'm pretty sure the fix I mentioned earlier -- namely adding an ad-hoc
call to pqHandleSendFailure() if we fail to send the start-up packet
-- would fix eelpout's measles (though obviously wouldn't solve the
problem for Windows given what we have learned about its TCP
implementation).  I should probably go and do that, unless you want to
write the more general handling for send failure you described, and
are prepared to back-patch it?

-- 
Thomas Munro
https://enterprisedb.com



Re: Fix XML handling with DOCTYPE

2019-03-17 Thread Chapman Flack
On 03/17/19 15:06, Tom Lane wrote:
> The error message issue is indeed a concern, but I don't see why it's
> complicated if you agree that
> 
>> If the query asked for CONTENT, any error result should be one you could get
>> when parsing as CONTENT.
> 
> That just requires us to save the first error message and be sure to issue
> that one not the DOCUMENT one, no?

I confess I haven't looked hard yet at how to do that. The way errors come
out of libxml is more involved than "here's a message", so there's a choice
of (a) trying to copy off that struct in a way that's sure to survive
re-executing the parser, and then using the copy, or (b) generating a message
right away from the structured information and saving that. I guess (b)
might not be too bad; (a) might not be too bad either, or it might, and could
slide right back into the kind of libxml-behavior assumptions you're wanting
to avoid.

Meanwhile, here is a patch on the lines I proposed earlier, with a
pre-check. Any performance hit that it could entail (which I'd really
expect to be de minimis, though I haven't benchmarked) ought to be
compensated by the strlen I changed to strnlen in parse_xml_decl (as
there's really no need to run off and count the whole rest of the input
just to know if 1, 2, 3, or 4 bytes are available to decode a UTF-8 char).

... and, yes, I know that could be an independent patch, and then the
performance effect here should be measured from there. But it was near
what I was doing anyway, so I included it here.

Attaching both still-outstanding patches (this one and docfix) so the
CF app doesn't lose one.

Regards,
-Chap
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index b462c06..94b46a6 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4282,16 +4282,21 @@ SET XML OPTION { DOCUMENT | CONTENT };
 SET xmloption TO { DOCUMENT | CONTENT };
 
 The default is CONTENT, so all forms of XML
-data are allowed.
+data are allowed except as noted below.

 

 
- With the default XML option setting, you cannot directly cast
- character strings to type xml if they contain a
- document type declaration, because the definition of XML content
- fragment does not accept them.  If you need to do that, either
- use XMLPARSE or change the XML option.
+ In the SQL:2006 and later standard, the CONTENT form
+ is a proper superset of the DOCUMENT form, and so the
+ default XML option setting would allow casting to XML from character
+ strings in either form. PostgreSQL, however,
+ uses the SQL:2003 definition in which CONTENT form
+ cannot contain a document type declaration. Therefore, there is no one
+ setting of the XML option that will allow casting to XML from every valid
+ character string. The default will work almost always, but for a document
+ with a DTD, it will be necessary to change the XML option or
+ use XMLPARSE specifying DOCUMENT.
 

 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 03859a7..0017aab 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -10140,8 +10140,13 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
 
 
  
+
   XML Functions
 
+  
+   XML Functions
+  
+
   
The functions and function-like expressions described in this
section operate on values of type xml.  Check .  The particular behavior for
  individual data types is expected to evolve in order to align the
- SQL and PostgreSQL data types with the XML Schema specification,
- at which point a more precise description will appear.
+ PostgreSQL mappings with those specified in SQL:2006 and later,
+ as discussed in .
 

 
@@ -10587,10 +10592,12 @@ SELECT xmlagg(x) FROM (SELECT * FROM test ORDER BY y DESC) AS tab;
 
 
 
- The function xmlexists returns true if the
- XPath expression in the first argument returns any nodes, and
- false otherwise.  (If either argument is null, the result is
- null.)
+ The function xmlexists evaluates an XPath 1.0
+ expression (the first argument), with the passed value as its context item.
+ The function returns false if the result of that evaluation yields an
+ empty nodeset, true if it yields any other value.  (The function returns
+ null if an argument is null.) A nonnull value passed as the context item
+ must be an XML document, not a content fragment or any non-XML value.
 
 
 
@@ -10607,24 +10614,12 @@ SELECT xmlexists('//town[text() = ''Toronto'']' PASSING BY VALUE 'T
 
 
  The BY REF or BY VALUE clauses
- have no effect in PostgreSQL, but are allowed
- for compatibility with other implementations.  Per the SQL
- standard, the one that precedes any argument is required, and indicates
- the default for arguments that follow, and one may follow any argument to
- override the default.
- PostgreSQL ignores BY REF
- and 

Re: Performance issue in foreign-key-aware join estimation

2019-03-17 Thread Tom Lane
David Rowley  writes:
> [ eclass_indexes_v3.patch ]

I looked at this for a little bit.  I'm on board with the basic idea
of assigning integer indexes to the ECs and keeping bitmapsets of
relevant EC indexes in RelOptInfos.  However ... man, is that
delete_eclass() thing ugly.  And slow, and fragile-feeling.

I think that maybe we could improve that situation by not trying to
maintain the RelOptInfo.eclass_indexes sets at all during the initial
construction of the ECs, but only setting them up after we've completed
doing that.  We already have an assumption that we know when EC merging
is done and it's safe to make pathkeys that reference the "canonical"
(merged) ECs, so it seems like that would be a point where we could
build the indexing bitmapsets safely.  This hinges on not needing the
bitmapsets till after that point, but it'd be easy to arrange some
Asserts verifying that we don't try to use them before that.

If that doesn't work (because we need the index info sooner), maybe we
could consider never removing ECs from eq_classes, so that their indices
never change, then teaching most/all of the loops over eq_classes to
ignore entries with non-null ec_merged.  But we probably want the indexing
bitmapsets to reflect canonical EC numbers, so we'd still have to do some
updating I fear -- or else be prepared to chase up the ec_merged links
when using the index bitmaps.

Stepping back a little bit, I wonder whether now is the time to rethink
the overall EC data structure, as foreseen in this old comment:

 * Note: constructing merged EquivalenceClasses is a standard UNION-FIND
 * problem, for which there exist better data structures than simple lists.
 * If this code ever proves to be a bottleneck then it could be sped up ---
 * but for now, simple is beautiful.

I'm not very sure what a better data structure would really look like ---
after perusing Sedgewick's section on UNION-FIND, it seems like the
ec_merged links are more or less what he's recommending anyway, and the
expensive part of this is probably all the equal() tests to find whether
two proposed EC member expressions are already in the set of known
expressions.  So I'm just handwaving here.  But my point is that we
needn't feel wedded to the idea of keeping the ECs in a list, if we can
think of some better data structure for them.
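
For reference, the textbook structure being alluded to is array-based
union-find with union by size and path compression; a minimal standalone
sketch follows (plain C, not PostgreSQL code), which of course does nothing
about the expensive equal() lookups mentioned above:

    #include <stdlib.h>

    typedef struct
    {
        int    *parent;
        int    *size;
    } UnionFind;

    static UnionFind *
    uf_create(int n)
    {
        UnionFind  *uf = malloc(sizeof(UnionFind));

        uf->parent = malloc(n * sizeof(int));
        uf->size = malloc(n * sizeof(int));
        for (int i = 0; i < n; i++)
        {
            uf->parent[i] = i;      /* each element starts as its own set */
            uf->size[i] = 1;
        }
        return uf;
    }

    static int
    uf_find(UnionFind *uf, int x)
    {
        while (uf->parent[x] != x)
        {
            uf->parent[x] = uf->parent[uf->parent[x]];  /* path halving */
            x = uf->parent[x];
        }
        return x;
    }

    static void
    uf_union(UnionFind *uf, int a, int b)
    {
        int     ra = uf_find(uf, a);
        int     rb = uf_find(uf, b);

        if (ra == rb)
            return;
        if (uf->size[ra] < uf->size[rb])
        {
            int     tmp = ra;

            ra = rb;
            rb = tmp;
        }
        uf->parent[rb] = ra;        /* attach smaller root under larger */
        uf->size[ra] += uf->size[rb];
    }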

regards, tom lane



Re: CREATE OR REPLACE AGGREGATE?

2019-03-17 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> Yeah, it seems like mostly a lack-of-round-tuits problem.

 Tom> Updating the aggregate's dependencies correctly might be a bit
 Tom> tricky, but it can't be any worse than the corresponding problem
 Tom> for functions...

I was worried about that myself but looking at it, unless I overlooked
something, it's not hard to deal with. The main thing is that all the
dependencies attach to the pg_proc entry, not the pg_aggregate row
(which has no oid anyway), and ProcedureCreate when replacing that will
delete all of the old dependency entries. So all that AggregateCreate
ends up having to do is to create the same set of dependency entries
that it would have created anyway.

Here's my initial draft patch (includes docs but not tests yet) - I have
more testing to do on it, particularly to check the dependencies are
right, but so far it seems to work.

-- 
Andrew (irc:RhodiumToad)

diff --git a/doc/src/sgml/ref/create_aggregate.sgml b/doc/src/sgml/ref/create_aggregate.sgml
index b8cd2e7af9..7530b6770b 100644
--- a/doc/src/sgml/ref/create_aggregate.sgml
+++ b/doc/src/sgml/ref/create_aggregate.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  
 
-CREATE AGGREGATE name ( [ argmode ] [ argname ] arg_data_type [ , ... ] ) (
+CREATE [ OR REPLACE ] AGGREGATE name ( [ argmode ] [ argname ] arg_data_type [ , ... ] ) (
 SFUNC = sfunc,
 STYPE = state_data_type
 [ , SSPACE = state_data_size ]
@@ -44,7 +44,7 @@ CREATE AGGREGATE name ( [ name ( [ [ argmode ] [ argname ] arg_data_type [ , ... ] ]
+CREATE [ OR REPLACE ] AGGREGATE name ( [ [ argmode ] [ argname ] arg_data_type [ , ... ] ]
 ORDER BY [ argmode ] [ argname ] arg_data_type [ , ... ] ) (
 SFUNC = sfunc,
 STYPE = state_data_type
@@ -59,7 +59,7 @@ CREATE AGGREGATE name ( [ [ or the old syntax
 
-CREATE AGGREGATE name (
+CREATE [ OR REPLACE ] AGGREGATE name (
 BASETYPE = base_type,
 SFUNC = sfunc,
 STYPE = state_data_type
@@ -88,14 +88,23 @@ CREATE AGGREGATE name (
   Description
 
   
-   CREATE AGGREGATE defines a new aggregate
-   function. Some basic and commonly-used aggregate functions are
-   included with the distribution; they are documented in . If one defines new types or needs
-   an aggregate function not already provided, then CREATE
-   AGGREGATE can be used to provide the desired features.
+   CREATE AGGREGATE defines a new aggregate function.
+   CREATE OR REPLACE AGGREGATE will either define a new
+   aggregate function or replace an existing definition. Some basic and
+   commonly-used aggregate functions are included with the distribution; they
+   are documented in . If one defines new
+   types or needs an aggregate function not already provided, then
+   CREATE AGGREGATE can be used to provide the desired
+   features.
   
 
+  
+   When replacing an existing definition, the argument types, result type,
+   and number of direct arguments may not be changed. Also, the new definition
+   must be of the same kind (ordinary aggregate, ordered-set aggregate, or
+   hypothetical-set aggregate) as the old one.
+  
+  
   
If a schema name is given (for example, CREATE AGGREGATE
myschema.myagg ...) then the aggregate function is created in the
diff --git a/src/backend/catalog/pg_aggregate.c b/src/backend/catalog/pg_aggregate.c
index 19e3171bf7..cdc8d9453d 100644
--- a/src/backend/catalog/pg_aggregate.c
+++ b/src/backend/catalog/pg_aggregate.c
@@ -45,6 +45,7 @@ static Oid lookup_agg_function(List *fnName, int nargs, Oid *input_types,
 ObjectAddress
 AggregateCreate(const char *aggName,
 Oid aggNamespace,
+bool replace,
 char aggKind,
 int numArgs,
 int numDirectArgs,
@@ -77,8 +78,10 @@ AggregateCreate(const char *aggName,
 {
 	Relation	aggdesc;
 	HeapTuple	tup;
+	HeapTuple	oldtup;
 	bool		nulls[Natts_pg_aggregate];
 	Datum		values[Natts_pg_aggregate];
+	bool		replaces[Natts_pg_aggregate];
 	Form_pg_proc proc;
 	Oid			transfn;
 	Oid			finalfn = InvalidOid;	/* can be omitted */
@@ -609,7 +612,7 @@ AggregateCreate(const char *aggName,
 
 	myself = ProcedureCreate(aggName,
 			 aggNamespace,
-			 false, /* no replacement */
+			 replace, /* maybe replacement */
 			 false, /* doesn't return a set */
 			 finaltype, /* returnType */
 			 GetUserId(),	/* proowner */
@@ -648,6 +651,7 @@ AggregateCreate(const char *aggName,
 	{
 		nulls[i] = false;
 		values[i] = (Datum) NULL;
+		replaces[i] = true;
 	}
 	values[Anum_pg_aggregate_aggfnoid - 1] = ObjectIdGetDatum(procOid);
 	values[Anum_pg_aggregate_aggkind - 1] = CharGetDatum(aggKind);
@@ -678,8 +682,51 @@ AggregateCreate(const char *aggName,
 	else
 		nulls[Anum_pg_aggregate_aggminitval - 1] = true;
 
-	tup = heap_form_tuple(tupDesc, values, nulls);
-	CatalogTupleInsert(aggdesc, tup);
+	if (replace)
+		oldtup = SearchSysCache1(AGGFNOID, ObjectIdGetDatum(procOid));
+	else
+		oldtup = NULL;
+
+	if (HeapTupleIsValid(oldtup))
+	{
+		Form_pg_aggregate oldagg = 

Re: Unduly short fuse in RequestCheckpoint

2019-03-17 Thread Tom Lane
I wrote:
> The cause of this error is that RequestCheckpoint will give up and fail
> after just 2 seconds, which evidently is not long enough on slow or
> heavily loaded machines.  Since there isn't any good reason why the
> checkpointer wouldn't be running, I'm inclined to swing a large hammer
> and kick this timeout up to 60 seconds.  Thoughts?

So I had imagined this as about a 2-line patch, s/2/60/g and be done.
Looking closer, though, there's other pre-existing problems in this code:

1. As it's currently coded, the requesting process can wait for up to 2
seconds for the checkpointer to start *even if the caller did not say
CHECKPOINT_WAIT*.  That seems a little bogus, and an unwanted 60-second
wait would be a lot bogus.

2. If the timeout does elapse, and we didn't have the CHECKPOINT_WAIT
flag, we just log the failure and return.  When the checkpointer
ultimately does start, it will have no idea that it should set to work
right away on a checkpoint.  (I wonder if this accounts for any other
of the irreproducible buildfarm failures we get on slow machines.  From
the calling code's viewpoint, it'd seem like it was taking a darn long
time to perform a successfully-requested checkpoint.  Given that most
checkpoint requests are non-WAIT, this seems not very nice.)

After some thought I came up with the attached proposed patch.  The
basic idea here is that we record a checkpoint request by ensuring
that the shared-memory ckpt_flags word is nonzero.  (It's not clear
to me that a valid request would always have at least one of the
existing flag bits set, so I just added an extra always-set bit to
guarantee this.)  Then, whether the signal gets sent or not, there is
a persistent record of the request in shmem, which the checkpointer
will eventually notice.  In the expected case where the problem is
that the checkpointer hasn't started just yet, it will see the flag
during its first main loop and begin a checkpoint right away.
I took out the local checkpoint_requested flag altogether.

A possible objection to this fix is that up to now, it's been possible
to trigger a checkpoint just by sending SIGINT to the checkpointer
process, without touching shmem at all.  None of the core code depends
on that, and since the checkpointer's PID is difficult to find out
from "outside", it's hard to believe that anybody's got custom tooling
that depends on it, but perhaps they do.  I thought about keeping the
checkpoint_requested flag to allow that to continue to work, but if
we do so then we have a race condition: the checkpointer could see the
shmem flag set and start a checkpoint, then receive the signal a moment
later and believe that that represents a second, independent request
requiring a second checkpoint.  So I think we should just blow off that
hypothetical possibility and do it like this.

Comments?

regards, tom lane

diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 3d5b382..9d0d05a 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -153,7 +153,6 @@ double		CheckPointCompletionTarget = 0.5;
  * Flags set by interrupt handlers for later service in the main loop.
  */
 static volatile sig_atomic_t got_SIGHUP = false;
-static volatile sig_atomic_t checkpoint_requested = false;
 static volatile sig_atomic_t shutdown_requested = false;
 
 /*
@@ -370,12 +369,6 @@ CheckpointerMain(void)
 			 */
 			UpdateSharedMemoryConfig();
 		}
-		if (checkpoint_requested)
-		{
-			checkpoint_requested = false;
-			do_checkpoint = true;
-			BgWriterStats.m_requested_checkpoints++;
-		}
 		if (shutdown_requested)
 		{
 			/*
@@ -390,6 +383,17 @@ CheckpointerMain(void)
 		}
 
 		/*
+		 * Detect a pending checkpoint request by checking whether the flags
+		 * word in shared memory is nonzero.  We shouldn't need to acquire the
+		 * ckpt_lck for this.
+		 */
+		if (((volatile CheckpointerShmemStruct *) CheckpointerShmem)->ckpt_flags)
+		{
+			do_checkpoint = true;
+			BgWriterStats.m_requested_checkpoints++;
+		}
+
+		/*
 		 * Force a checkpoint if too much time has elapsed since the last one.
 		 * Note that we count a timed checkpoint in stats only when this
 		 * occurs without an external request, but we set the CAUSE_TIME flag
@@ -630,17 +634,14 @@ CheckArchiveTimeout(void)
 static bool
 ImmediateCheckpointRequested(void)
 {
-	if (checkpoint_requested)
-	{
-		volatile CheckpointerShmemStruct *cps = CheckpointerShmem;
+	volatile CheckpointerShmemStruct *cps = CheckpointerShmem;
 
-		/*
-		 * We don't need to acquire the ckpt_lck in this case because we're
-		 * only looking at a single flag bit.
-		 */
-		if (cps->ckpt_flags & CHECKPOINT_IMMEDIATE)
-			return true;
-	}
+	/*
+	 * We don't need to acquire the ckpt_lck in this case because we're only
+	 * looking at a single flag bit.
+	 */
+	if (cps->ckpt_flags & CHECKPOINT_IMMEDIATE)
+		return true;
 	return false;
 }
 
@@ -843,7 +844,10 @@ 

Re: Fix XML handling with DOCTYPE

2019-03-17 Thread Tom Lane
Chapman Flack  writes:
> On 03/17/19 13:16, Tom Lane wrote:
>> Do we need a pre-scan at all?

> Without it, we double the time to a failure result in every case that
> should actually fail, as well as in this one corner case that we want to
> see succeed, and the question you posed earlier about which error message
> to return becomes thornier.

I have absolutely zero concern about whether it takes twice as long to
detect bad input; nobody should be sending bad input if they're concerned
about performance.  (The costs of the ensuing transaction abort would
likely dwarf xml_in's runtime in any case.)  Besides, with what we're
talking about doing here,

(1) the extra runtime is consumed only in cases that would fail up to now,
so nobody can complain about a performance regression;
(2) doing a pre-scan *would* be a performance regression for cases that
work today; not a large one we hope, but still...

The error message issue is indeed a concern, but I don't see why it's
complicated if you agree that

> If the query asked for CONTENT, any error result should be one you could get
> when parsing as CONTENT.

That just requires us to save the first error message and be sure to issue
that one not the DOCUMENT one, no?  That's what we'd want to do from a
backwards-compatibility standpoint anyhow, since that's the error message
wording you'd get with today's code.

regards, tom lane



Re: jsonpath

2019-03-17 Thread Jonathan S. Katz
On 3/17/19 1:14 PM, Alexander Korotkov wrote:
> On Sun, Mar 17, 2019 at 8:06 PM Jonathan S. Katz  wrote:
>> On 3/17/19 1:02 PM, Alexander Korotkov wrote:
>>>
>>> Thank you for the explanation.  Is it jsonb_ops or jsonb_path_ops?
>>
>> I just used "USING gin(col)" so jsonb_ops.
> 
> I see.  So, jsonb_ops extracts from this query only the existence of the
> .length key.  And I can bet it exists in all (or almost all) of the
> documents.  Thus, the optimizer thinks the index might be useful, while
> it's actually useless.  There is not much that can be done while we don't
> have statistics for jsonb (and access to them from GIN extract_query).
> So, for now we can just refrain from extracting only keys from a jsonpath
> in jsonb_ops.  But I think it would be better to just document this issue.
> In the future we should improve that with statistics.

That seems to make sense, especially given how I've typically stored
JSON documents in PostgreSQL. It sounds like this particular problem
would be solved appropriately with JSONB statistics.

Thanks,

Jonathan



signature.asc
Description: OpenPGP digital signature


Re: Fix XML handling with DOCTYPE

2019-03-17 Thread Chapman Flack
On 03/17/19 13:16, Tom Lane wrote:
> Chapman Flack  writes:
>> What I was doing in the patch is the reverse: parsing with the expectation
>> of CONTENT to see if a DTD gets tripped over. It isn't allowed for an
>> element to precede a DTD, so that approach can be expected to fail fast
>> if the other branch needs to be taken.
> 
> Ah, right.  I don't have any problem with trying the CONTENT approach
> before the DOCUMENT approach rather than vice-versa.  What I was concerned
> about was adding a lot of assumptions about exactly how libxml would
> report the failure.  IMO a maximally-safe patch would just rearrange
> things we're already doing without adding new things.
> 
>> But a quick pre-scan for the same thing would have the same property,
>> without the libxml dependencies that bother you here. Watch this space.
> 
> Do we need a pre-scan at all?

Without it, we double the time to a failure result in every case that
should actually fail, as well as in this one corner case that we want to
see succeed, and the question you posed earlier about which error message
to return becomes thornier.

If the query asked for CONTENT, any error result should be one you could get
when parsing as CONTENT. If we switch and try parsing as DOCUMENT _because
the input is claiming to have the form of a DOCUMENT_, then it's defensible
to return errors explaining why it's not a DOCUMENT ... but not in the
general case of just throwing DOCUMENT at it any time CONTENT parse fails.

Regards,
-Chap



Re: Offline enabling/disabling of data checksums

2019-03-17 Thread Fabien COELHO



> You have one error at the end of update_controlfile(), where close()
> could issue a frontend-like error for the backend, calling exit() on the
> way.  That's not good.  (No need to send a new patch, I'll fix it
> myself.)

Indeed. I meant to merge the "if (close(fd))", but ended up merging the
error generation as well.


--
Fabien



Re: Fix XML handling with DOCTYPE

2019-03-17 Thread Tom Lane
Chapman Flack  writes:
> What I was doing in the patch is the reverse: parsing with the expectation
> of CONTENT to see if a DTD gets tripped over. It isn't allowed for an
> element to precede a DTD, so that approach can be expected to fail fast
> if the other branch needs to be taken.

Ah, right.  I don't have any problem with trying the CONTENT approach
before the DOCUMENT approach rather than vice-versa.  What I was concerned
about was adding a lot of assumptions about exactly how libxml would
report the failure.  IMO a maximally-safe patch would just rearrange
things we're already doing without adding new things.

> But a quick pre-scan for the same thing would have the same property,
> without the libxml dependencies that bother you here. Watch this space.

Do we need a pre-scan at all?

regards, tom lane



Re: jsonpath

2019-03-17 Thread Alexander Korotkov
On Sun, Mar 17, 2019 at 8:06 PM Jonathan S. Katz  wrote:
> On 3/17/19 1:02 PM, Alexander Korotkov wrote:
> >
> > Thank you for the explanation.  Is it jsonb_ops or jsonb_path_ops?
>
> I just used "USING gin(col)" so jsonb_ops.

I see.  So, jsonb_ops extracts from this query only the existence of the
.length key.  And I can bet it exists in all (or almost all) of the
documents.  Thus, the optimizer thinks the index might be useful, while
it's actually useless.  There is not much that can be done while we don't
have statistics for jsonb (and access to them from GIN extract_query).
So, for now we can just refrain from extracting only keys from a jsonpath
in jsonb_ops.  But I think it would be better to just document this issue.
In the future we should improve that with statistics.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
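
For anyone reproducing the comparison, a minimal sketch using Jonathan's
news_feed table from upthread (the second index name is made up here):
jsonb_path_ops hashes whole paths down to the values rather than individual
keys, so it is worth testing alongside the default opclass, though it does
not support the key-existence operators.

CREATE INDEX news_feed_data_gin_idx ON news_feed USING gin (data);
CREATE INDEX news_feed_data_gin_path_idx
    ON news_feed USING gin (data jsonb_path_ops);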



Re: Fix XML handling with DOCTYPE

2019-03-17 Thread Chapman Flack
On 03/17/19 11:45, Tom Lane wrote:
> Chapman Flack  writes:
>> On 03/16/19 17:21, Tom Lane wrote:
>>> (1) always try to parse as document.  If successful, we're done.
>>> (2) otherwise, if allowed by xmloption, try to parse using our
> 
>> What I don't like about that is that (a) the input could be
>> arbitrarily long and complex to parse (not that you can't imagine
>> a database populated with lots of short little XML snippets, but
>> at the same time, a query could quite plausibly deal in yooge ones),
>> and (b), step (1) could fail at the last byte of the input, followed
>> by total reparsing as (2).
> 
> That doesn't seem particularly likely to me: based on what's been
> said here, I'd expect parsing with the wrong expectation to usually
> fail near the start of the input.  In any case, the other patch
> also requires repeat parsing, no?  It's just doing that in a different
> set of cases.

I'll do up a version with the open-coded prescan I proposed last night.

Whether parsing with the wrong expectation is likely to fail near the
start of the input depends on both the input and the expectation. If
your expectation is DOCUMENT and the input is CONTENT, it's possible
for the determining difference to be something that follows the first
element, and a first element can be (and often is) nearly all of the input.

What I was doing in the patch is the reverse: parsing with the expectation
of CONTENT to see if a DTD gets tripped over. It isn't allowed for an
element to precede a DTD, so that approach can be expected to fail fast
if the other branch needs to be taken.

But a quick pre-scan for the same thing would have the same property,
without the libxml dependencies that bother you here. Watch this space.

Regards,
-Chap
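
A very rough sketch of such a pre-scan (ignoring comments and processing
instructions between the XML declaration and the DTD, so not what the
eventual patch will look like; the function name is made up):

static bool
xml_starts_with_doctype(const char *str)
{
	const char *p = str;

	/* skip an optional XML declaration */
	if (strncmp(p, "<?xml", 5) == 0)
	{
		p = strstr(p, "?>");
		if (p == NULL)
			return false;
		p += 2;
	}

	/* skip whitespace */
	while (*p == ' ' || *p == '\t' || *p == '\r' || *p == '\n')
		p++;

	/* a DTD may only appear before the first element, so this check suffices */
	return strncmp(p, "<!DOCTYPE", 9) == 0;
}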



Re: jsonpath

2019-03-17 Thread Jonathan S. Katz
On 3/17/19 1:02 PM, Alexander Korotkov wrote:
> 
> Thank you for the explanation.  Is it jsonb_ops or jsonb_path_ops?

I just used "USING gin(col)" so jsonb_ops.

Thanks,

Jonathan



signature.asc
Description: OpenPGP digital signature


Re: jsonpath

2019-03-17 Thread Alexander Korotkov
On Sun, Mar 17, 2019 at 8:00 PM Jonathan S. Katz  wrote:
> On 3/17/19 12:55 PM, Alexander Korotkov wrote:
> >
> >> However, when I did something a little more complex, like the below:
> >>
> >> SELECT count(*)
> >> FROM news_feed
> >> WHERE data @? '$.length ? (@ < 150)';
> >>
> >> SELECT count(*)
> >> FROM news_feed
> >> WHERE data @? '$.content ? (@ like_regex "^Start")';
> >>
> >> SELECT id, jsonb_path_query(data, '$.content')
> >> FROM news_feed
> >> WHERE data @? '$.content ? (@ like_regex "risk" flag "i")';
> >>
> >> I would find that the index scan performed as well as the sequential
> >> scan. Additionally, on my laptop, the parallel sequential scan would
> >> beat the index scan by ~2.5x in some cases.
> >
> > Yeah, these cases are not supported.  Did the optimizer automatically
> > select a sequential scan in this case (without touching the enable_*
> > variables)?  It should, because the optimizer understands that a GIN
> > scan will be bad if the extract_query method failed to extract anything.
>
> It did not - it was doing a bitmap heap scan. I have default costs
> setup. Example output from EXPLAIN ANALYZE with the index available:
>
>  Aggregate  (cost=1539.78..1539.79 rows=1 width=8) (actual
> time=270.419..270.419 rows=1 loops=1)
>->  Bitmap Heap Scan on news_feed  (cost=23.24..1538.73 rows=418
> width=0) (actual time=84.040..270.407 rows=5 loops=1)
>  Recheck Cond: (data @? '$."length"?(@ < 150)'::jsonpath)
>  Rows Removed by Index Recheck: 418360
>  Heap Blocks: exact=28690
>  ->  Bitmap Index Scan on news_feed_data_gin_idx
> (cost=0.00..23.14 rows=418 width=0) (actual time=41.788..41.788
> rows=418365 loops=1)
>Index Cond: (data @? '$."length"?(@ < 150)'::jsonpath)
>  Planning Time: 0.168 ms
>  Execution Time: 271.105 ms
>
> And for arguments sake, after I dropped the index (and
> max_parallel_workers = 8):
>
>  Finalize Aggregate  (cost=30998.07..30998.08 rows=1 width=8) (actual
> time=91.062..91.062 rows=1 loops=1)
>->  Gather  (cost=30997.65..30998.06 rows=4 width=8) (actual
> time=90.892..97.739 rows=5 loops=1)
>  Workers Planned: 4
>  Workers Launched: 4
>  ->  Partial Aggregate  (cost=29997.65..29997.66 rows=1 width=8)
> (actual time=76.977..76.977 rows=1 loops=5)
>->  Parallel Seq Scan on news_feed  (cost=0.00..29997.39
> rows=104 width=0) (actual time=39.736..76.964 rows=1 loops=5)
>  Filter: (data @? '$."length"?(@ < 150)'::jsonpath)
>  Rows Removed by Filter: 83672
>  Planning Time: 0.127 ms
>  Execution Time: 97.801 ms

Thank you for the explanation.  Is it jsonb_ops or jsonb_path_ops?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: jsonpath

2019-03-17 Thread Jonathan S. Katz
On 3/17/19 12:55 PM, Alexander Korotkov wrote:
> 
>> However, when I did something a little more complex, like the below:
>>
>> SELECT count(*)
>> FROM news_feed
>> WHERE data @? '$.length ? (@ < 150)';
>>
>> SELECT count(*)
>> FROM news_feed
>> WHERE data @? '$.content ? (@ like_regex "^Start")';
>>
>> SELECT id, jsonb_path_query(data, '$.content')
>> FROM news_feed
>> WHERE data @? '$.content ? (@ like_regex "risk" flag "i")';
>>
>> I would find that the index scan performed as well as the sequential
>> scan. Additionally, on my laptop, the parallel sequential scan would
>> beat the index scan by ~2.5x in some cases.
> 
> Yeah, these cases are not supported.  Did the optimizer automatically
> select a sequential scan in this case (without touching the enable_*
> variables)?  It should, because the optimizer understands that a GIN scan
> will be bad if the extract_query method failed to extract anything.

It did not - it was doing a bitmap heap scan. I have default costs
setup. Example output from EXPLAIN ANALYZE with the index available:

 Aggregate  (cost=1539.78..1539.79 rows=1 width=8) (actual
time=270.419..270.419 rows=1 loops=1)
   ->  Bitmap Heap Scan on news_feed  (cost=23.24..1538.73 rows=418
width=0) (actual time=84.040..270.407 rows=5 loops=1)
 Recheck Cond: (data @? '$."length"?(@ < 150)'::jsonpath)
 Rows Removed by Index Recheck: 418360
 Heap Blocks: exact=28690
 ->  Bitmap Index Scan on news_feed_data_gin_idx
(cost=0.00..23.14 rows=418 width=0) (actual time=41.788..41.788
rows=418365 loops=1)
   Index Cond: (data @? '$."length"?(@ < 150)'::jsonpath)
 Planning Time: 0.168 ms
 Execution Time: 271.105 ms

And for arguments sake, after I dropped the index (and
max_parallel_workers = 8):

 Finalize Aggregate  (cost=30998.07..30998.08 rows=1 width=8) (actual
time=91.062..91.062 rows=1 loops=1)
   ->  Gather  (cost=30997.65..30998.06 rows=4 width=8) (actual
time=90.892..97.739 rows=5 loops=1)
 Workers Planned: 4
 Workers Launched: 4
 ->  Partial Aggregate  (cost=29997.65..29997.66 rows=1 width=8)
(actual time=76.977..76.977 rows=1 loops=5)
   ->  Parallel Seq Scan on news_feed  (cost=0.00..29997.39
rows=104 width=0) (actual time=39.736..76.964 rows=1 loops=5)
 Filter: (data @? '$."length"?(@ < 150)'::jsonpath)
 Rows Removed by Filter: 83672
 Planning Time: 0.127 ms
 Execution Time: 97.801 ms


>> Reading up on what the GIN patch does, this all makes sense: it's
>> optimized for equality, I understand there are challenges to be able to
>> handle inequality, regex exps, etc. And the cases where it really does
>> work well, it's _incredibly_ fast.
> 
> Yes, for more complex cases we need different opclasses.  For instance,
> we can consider porting the jsquery opclasses to PG 13.  And it becomes
> even more important to get parameterized opclasses, because we don't
> necessarily want to index all the JSON fields in the same way.  That's
> another challenge for future releases.  But what we have now is just
> support for some jsonpath queries with the existing opclasses.

Yeah, that makes sense, and seems to be my recollection from the several
years of presentations I've seen on the topic ;)

>> My suggestion would be adding some additional guidance in the user
>> documentation around how GIN works with the @@ and @? operators so they
>> can understand where GIN will work very well with JSON path + their data
>> and not be surprised when other types of JSON path queries are
>> performing on par with a sequential scan (or worse than a parallel seq
>> scan).
> 
> Good point.  Will do.

Thanks!

Jonathan



signature.asc
Description: OpenPGP digital signature


Re: jsonpath

2019-03-17 Thread Alexander Korotkov
Hi!

On Sun, Mar 17, 2019 at 7:46 PM Jonathan S. Katz  wrote:
> Like Pavel, I did some basic testing of the patch (on current HEAD
> 4178d8b91c) trying out various JSON path expressions, and yes, it all
> worked. I had a brief scare while testing on 4178d8b91c where initdb was
> failing on the bootstrapping step, but after doing a thorough wipe of
> build files and my output directory, it seems to be initializing okay.
>
> I also did some testing of the GIN patch upthread, as the quickness of
> retrieval of the data using JSON path is of course important as well.

Thank you very much for testing!

> Using a schema roughly like this:
>
> CREATE TABLE news_feed (
> id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
> data jsonb NOT NULL
> );
> CREATE INDEX news_feed_data_gin_idx ON news_feed USING gin(data);
>
> I loaded in a data set of roughly 420,000 rows. Each row had all the
> same keys but differing values (e.g. "length" and "content" as keys)
>
> I tested a few different JSON path scenarios. Some of the index scans
> performed way better than the equivalent sequential scans, for instance:
>
> SELECT count(*)
> FROM news_feed
> WHERE data @? '$.length ? (@ == 200)';
>
> SELECT *
> FROM news_feed
> WHERE data @? '$.id ? (@ == "22613cbc-d83e-4a29-8b59-3b9f5cd61825")';
>
> Using the index outperformed the sequential scan (and parallel seq scan)
> by ~10-100x based on my config + laptop hardware!

Great!

> However, when I did something a little more complex, like the below:
>
> SELECT count(*)
> FROM news_feed
> WHERE data @? '$.length ? (@ < 150)';
>
> SELECT count(*)
> FROM news_feed
> WHERE data @? '$.content ? (@ like_regex "^Start")';
>
> SELECT id, jsonb_path_query(data, '$.content')
> FROM news_feed
> WHERE data @? '$.content ? (@ like_regex "risk" flag "i")';
>
> I would find that the index scan performed as well as the sequential
> scan. Additionally, on my laptop, the parallel sequential scan would
> beat the index scan by ~2.5x in some cases.

Yeah, these cases are not supported.  Did the optimizer automatically
select a sequential scan in this case (without touching the enable_*
variables)?  It should, because the optimizer understands that a GIN scan
will be bad if the extract_query method failed to extract anything.

> Reading up on what the GIN patch does, this all makes sense: it's
> optimized for equality, I understand there are challenges to be able to
> handle inequality, regex exps, etc. And the cases where it really does
> work well, it's _incredibly_ fast.

Yes, for more complex cases we need different opclasses.  For instance,
we can consider porting the jsquery opclasses to PG 13.  And it becomes
even more important to get parameterized opclasses, because we don't
necessarily want to index all the JSON fields in the same way.  That's
another challenge for future releases.  But what we have now is just
support for some jsonpath queries with the existing opclasses.

> My suggestion would be adding some additional guidance in the user
> documentation around how GIN works with the @@ and @? operators so they
> can understand where GIN will work very well with JSON path + their data
> and not be surprised when other types of JSON path queries are
> performing on par with a sequential scan (or worse than a parallel seq
> scan).

Good point.  Will do.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: jsonpath

2019-03-17 Thread Jonathan S. Katz
On 3/17/19 3:13 AM, Alexander Korotkov wrote:
> On Sat, Mar 16, 2019 at 9:39 PM Pavel Stehule  wrote:
>> On Sat, 16 Mar 2019 at 10:36, Alexander Korotkov
>>  wrote:
>>>
>>> On Thu, Mar 14, 2019 at 12:07 PM Alexander Korotkov
>>>  wrote:
 On Sun, Mar 10, 2019 at 1:51 PM Alexander Korotkov
  wrote:
> On Wed, Mar 6, 2019 at 12:40 AM Nikita Glukhov  
> wrote:
>> Attached 36th version of the patches.
>
> Thank yo for the revision!
>
> In the attached revision following changes are made:
>
>> "unknown" refers here to ordinary three-valued logical Unknown, which is
>> represented in SQL by NULL.
>>
>> JSON path expressions return sequences of SQL/JSON items, which are 
>> defined by
>> SQL/JSON data model.  But JSON path predicates (logical expressions), 
>> which are
>> used in filters, return three-valued logical values: False, True, or 
>> Unknown.
>
>  * I've added short explanation of this to the documentation.
>  * Removed no longer present data structures from typedefs.list of the
> first patch.
>  * Moved GIN support patch to number 3.  Seems to be well-isolated and
> not very complex patch.  I propose to consider this to 12 too.  I
> added high-level comment there, commit message and made some code
> beautification.

 I think patches 1 and 2 are in committable shape (I reached Tomas
 off-list, he doesn't have more notes regarding them).  While patch 3
 requires more review.

 I'm going to push 1 and 2 if no objections.
>>>
>>> So, pushed.  Many thanks to reviewers and authors!
>>>
>>> Remaining part I'm proposing for 12 is attached.  I appreciate review of it.
>>
>>
>> I tested this patch and I didn't find any issues - I just tested basic
>> functionality and the regression tests.
>>
>> looks good
> 
> Thank you for your feedback!

Like Pavel, I did some basic testing of the patch (on current HEAD
4178d8b91c) trying out various JSON path expressions, and yes, it all
worked. I had a brief scare while testing on 4178d8b91c where initdb was
failing on the bootstrapping step, but after doing a thorough wipe of
build files and my output directory, it seems to be initializing okay.

I also did some testing of the GIN patch upthread, as the quickness of
retrieval of the data using JSON path is of course important as well.
Using a schema roughly like this:

CREATE TABLE news_feed (
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
data jsonb NOT NULL
);
CREATE INDEX news_feed_data_gin_idx ON news_feed USING gin(data);

I loaded in a data set of roughly 420,000 rows. Each row had all the
same keys but differing values (e.g. "length" and "content" as keys)

I tested a few different JSON path scenarios. Some of the index scans
performed way better than the equivalent sequential scans, for instance:

SELECT count(*)
FROM news_feed
WHERE data @? '$.length ? (@ == 200)';

SELECT *
FROM news_feed
WHERE data @? '$.id ? (@ == "22613cbc-d83e-4a29-8b59-3b9f5cd61825")';

Using the index outperformed the sequential scan (and parallel seq scan)
by ~10-100x based on my config + laptop hardware!

However, when I did something a little more complex, like the below:

SELECT count(*)
FROM news_feed
WHERE data @? '$.length ? (@ < 150)';

SELECT count(*)
FROM news_feed
WHERE data @? '$.content ? (@ like_regex "^Start")';

SELECT id, jsonb_path_query(data, '$.content')
FROM news_feed
WHERE data @? '$.content ? (@ like_regex "risk" flag "i")';

I would find that the index scan performed as well as the sequential
scan. Additionally, on my laptop, the parallel sequential scan would
beat the index scan by ~2.5x in some cases.

Reading up on what the GIN patch does, this all makes sense: it's
optimized for equality, I understand there are challenges to be able to
handle inequality, regex exps, etc. And the cases where it really does
work well, it's _incredibly_ fast.

My suggestion would be adding some additional guidance in the user
documentation around how GIN works with the @@ and @? operators so they
can understand where GIN will work very well with JSON path + their data
and not be surprised when other types of JSON path queries are
performing on par with a sequential scan (or worse than a parallel seq
scan).

Thanks,

Jonathan



signature.asc
Description: OpenPGP digital signature


Re: jsonpath

2019-03-17 Thread Alexander Korotkov
On Sun, Mar 17, 2019 at 6:03 PM Tom Lane  wrote:
> Andrew Dunstan  writes:
> > Why are we installing the jsonpath_gram.h file? It's not going to be
> > used by anything else, is it? TBH, I'm not sure I see why we're
> > installing the scanner.h file either.
>
> As near as I can see, jsonpath_gram.h and jsonpath_scanner.h exist only
> for communication between jsonpath_gram.y and jsonpath_scan.l.  Maybe
> we'd be better off handling that need by compiling the .l file as part
> of the .y file, as we used to do with the core lexer and still do with
> several others (cf comments for commit 72b1e3a21); then we wouldn't
> even have to generate these files much less install them.
>
> A quick look at jsonpath_scan.c shows that it's pretty innocent of
> the tricks we've learned over the years for flex/bison portability;
> in particular I see that it's #including  before postgres.h,
> which is a no-no.  So that whole area needs more review anyway.

Yeah, I didn't review this part of the patch carefully enough (and it
seems other reviewers didn't either).  I'm going to write a patch revising
this part in the next couple of days.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Fix XML handling with DOCTYPE

2019-03-17 Thread Tom Lane
Chapman Flack  writes:
> On 03/16/19 17:21, Tom Lane wrote:
>> Hm, so, maybe just
>> 
>> (1) always try to parse as document.  If successful, we're done.
>> 
>> (2) otherwise, if allowed by xmloption, try to parse using our
>> current logic for the CONTENT case.

> What I don't like about that is that (a) the input could be
> arbitrarily long and complex to parse (not that you can't imagine
> a database populated with lots of short little XML snippets, but
> at the same time, a query could quite plausibly deal in yooge ones),
> and (b), step (1) could fail at the last byte of the input, followed
> by total reparsing as (2).

That doesn't seem particularly likely to me: based on what's been
said here, I'd expect parsing with the wrong expectation to usually
fail near the start of the input.  In any case, the other patch
also requires repeat parsing, no?  It's just doing that in a different
set of cases.

The reason I'm pressing you for a simpler patch is that dump/reload
failures are pretty bad, so ideally we'd find a fix that we're
comfortable with back-patching into the released branches.
Personally I would never dare to back-patch the proposed patch:
it's too complex, so it's not real clear that it doesn't have unwanted
side-effects, and it's not at all certain that there aren't libxml
version dependencies in it.  (Maybe another committer with more
familiarity with libxml would evaluate the risks differently, but
I doubt it.)  But I think that something close to what I sketched
above would pass muster as safe-to-backpatch.

regards, tom lane



Re: jsonpath

2019-03-17 Thread Tom Lane
Andrew Dunstan  writes:
> Why are we installing the jsonpath_gram.h file? It's not going to be
> used by anything else, is it? TBH, I'm not sure I see why we're
> installing the scanner.h file either.

As near as I can see, jsonpath_gram.h and jsonpath_scanner.h exist only
for communication between jsonpath_gram.y and jsonpath_scan.l.  Maybe
we'd be better off handling that need by compiling the .l file as part
of the .y file, as we used to do with the core lexer and still do with
several others (cf comments for commit 72b1e3a21); then we wouldn't
even have to generate these files much less install them.

A quick look at jsonpath_scan.c shows that it's pretty innocent of
the tricks we've learned over the years for flex/bison portability;
in particular I see that it's #including  before postgres.h,
which is a no-no.  So that whole area needs more review anyway.

regards, tom lane
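
The arrangement referred to here is, roughly, to stop building the scanner
as a separate object and pull it in at the bottom of the grammar file, the
way several of the existing bison grammars in the tree do (sketch only, not
tested against the jsonpath build rules):

/* at the very end of jsonpath_gram.y */
#include "jsonpath_scan.c"

with a matching Makefile dependency such as

# jsonpath_scan.c is compiled as part of jsonpath_gram
jsonpath_gram.o: jsonpath_scan.c

and, per the usual convention, postgres.h as the first include in the
scanner prologue, ahead of any system headers such as <unistd.h>.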



Re: Facing issue in using special characters

2019-03-17 Thread Warner, Gary, Jr
Many of us have faced character encoding issues because we are not in control
of our input sources and made the common assumption that UTF-8 covers
everything.

In my lab, as an example, some of our social media posts have included ZawGyi
Burmese character sets rather than Unicode Burmese.  (Because Myanmar developed
its technology in an environment closed off from the rest of the world, it made
up its own non-standard character set, which is still very common on mobile
phones.)  We had fully tested the app with Unicode Burmese, but honestly didn’t
know ZawGyi was even a thing that we would see in our dataset.  We’ve also had
problems with non-Unicode word separators in Arabic.

What we’ve found to be helpful is to view the troublesome text in a hex editor
and determine which non-standard characters may be causing the problem.

It may be that some data conversion is necessary before insertion. But the
first step is knowing WHICH characters are causing the issue.



Re: CREATE OR REPLACE AGGREGATE?

2019-03-17 Thread Tom Lane
Michael Paquier  writes:
> On Sun, Mar 17, 2019 at 07:35:16AM +, Andrew Gierth wrote:
>> I took a bash at actually writing it and didn't see any obvious problems
>> (I'll post the patch in a bit). Is there some reason (other than
>> shortage of round tuits) why this might not be a good idea, or why it
>> hasn't been done before?

> Indeed.

Yeah, it seems like mostly a lack-of-round-tuits problem.

Updating the aggregate's dependencies correctly might be a bit tricky, but
it can't be any worse than the corresponding problem for functions...

regards, tom lane
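
For reference, a sketch of what the new syntax would allow (the aggregate
name is made up; the support functions are the built-in ones used by
avg(numeric), and whether all of these attributes end up being replaceable
is of course up to the patch):

CREATE AGGREGATE myavg (numeric)
(
    sfunc = numeric_avg_accum,
    stype = internal,
    finalfunc = numeric_avg
);

-- later: swap in a parallel-capable definition without dropping the
-- aggregate, keeping argument types, result type and kind unchanged
CREATE OR REPLACE AGGREGATE myavg (numeric)
(
    sfunc = numeric_avg_accum,
    stype = internal,
    finalfunc = numeric_avg,
    serialfunc = numeric_avg_serialize,
    deserialfunc = numeric_avg_deserialize,
    parallel = safe
);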



Re: jsonpath

2019-03-17 Thread Andrew Dunstan


On 3/17/19 4:03 AM, Alexander Korotkov wrote:
> On Sat, Mar 16, 2019 at 9:12 PM Tom Lane  wrote:
>> I wrote:
>>> Good point.  I also see jsonpath_gram.h left behind after maintainer-clean:
>> Oh, and of potentially more significance: after maintainer-clean and
>> re-configure, make fails with
>>
>> In file included from jsonpath_gram.y:24:
>> ../../../../src/include/utils/jsonpath_scanner.h:25:33: error: 
>> utils/jsonpath_gram.h: No such file or directory
>>
>> I first thought this was a problem with insufficient dependencies
>> allowing parallel make to do things in the wrong order, but the problem
>> repeats even without any parallelism.  It looks like the dependencies
>> have been constructed in such a way that if the symlink at
>> src/include/utils/jsonpath_gram.h exists but the underlying file
>> does not, nothing will make the underlying file.  This is pretty broken;
>> aside from this outright failure, it also suggests that nothing will
>> update that file if it exists but is out of date relative to its
>> sources.
>>
>> Please make sure that the make rules associated with these files look
>> exactly like the previously-debugged rules for existing bison/flex
>> output files.  There are generally good reasons for every last bit
>> of weirdness in those.
> I've pushed a fix.  I hope I didn't forget anything.
>
> BTW, it appears that the Windows build scripts also need some fixup.  I'm
> not very familiar with that.  I would be glad if somebody could review
> the attached patch.




Why are we installing the jsonpath_gram.h file? It's not going to be
used by anything else, is it? TBH, I'm not sure I see why we're
installing the scanner.h file either.


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-17 Thread Tomas Vondra
Hi,

On 3/17/19 12:47 PM, Dean Rasheed wrote:
> On Sat, 16 Mar 2019 at 23:44, Tomas Vondra  
> wrote:
>>
>>> 28). I just spotted the 1MB limit on the serialised MCV list size. I
>>> think this is going to be too limiting. For example, if the stats
>>> target is at its maximum of 1, that only leaves around 100 bytes
>>> for each item's values, which is easily exceeded. In fact, I think
>>> this approach for limiting the MCV list size isn't a good one --
>>> consider what would happen if there were lots of very large values.
>>> Would it run out of memory before getting to that test? Even if not,
>>> it would likely take an excessive amount of time.
>>>
>>
>> True. I don't have a very good argument for a specific value, or even
>> having an explicit limit at all. I've initially added it mostly as a
>> safety for development purposes, but I think you're right we can just
>> get rid of it. I don't think it'd run out of memory before hitting the
>> limit, but I haven't tried very hard (but I recall running into the 1MB
>> limit in the past).
>>
> 
> I've just been playing around a little with this and found that it
> isn't safely dealing with toasted values. For example, consider the
> following test:
> 
> create or replace function random_string(x int) returns text
> as $$
>   select substr(string_agg(md5(random()::text), ''), 1, x)
> from generate_series(1,(x+31)/32);
> $$ language sql;
> 
> drop table if exists t;
> create table t(a int, b text);
> insert into t values (1, random_string(1000));
> create statistics s (mcv) on a,b from t;
> analyse t;
> 
> select length(b), left(b,5), right(b,5) from t;
> select length(stxmcv), length((m.values::text[])[2]),
>left((m.values::text[])[2], 5), right((m.values::text[])[2],5)
>   from pg_statistic_ext, pg_mcv_list_items(stxmcv) m
>  where stxrelid = 't'::regclass;
> 
> The final query returns the following:
> 
>  length |  length  | left  | right
> +--+---+---
> 250 | 1000 | c2667 | 71492
> (1 row)
> 
> suggesting that there's something odd about the stxmcv value. Note,
> also, that it doesn't hit the 1MB limit, even though the value is much
> bigger than that.
> 
> If I then delete the value from the table, without changing the stats,
> and repeat the final query, it falls over:
> 
> delete from t where a=1;
> select length(stxmcv), length((m.values::text[])[2]),
>left((m.values::text[])[2], 5), right((m.values::text[])[2],5)
>   from pg_statistic_ext, pg_mcv_list_items(stxmcv) m
>  where stxrelid = 't'::regclass;
> 
> ERROR:  unexpected chunk number 5008 (expected 0) for toast value
> 16486 in pg_toast_16480
> 
> So I suspect it was using the toast data from the table t, although
> I've not tried to investigate further.
> 

Yes, it was using the toasted value directly. The attached patch
detoasts the value explicitly, similarly to the per-column stats, and it
also removes the 1MB limit.
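
For comparison, the per-column code in analyze.c handles this with a
pattern along the following lines (a paraphrased sketch, not the
extended-statistics patch itself):

if (is_varlena)
{
	/*
	 * If the value is over-wide, don't detoast it at all --- just count
	 * it and move on, so huge values cannot blow up memory use.
	 */
	if (toast_raw_datum_size(value) > WIDTH_THRESHOLD)
	{
		toowide_cnt++;
		continue;
	}

	/* otherwise detoast once, so later code never sees a toast pointer */
	value = PointerGetDatum(PG_DETOAST_DATUM(value));
}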

> 
>>> I think this part of the patch needs a bit of a rethink. My first
>>> thought is to do something similar to what happens for per-column
>>> MCVs, and set an upper limit on the size of each value that is ever
>>> considered for inclusion in the stats (c.f. WIDTH_THRESHOLD and
>>> toowide_cnt in analyse.c). Over-wide values should be excluded early
>>> on, and it will need to track whether or not any such values were
>>> excluded, because then it wouldn't be appropriate to treat the stats
>>> as complete and keep the entire list, without calling
>>> get_mincount_for_mcv_list().
>>>
>> Which part? Serialization / deserialization? Or how we handle long
>> values when building the MCV list?
>>
> 
> I was thinking (roughly) of something like the following:
> 
> * When building the values array for the MCV list, strip out rows with
> values wider than some threshold (probably something like the
> WIDTH_THRESHOLD = 1024 from analyse.c would be reasonable).
> 
> * When building the MCV list, if some over-wide values were previously
> stripped out, always go into the get_mincount_for_mcv_list() block,
> even if nitems == ngroups (for the same reason a similar thing happens
> for per-column stats -- if some items were stripped out, we're already
> saying that not all items will go in the MCV list, and it's not safe
> to assume that the remaining items are common enough to give accurate
> estimates).
> 

Yes, that makes sense I guess.

> * In the serialisation code, remove the size limit entirely. We know
> that each value is now at most 1024 bytes, and there are at most 1
> items, and at most 8 columns, so the total size is already reasonably
> well bounded. In the worst case, it might be around 80MB, but in
> practice, it's always likely to be much much smaller than that.
> 

Yep, I've already removed the limit from the current patch.


cheers

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-17 Thread Tomas Vondra



On 3/17/19 1:14 PM, Dean Rasheed wrote:
> On Sat, 16 Mar 2019 at 23:44, Tomas Vondra  
> wrote:
>>>
>>> 16). This regression test fails for me:
>>>
>>> @@ -654,11 +654,11 @@
>>>  -- check change of unrelated column type does not reset the MCV statistics
>>>  ALTER TABLE mcv_lists ALTER COLUMN d TYPE VARCHAR(64);
>>>  SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a =
>>> 1 AND b = ''1''');
>>>   estimated | actual
>>>  ---+
>>> -50 | 50
>>> +11 | 50
>>>  (1 row)
>>>
>>> Maybe that's platform-dependent, given what you said about
>>> reltuples/relpages being reset. An easy workaround for this would be
>>> to modify this test (and perhaps the one that follows) to just query
>>> pg_statistic_ext to see if the MCV statistics have been reset.
>>>
>>
>> Ah, sorry for not explaining this bit - the failure is expected, due to
>> the reset of relpages/reltuples I mentioned. We do keep the extended
>> stats, but the relsize estimate changes a bit. It surprised me a bit,
>> and this test made the behavior apparent. The last patchset included a
>> piece that changes that - if we decide not to change this, I think we
>> can simply accept the actual output.
>>
> 
> I don't think changing the way reltuples is reset ought to be within
> the scope of this patch. There might be good reasons for it being the
> way it is. Perhaps open a discussion on a separate thread?
> 

Agreed, will do.

> As far as this test goes, how about just doing this:
> 
> -- check change of unrelated column type does not reset the MCV statistics
> ALTER TABLE mcv_lists ALTER COLUMN d TYPE VARCHAR(64);
> SELECT stxmcv IS NOT NULL AS has_mcv
>   FROM pg_statistic_ext WHERE stxrelid = 'mcv_lists'::regclass;
> 
> -- check change of column type resets the MCV statistics
> ALTER TABLE mcv_lists ALTER COLUMN c TYPE numeric;
> SELECT stxmcv IS NOT NULL AS has_mcv
>   FROM pg_statistic_ext WHERE stxrelid = 'mcv_lists'::regclass;
> 

OK, that's probably the best thing we can do.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Data-only pg_rewind, take 2

2019-03-17 Thread Chris Travers
Hi;

Attached is my second attempt at the pg_rewind change which allows one to
copy only a minimal set of files.  To my understanding, all past feedback
has been addressed.

The current patch does not change the default behavior.  It does add a
--data-only flag which makes pg_rewind rewind only the minimal set of files
needed for the target cluster to work.  I believe this would generally be a
good practice, though maybe there is some disagreement on that.

I have not run pgindent because of the large number of other changes, but
if that is desired at some point I can do that.

I also added test cases and some docs.  I don't know if the docs are
sufficient.  Feedback is appreciated.  This is of course submitted for v13.
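
For illustration, an invocation would look something like this (the paths
and connection string are made up; --target-pgdata and --source-server are
pg_rewind's existing options, --data-only is the new flag):

pg_rewind --target-pgdata=/var/lib/postgresql/12/main \
  --source-server='host=primary.example.com port=5432 user=postgres' \
  --data-only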

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


rewind_data_only_mode.patch
Description: Binary data


Re: using index or check in ALTER TABLE SET NOT NULL

2019-03-17 Thread Sergei Kornilov
Hello

> If we don't have this for SET NOT NULL then we should either add it or
> remove the one from ATTACH PARTITION.

I proposed lowering the report level to debug1 in a separate thread:
https://www.postgresql.org/message-id/4859321552643736%40myt5-02b80404fd9e.qloud-c.yandex.net
It would probably be better to link to that one.

regards, Sergei



Re: Offline enabling/disabling of data checksums

2019-03-17 Thread Michael Paquier
On Sun, Mar 17, 2019 at 12:44:39PM +0100, Fabien COELHO wrote:
> I could remove the two "catalog/" includes from pg_resetwal, I assume that
> you meant these ones.

Not exactly.  What I meant is that if you try to call fsync_fname and
fsync_parent_path from file_utils.h directly, then you get into trouble
because of xlog.h.  Sure, you can also remove the ones you removed.

> Hmmm. I just did that, but what about just a boolean? What other options
> could be required? Maybe some locking/checking?

The caller is already expected to take ControlFileLock properly.  Note
that I tend to worry too much about the extensibility of published APIs
these days, so perhaps just a boolean would be fine; please let me
reconsider that after some sleep.  It is not like the contents of this
routine are going to become much more complicated either, except
potentially to control the flags on open(). :p

> I kept the initial no-parameter function which calls the new one with 4
> parameters, though, because it looks more homogeneous this way in the
> backend code. This is debatable.

True, this actually makes back-patching a bit easier, and there are 13
calls of UpdateControlFile().

> Attached is an update.

Thanks, I'll take a look at that tomorrow.  You have one error at the
end of update_controlfile(), where close() could issue a frontend-like
error for the backend, calling exit() on the way.  That's not good.
(No need to send a new patch, I'll fix it myself.)
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-17 Thread Dean Rasheed
On Sat, 16 Mar 2019 at 23:44, Tomas Vondra  wrote:
> >
> > 16). This regression test fails for me:
> >
> > @@ -654,11 +654,11 @@
> >  -- check change of unrelated column type does not reset the MCV statistics
> >  ALTER TABLE mcv_lists ALTER COLUMN d TYPE VARCHAR(64);
> >  SELECT * FROM check_estimated_rows('SELECT * FROM mcv_lists WHERE a =
> > 1 AND b = ''1''');
> >   estimated | actual
> >  ---+
> > -50 | 50
> > +11 | 50
> >  (1 row)
> >
> > Maybe that's platform-dependent, given what you said about
> > reltuples/relpages being reset. An easy workaround for this would be
> > to modify this test (and perhaps the one that follows) to just query
> > pg_statistic_ext to see if the MCV statistics have been reset.
> >
>
> Ah, sorry for not explaining this bit - the failure is expected, due to
> the reset of relpages/reltuples I mentioned. We do keep the extended
> stats, but the relsize estimate changes a bit. It surprised me a bit,
> and this test made the behavior apparent. The last patchset included a
> piece that changes that - if we decide not to change this, I think we
> can simply accept the actual output.
>

I don't think changing the way reltuples is reset ought to be within
the scope of this patch. There might be good reasons for it being the
way it is. Perhaps open a discussion on a separate thread?

As far as this test goes, how about just doing this:

-- check change of unrelated column type does not reset the MCV statistics
ALTER TABLE mcv_lists ALTER COLUMN d TYPE VARCHAR(64);
SELECT stxmcv IS NOT NULL AS has_mcv
  FROM pg_statistic_ext WHERE stxrelid = 'mcv_lists'::regclass;

-- check change of column type resets the MCV statistics
ALTER TABLE mcv_lists ALTER COLUMN c TYPE numeric;
SELECT stxmcv IS NOT NULL AS has_mcv
  FROM pg_statistic_ext WHERE stxrelid = 'mcv_lists'::regclass;

Regards,
Dean



Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-17 Thread Dean Rasheed
On Sat, 16 Mar 2019 at 23:44, Tomas Vondra  wrote:
>
> > 28). I just spotted the 1MB limit on the serialised MCV list size. I
> > think this is going to be too limiting. For example, if the stats
> > target is at its maximum of 1, that only leaves around 100 bytes
> > for each item's values, which is easily exceeded. In fact, I think
> > this approach for limiting the MCV list size isn't a good one --
> > consider what would happen if there were lots of very large values.
> > Would it run out of memory before getting to that test? Even if not,
> > it would likely take an excessive amount of time.
> >
>
> True. I don't have a very good argument for a specific value, or even
> having an explicit limit at all. I've initially added it mostly as a
> safety for development purposes, but I think you're right we can just
> get rid of it. I don't think it'd run out of memory before hitting the
> limit, but I haven't tried very hard (but I recall running into the 1MB
> limit in the past).
>

I've just been playing around a little with this and found that it
isn't safely dealing with toasted values. For example, consider the
following test:

create or replace function random_string(x int) returns text
as $$
  select substr(string_agg(md5(random()::text), ''), 1, x)
from generate_series(1,(x+31)/32);
$$ language sql;

drop table if exists t;
create table t(a int, b text);
insert into t values (1, random_string(1000));
create statistics s (mcv) on a,b from t;
analyse t;

select length(b), left(b,5), right(b,5) from t;
select length(stxmcv), length((m.values::text[])[2]),
   left((m.values::text[])[2], 5), right((m.values::text[])[2],5)
  from pg_statistic_ext, pg_mcv_list_items(stxmcv) m
 where stxrelid = 't'::regclass;

The final query returns the following:

 length |  length  | left  | right
+--+---+---
250 | 1000 | c2667 | 71492
(1 row)

suggesting that there's something odd about the stxmcv value. Note,
also, that it doesn't hit the 1MB limit, even though the value is much
bigger than that.

If I then delete the value from the table, without changing the stats,
and repeat the final query, it falls over:

delete from t where a=1;
select length(stxmcv), length((m.values::text[])[2]),
   left((m.values::text[])[2], 5), right((m.values::text[])[2],5)
  from pg_statistic_ext, pg_mcv_list_items(stxmcv) m
 where stxrelid = 't'::regclass;

ERROR:  unexpected chunk number 5008 (expected 0) for toast value
16486 in pg_toast_16480

So I suspect it was using the toast data from the table t, although
I've not tried to investigate further.


> > I think this part of the patch needs a bit of a rethink. My first
> > thought is to do something similar to what happens for per-column
> > MCVs, and set an upper limit on the size of each value that is ever
> > considered for inclusion in the stats (c.f. WIDTH_THRESHOLD and
> > toowide_cnt in analyse.c). Over-wide values should be excluded early
> > on, and it will need to track whether or not any such values were
> > excluded, because then it wouldn't be appropriate to treat the stats
> > as complete and keep the entire list, without calling
> > get_mincount_for_mcv_list().
> >
> Which part? Serialization / deserialization? Or how we handle long
> values when building the MCV list?
>

I was thinking (roughly) of something like the following:

* When building the values array for the MCV list, strip out rows with
values wider than some threshold (probably something like the
WIDTH_THRESHOLD = 1024 from analyse.c would be reasonable).

* When building the MCV list, if some over-wide values were previously
stripped out, always go into the get_mincount_for_mcv_list() block,
even if nitems == ngroups (for the same reason a similar thing happens
for per-column stats -- if some items were stripped out, we're already
saying that not all items will go in the MCV list, and it's not safe
to assume that the remaining items are common enough to give accurate
estimates).

* In the serialisation code, remove the size limit entirely. We know
that each value is now at most 1024 bytes, and there are at most 1
items, and at most 8 columns, so the total size is already reasonably
well bounded. In the worst case, it might be around 80MB, but in
practice, it's always likely to be much much smaller than that.

Regards,
Dean



Re: Offline enabling/disabling of data checksums

2019-03-17 Thread Fabien COELHO


Michaël-san,


The issue here is that trying to embed directly the fsync routines
from file_utils.c into pg_resetwal.c messes up the inclusions because
pg_resetwal.c includes backend-side includes, which themselves touch
fd.h :(

In short your approach avoids some extra mess with the include
dependencies.


I could remove the two "catalog/" includes from pg_resetwal, I assume that 
you meant these ones.



Maybe the two changes could be committed separately.


I was thinking about this one, and for pg_rewind we don't care about
the fsync of the control file because the full data folder gets
fsync'd afterwards and in the event of a crash in the middle of a
rewind the target data folder is surely not something to use, but we
do for pg_checksums, and we do for pg_resetwal.  Even if there is the
argument that usually callers of update_controlfile() would care a
lot about the control file and fsync it, I think that we need some
control on if we do the fsync or not because many tools have a
--no-sync and that should be fully respected.


So while your patch is on a good track, I would suggest to do the 
following things to complete it:


- Add an extra argument bits16 to update_controlfile to pass a set of 
optional flags, with NOSYNC being the only and current value.  The 
default is to flush the file.


Hmmm. I just did that, but what about just a boolean? What other options 
could be required? Maybe some locking/checking?



- Move the wait event calls WAIT_EVENT_CONTROL_FILE_WRITE_UPDATE and
WAIT_EVENT_CONTROL_FILE_SYNC_UPDATE to controldata_utils.c.


Done.


- And then delete UpdateControlFile() in xlog.c, and use
update_controlfile() instead to remove even more code.


I was keeping that one for another patch because it touches the backend 
code, but it makes sense to do that in one go for consistency.


I kept the initial no-parameter function which calls the new one with 4 
parameters, though, because it looks more homogeneous this way in the 
backend code. This is debatable.


The version in xlog.c uses BasicOpenFile(), so we should use also that 
in update_controlfile() instead of OpenTransientFile(). As any errors 
result in a PANIC we don't care about leaking fds.


Done.

Attached is an update.

--
Fabien.diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 54d3c558c6..7f782e255c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -40,6 +40,7 @@
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/tablespace.h"
+#include "common/controldata_utils.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/atomics.h"
@@ -4757,48 +4758,7 @@ ReadControlFile(void)
 void
 UpdateControlFile(void)
 {
-	int			fd;
-
-	INIT_CRC32C(ControlFile->crc);
-	COMP_CRC32C(ControlFile->crc,
-(char *) ControlFile,
-offsetof(ControlFileData, crc));
-	FIN_CRC32C(ControlFile->crc);
-
-	fd = BasicOpenFile(XLOG_CONTROL_FILE,
-	   O_RDWR | PG_BINARY);
-	if (fd < 0)
-		ereport(PANIC,
-(errcode_for_file_access(),
- errmsg("could not open file \"%s\": %m", XLOG_CONTROL_FILE)));
-
-	errno = 0;
-	pgstat_report_wait_start(WAIT_EVENT_CONTROL_FILE_WRITE_UPDATE);
-	if (write(fd, ControlFile, sizeof(ControlFileData)) != sizeof(ControlFileData))
-	{
-		/* if write didn't set errno, assume problem is no disk space */
-		if (errno == 0)
-			errno = ENOSPC;
-		ereport(PANIC,
-(errcode_for_file_access(),
- errmsg("could not write to file \"%s\": %m",
-		XLOG_CONTROL_FILE)));
-	}
-	pgstat_report_wait_end();
-
-	pgstat_report_wait_start(WAIT_EVENT_CONTROL_FILE_SYNC_UPDATE);
-	if (pg_fsync(fd) != 0)
-		ereport(PANIC,
-(errcode_for_file_access(),
- errmsg("could not fsync file \"%s\": %m",
-		XLOG_CONTROL_FILE)));
-	pgstat_report_wait_end();
-
-	if (close(fd))
-		ereport(PANIC,
-(errcode_for_file_access(),
- errmsg("could not close file \"%s\": %m",
-		XLOG_CONTROL_FILE)));
+	update_controlfile(".", XLOG_CONTROL_FILE, ControlFile, CF_OPT_NONE);
 }
 
 /*
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 2af8713216..334903711e 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -49,8 +49,7 @@
 #include "access/multixact.h"
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
-#include "catalog/catversion.h"
-#include "catalog/pg_control.h"
+#include "common/controldata_utils.h"
 #include "common/fe_memutils.h"
 #include "common/file_perm.h"
 #include "common/restricted_token.h"
@@ -918,18 +917,6 @@ PrintNewControlValues(void)
 static void
 RewriteControlFile(void)
 {
-	int			fd;
-	char		buffer[PG_CONTROL_FILE_SIZE];	/* need not be aligned */
-
-	/*
-	 * For good luck, apply the same static assertions as in backend's
-	 * WriteControlFile().
-	 */
-	StaticAssertStmt(sizeof(ControlFileData) <= PG_CONTROL_MAX_SAFE_SIZE,
-	 "pg_control is too large for atomic disk writes");
-	

Re: using index or check in ALTER TABLE SET NOT NULL

2019-03-17 Thread David Rowley
On Thu, 14 Mar 2019 at 06:24, Robert Haas  wrote:
>
> On Wed, Mar 13, 2019 at 1:09 PM Tom Lane  wrote:
> > Sergei Kornilov  writes:
> > >> Ugh, I guess so. Or how about changing the message itself to use
> > >> INFO, like we already do in QueuePartitionConstraintValidation?
> >
> > > Fine for me. But a year ago this was implemented in my patch and Tom
> > > voted against using the INFO level for such a purpose:
> > > https://www.postgresql.org/message-id/1142.1520362313%40sss.pgh.pa.us
> >
> > What I thought then was that you didn't need the message at all,
> > at any debug level.  I still think that.  It might have been useful
> > for development purposes but it does not belong in committed code.
> > INFO (making it impossible for anybody to not have the message
> > in-their-face) is right out.
>
> I find that position entirely wrong-headed.  If you think users have
> no interest in a message telling them whether their gigantic table is
> getting scanned or not, you're wrong.  Maybe you're right that
> everyone doesn't want it, but I'm positive some do.  We've had
> requests for very similar things on this very mailing list more than
> once.

If we don't have this for SET NOT NULL then we should either add it or
remove the one from ATTACH PARTITION.  I don't think we need to decide
which it is now, so I've added an item to the open items list that
this is out for debate.

https://wiki.postgresql.org/wiki/PostgreSQL_12_Open_Items#Open_Issues

I think we can mark this patch as committed now as I don't think the
lack of a way to test it is likely to cause it to be reverted.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Make pg_checksums complain if compiled BLCKSZ and data folder's block size differ

2019-03-17 Thread Magnus Hagander
On Sun, Mar 17, 2019 at 6:47 AM Michael Paquier  wrote:

> On Sat, Mar 16, 2019 at 09:18:34AM +0100, Fabien COELHO wrote:
> > I think it would be better to adapt the checksum computation, but this is
> > indeed non trivial because of the way the BLCKSZ constant is hardwired
> into
> > type declarations.
>
> That's actually the possibility I was pointing out upthread.  I am not
> sure that the use cases are worth the effort though.
>

It may be worthwhile, but I think we shouldn't target that for v12 --
consider it a potential improvement for an upcoming version. Let's focus on
the things we have now to make sure we get those polished and applied first.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Make pg_checksums complain if compiled BLCKSZ and data folder's block size differ

2019-03-17 Thread Magnus Hagander
On Sun, Mar 17, 2019 at 10:10 AM Michael Paquier 
wrote:

> On Sat, Mar 16, 2019 at 11:18:17AM +0100, Magnus Hagander wrote:
> > BLCKSZ is very much an internal term. The exposed name through
> pg_settings
> > is block_size, so I think the original was better. Combining that one
> with
> > yours into  "initialized with block size %d" etc, makes it a lot nicer.
>
> Yes, what Fabien and you say here makes sense.
>
> > The "incompatible with pg_checksums" part may be a bit redundant with the
> > commandname at the start as well, as I now realized Fabien pointed out
> > downthread. But I would suggest just cutting it and saying "%s: database
> > files are incompatible" or maybe "%s: data directory is incompatible"
> even?
>
> "Cluster" is more consistent with the surroundings.  So what about the
> attached then?
>

LGTM.


-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: proposal: pg_restore --convert-to-text

2019-03-17 Thread José Arthur Benetasso Villanova



On Sat, 16 Mar 2019, Euler Taveira wrote:

>> I think the former one looks prettier, but which one is preferred?
>
> I don't have a style preference but decided to change to your
> suggestion. New version attached.

Again, the patch compiles and works as expected.


--
José Arthur Benetasso Villanova

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-17 Thread Dean Rasheed
On Sat, 16 Mar 2019 at 23:44, Tomas Vondra  wrote:
>
> > 21). For consistency with other bms_ functions, I think the name of
> > the Bitmapset argument for bms_member_index() should just be called
> > "a". Nitpicking, I'd also put bms_member_index() immediately after
> > bms_is_member() in the source, to match the header.
>
> I think I've already done the renames in the last patch I submitted (are
> you looking at an older version of the code, perhaps?). I've moved it
> right after bms_is_member - good idea.
>

Ah OK, I was on the 20190315 patch yesterday. I've just updated to the
20190317 patch.
It looks like you forgot to update the argument name in the header file though.

Regards,
Dean



Re: Keyword table constness in jsonpath scanner.

2019-03-17 Thread Alexander Korotkov
On Sat, Mar 16, 2019 at 10:47 PM Mark G  wrote:
> While looking over the new jsonpath stuff I noticed the keyword table
> wasn't declared const. Shouldn't the table and the actual keyword
> strings both be declared const? Perhaps something like the attached
> (untested) patch.

Looks good to me.  Pushed, thanks.
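
For readers following along, the pattern in question is roughly this (an
illustrative fragment, not the actual jsonpath definitions): with both the
array and the pointed-to strings declared const, the whole table can live
in read-only storage.

typedef struct KeyWord
{
    int         len;        /* keyword length */
    int         val;        /* placeholder token value */
    const char *keyword;    /* the keyword string itself is const */
} KeyWord;

static const KeyWord keywords[] = {
    {2, 100, "is"},
    {4, 101, "last"},
    {6, 102, "exists"},
};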

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Offline enabling/disabling of data checksums

2019-03-17 Thread Michael Paquier
On Sun, Mar 17, 2019 at 10:01:20AM +0100, Fabien COELHO wrote:
> The implementation basically removes a lot of copy paste and calls the new
> update_controlfile function instead. I like removing useless code:-)

Yes, I spent something like 10 minutes looking at that code yesterday
and I agree that removing the control file to recreate it is not
really necessary, even if the window between its removal and
recreation is short.

> I do not see the value of *not* fsyncing the control file when writing it,
> as it is by definition very precious, so I added a fsync. The server side
> branch uses the backend available "pg_fsync", which complies with server
> settings there and can do nothing if fsync is disabled.

The issue here is that trying to embed directly the fsync routines
from file_utils.c into pg_resetwal.c messes up the inclusions because
pg_resetwal.c includes backend-side includes, which themselves touch
fd.h :(

In short your approach avoids some extra mess with the include
dependencies.

> Maybe the two changes could be committed separately.

I was thinking about this one.  For pg_rewind we don't care about
fsyncing the control file, because the full data folder gets fsync'd
afterwards, and in the event of a crash in the middle of a rewind the
target data folder is surely not something to use; but we do care for
pg_checksums, and we do for pg_resetwal.  Even if there is an argument
that callers of update_controlfile() usually care a lot about the
control file and would fsync it anyway, I think that we need some
control over whether we do the fsync or not, because many tools have a
--no-sync option that should be fully respected.  So while your patch
is on a good track, I would suggest the following to complete it:
- Add an extra bits16 argument to update_controlfile() to pass a set of
optional flags, with NOSYNC being the only value for now (see the rough
sketch below).  The default is to flush the file.
- Move the wait event calls WAIT_EVENT_CONTROL_FILE_WRITE_UPDATE and
WAIT_EVENT_CONTROL_FILE_SYNC_UPDATE to controldata_utils.c.
- And then delete UpdateControlFile() in xlog.c, and use
update_controlfile() instead to remove even more code.  The version in
xlog.c uses BasicOpenFile(), so we should use also that in
update_controlfile() instead of OpenTransientFile().  As any errors
result in a PANIC we don't care about leaking fds.
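
As a rough illustration only (the names below are placeholders, not
necessarily what would get committed), the flag-bearing declaration could
look like this:

#include "catalog/pg_control.h"   /* ControlFileData */

#define CF_OPT_NONE    0x00       /* default: write and fsync pg_control */
#define CF_OPT_NOSYNC  0x01       /* skip the fsync, e.g. under --no-sync */

extern void update_controlfile(const char *DataDir, const char *progname,
                               ControlFileData *ControlFile, bits16 flags);

Callers that honor --no-sync would pass CF_OPT_NOSYNC; everything else
keeps the default flush behavior.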
--
Michael


signature.asc
Description: PGP signature


Re: CREATE OR REPLACE AGGREGATE?

2019-03-17 Thread Michael Paquier
On Sun, Mar 17, 2019 at 07:35:16AM +, Andrew Gierth wrote:
> I took a bash at actually writing it and didn't see any obvious problems
> (I'll post the patch in a bit). Is there some reason (other than
> shortage of round tuits) why this might not be a good idea, or why it
> hasn't been done before?

Indeed.  There is not much on the matter in pgsql-hackers as far as I
can see, except that thread, but it is short:
https://www.postgresql.org/message-id/CAGYyBgj3u_4mfTNPMnpOM2NPtWQVPU4WRsYz=rlcf59g-kg...@mail.gmail.com
--
Michael


signature.asc
Description: PGP signature


Re: Make pg_checksums complain if compiled BLCKSZ and data folder's block size differ

2019-03-17 Thread Michael Paquier
On Sat, Mar 16, 2019 at 11:18:17AM +0100, Magnus Hagander wrote:
> BLCKSZ is very much an internal term. The exposed name through pg_settings
> is block_size, so I think the original was better. Combining that one with
> yours into  "initialized with block size %d" etc, makes it a lot nicer.

Yes, what Fabien and you say here makes sense.

> The "incompatible with pg_checksums" part may be a bit redundant with the
> commandname at the start as well, as I now realized Fabien pointed out
> downthread. But I would suggest just cutting it and saying "%s: database
> files are incompatible" or maybe "%s: data directory is incompatible" even?

"Cluster" is more consistent with the surroundings.  So what about the
attached then?
--
Michael
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 5d4083fa9f..b7ebc11017 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -327,6 +327,15 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
+	if (ControlFile->blcksz != BLCKSZ)
+	{
+		fprintf(stderr, _("%s: database cluster is not compatible.\n"),
+progname);
+		fprintf(stderr, _("The database cluster was initialized with block size %u, but pg_checksums was compiled with block size %u.\n"),
+ControlFile->blcksz, BLCKSZ);
+		exit(1);
+	}
+
 	if (ControlFile->state != DB_SHUTDOWNED &&
 		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
 	{


signature.asc
Description: PGP signature


Re: Make pg_checksums complain if compiled BLCKSZ and data folder's block size differ

2019-03-17 Thread Fabien COELHO




>> I'm not sure that prefixing the two lines with the command name is very
>> elegant, I'd suggest to put spaces, and would still try to shorten the
>> second sentence, maybe:
>
> I suggest to keep two lines, and only prefix the first one.

As you feel. For me the shorter the better, though, if the information is
clear and all there.


--
Fabien.



Re: Offline enabling/disabling of data checksums

2019-03-17 Thread Fabien COELHO


Bonjour Michaël-san,


> Yes, that would be nice, for now I have focused.  For pg_resetwal yes
> we could do it easily.  Would you like to send a patch?


Here is a proposal for "pg_resetwal".

The implementation basically removes a lot of copy paste and calls the 
new update_controlfile function instead. I like removing useless code:-)


The pg_resetwal implementation was doing a rm/create cycle, which was leaving 
a small window for losing the controlfile. Not neat.


I do not see the value of *not* fsyncing the control file when writing it, 
as it is by definition very precious, so I added a fsync. The server side 
branch uses the backend available "pg_fsync", which complies with server 
settings there and can do nothing if fsync is disabled.


Maybe the two changes could be committed separately.

--
Fabien.diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 2af8713216..dd085e16ab 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -918,18 +918,6 @@ PrintNewControlValues(void)
 static void
 RewriteControlFile(void)
 {
-	int			fd;
-	char		buffer[PG_CONTROL_FILE_SIZE];	/* need not be aligned */
-
-	/*
-	 * For good luck, apply the same static assertions as in backend's
-	 * WriteControlFile().
-	 */
-	StaticAssertStmt(sizeof(ControlFileData) <= PG_CONTROL_MAX_SAFE_SIZE,
-	 "pg_control is too large for atomic disk writes");
-	StaticAssertStmt(sizeof(ControlFileData) <= PG_CONTROL_FILE_SIZE,
-	 "sizeof(ControlFileData) exceeds PG_CONTROL_FILE_SIZE");
-
 	/*
 	 * Adjust fields as needed to force an empty XLOG starting at
 	 * newXlogSegNo.
@@ -961,53 +949,7 @@ RewriteControlFile(void)
 	ControlFile.max_prepared_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
-	/* Contents are protected with a CRC */
-	INIT_CRC32C(ControlFile.crc);
-	COMP_CRC32C(ControlFile.crc,
-(char *) &ControlFile,
-offsetof(ControlFileData, crc));
-	FIN_CRC32C(ControlFile.crc);
-
-	/*
-	 * We write out PG_CONTROL_FILE_SIZE bytes into pg_control, zero-padding
-	 * the excess over sizeof(ControlFileData).  This reduces the odds of
-	 * premature-EOF errors when reading pg_control.  We'll still fail when we
-	 * check the contents of the file, but hopefully with a more specific
-	 * error than "couldn't read pg_control".
-	 */
-	memset(buffer, 0, PG_CONTROL_FILE_SIZE);
-	memcpy(buffer, &ControlFile, sizeof(ControlFileData));
-
-	unlink(XLOG_CONTROL_FILE);
-
-	fd = open(XLOG_CONTROL_FILE,
-			  O_RDWR | O_CREAT | O_EXCL | PG_BINARY,
-			  pg_file_create_mode);
-	if (fd < 0)
-	{
-		fprintf(stderr, _("%s: could not create pg_control file: %s\n"),
-progname, strerror(errno));
-		exit(1);
-	}
-
-	errno = 0;
-	if (write(fd, buffer, PG_CONTROL_FILE_SIZE) != PG_CONTROL_FILE_SIZE)
-	{
-		/* if write didn't set errno, assume problem is no disk space */
-		if (errno == 0)
-			errno = ENOSPC;
-		fprintf(stderr, _("%s: could not write pg_control file: %s\n"),
-progname, strerror(errno));
-		exit(1);
-	}
-
-	if (fsync(fd) != 0)
-	{
-		fprintf(stderr, _("%s: fsync error: %s\n"), progname, strerror(errno));
-		exit(1);
-	}
-
-	close(fd);
+	update_controlfile(XLOG_CONTROL_FILE, progname, &ControlFile);
 }
 
 
diff --git a/src/common/controldata_utils.c b/src/common/controldata_utils.c
index 71e67a2eda..78ce8b020f 100644
--- a/src/common/controldata_utils.c
+++ b/src/common/controldata_utils.c
@@ -144,9 +144,9 @@ get_controlfile(const char *DataDir, const char *progname, bool *crc_ok_p)
  * update_controlfile()
  *
  * Update controlfile values with the contents given by caller.  The
- * contents to write are included in "ControlFile".  Note that it is up
- * to the caller to fsync the updated file, and to properly lock
- * ControlFileLock when calling this routine in the backend.
+ * contents to write are included in "ControlFile".  Note that it is up
+ * to the caller to properly lock ControlFileLock when calling this
+ * routine in the backend.
  */
 void
 update_controlfile(const char *DataDir, const char *progname,
@@ -216,6 +216,21 @@ update_controlfile(const char *DataDir, const char *progname,
 #endif
 	}
 
+#ifndef FRONTEND
+	if (pg_fsync(fd) != 0)
+		ereport(PANIC,
+(errcode_for_file_access(),
+ errmsg("could not fsync file \"%s\": %m",
+		ControlFilePath)));
+#else
+	if (fsync(fd) != 0)
+	{
+		fprintf(stderr, _("%s: could not fsync file \"%s\": %s\n"),
+progname, ControlFilePath, strerror(errno));
+		exit(EXIT_FAILURE);
+	}
+#endif
+
 #ifndef FRONTEND
 	if (CloseTransientFile(fd))
 		ereport(PANIC,


Re: Make pg_checksums complain if compiled BLCKSZ and data folder's block size differ

2019-03-17 Thread Michael Paquier
On Sun, Mar 17, 2019 at 09:17:02AM +0100, Fabien COELHO wrote:
> I'm not sure that prefixing the two lines with the command name is very
> elegant, I'd suggest to put spaces, and would still try to shorten the
> second sentence, maybe:

I suggest to keep two lines, and only prefix the first one.
--
Michael


signature.asc
Description: PGP signature


Re: Make pg_checksums complain if compiled BLCKSZ and data folder's block size differ

2019-03-17 Thread Fabien COELHO




Something like "%s: database folder is incompatible" for the first
line sounds kind of better per the feedback gathered.  And then on the
second line:
"The database cluster was initialized with block size %u, but
pg_checksums was compiled with block size %u."


Ok. "%s" progname instead of "pg_checksums", or just "the command"?

I'm not sure that prefixing the two lines with the comment line is very 
elegant, I'd suggest to put spaces, and would still try to shorten the 
second sentence, maybe:


%s: incompatible database cluster of block size %u, while the command
  is compiled for block size %u.


I think it would be better to adapt the checksum computation, but this is
indeed non trivial because of the way the BLCKSZ constant is hardwired into
type declarations.


That's actually the possibility I was pointing out upthread.


Yes, I was expressing my agreement.


I am not sure that the use cases are worth the effort though.


Indeed, not for "pg_checksums" only.

A few years ago I considered having a dynamic initdb-set block size, but
BLCKSZ is used a lot as a constant for struct declarations and to compute
other constants, so that was a lot of changes.  I think it would be worth
the effort, because the current page size is suboptimal, especially on SSDs
where 4 KiB would provide over 10% better performance for OLTP loads.
However, having to recompile to change it is a pain and not very
package-friendly.
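
To make the difficulty concrete, here is a simplified sketch (not the
exact PostgreSQL definitions) of how BLCKSZ gets frozen into types and
derived constants at compile time; a run-time block size would have to
touch every such site:

#define BLCKSZ 8192                 /* fixed when the server is compiled */

/* Buffers are declared with the block size baked into the type itself. */
typedef union PGAlignedBlock
{
    char    data[BLCKSZ];           /* array length fixed at compile time */
    double  force_align_d;          /* forces sensible alignment */
} PGAlignedBlock;

/* Derived limits become compile-time constants too (illustrative name
 * and formula, not the real one). */
#define MaxTupleDataSize    (BLCKSZ - 64)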


--
Fabien.



Re: jsonpath

2019-03-17 Thread Alexander Korotkov
On Sat, Mar 16, 2019 at 9:12 PM Tom Lane  wrote:
> I wrote:
> > Good point.  I also see jsonpath_gram.h left behind after maintainer-clean:
>
> Oh, and of potentially more significance: after maintainer-clean and
> re-configure, make fails with
>
> In file included from jsonpath_gram.y:24:
> ../../../../src/include/utils/jsonpath_scanner.h:25:33: error: 
> utils/jsonpath_gram.h: No such file or directory
>
> I first thought this was a problem with insufficient dependencies
> allowing parallel make to do things in the wrong order, but the problem
> repeats even without any parallelism.  It looks like the dependencies
> have been constructed in such a way that if the symlink at
> src/include/utils/jsonpath_gram.h exists but the underlying file
> does not, nothing will make the underlying file.  This is pretty broken;
> aside from this outright failure, it also suggests that nothing will
> update that file if it exists but is out of date relative to its
> sources.
>
> Please make sure that the make rules associated with these files look
> exactly like the previously-debugged rules for existing bison/flex
> output files.  There are generally good reasons for every last bit
> of weirdness in those.

I've pushed a fix.  I hope I didn't forget anything.

BTW, it appears that the Windows build scripts also need some fixup.  I'm
not very familiar with that, so I would be glad if somebody could review
the attached patch.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


jsonpath_windows_build_fixes.patch
Description: Binary data


CREATE OR REPLACE AGGREGATE?

2019-03-17 Thread Andrew Gierth
So some PostGIS people were griping (on irc) about how the lack of
CREATE OR REPLACE AGGREGATE made their life difficult for updates. It
struck me that aggregates have acquired a relatively large number of new
attributes in recent years, almost all of which are applicable at
execution time rather than in parse analysis, so having a CREATE OR
REPLACE option seems like a no-brainer.

I took a bash at actually writing it and didn't see any obvious problems
(I'll post the patch in a bit). Is there some reason (other than
shortage of round tuits) why this might not be a good idea, or why it
hasn't been done before?

-- 
Andrew (irc:RhodiumToad)



Re: jsonpath

2019-03-17 Thread Alexander Korotkov
On Sat, Mar 16, 2019 at 9:39 PM Pavel Stehule  wrote:
> so 16. 3. 2019 v 10:36 odesílatel Alexander Korotkov 
>  napsal:
>>
>> On Thu, Mar 14, 2019 at 12:07 PM Alexander Korotkov
>>  wrote:
>> > On Sun, Mar 10, 2019 at 1:51 PM Alexander Korotkov
>> >  wrote:
>> > > On Wed, Mar 6, 2019 at 12:40 AM Nikita Glukhov  
>> > > wrote:
>> > > > Attached 36th version of the patches.
>> > >
>> > > Thank yo for the revision!
>> > >
>> > > In the attached revision following changes are made:
>> > >
>> > > > "unknown" refers here to ordinary three-valued logical Unknown, which 
>> > > > is
>> > > > represented in SQL by NULL.
>> > > >
>> > > > JSON path expressions return sequences of SQL/JSON items, which are 
>> > > > defined by
>> > > > SQL/JSON data model.  But JSON path predicates (logical expressions), 
>> > > > which are
>> > > > used in filters, return three-valued logical values: False, True, or 
>> > > > Unknown.
>> > >
>> > >  * I've added short explanation of this to the documentation.
>> > >  * Removed no longer present data structures from typedefs.list of the
>> > > first patch.
>> > >  * Moved GIN support patch to number 3.  Seems to be well-isolated and
>> > > not very complex patch.  I propose to consider this to 12 too.  I
>> > > added high-level comment there, commit message and made some code
>> > > beautification.
>> >
>> > I think patches 1 and 2 are in committable shape (I reached Tomas
>> > off-list, he doesn't have more notes regarding them).  While patch 3
>> > requires more review.
>> >
>> > I'm going to push 1 and 2 if no objections.
>>
>> So, pushed.  Many thanks to reviewers and authors!
>>
>> Remaining part I'm proposing for 12 is attached.  I appreciate review of it.
>
>
> I tested this patch and I didn't find any issues - I just tested basic
> functionality and the regression tests.
>
> looks well

Thank you for your feedback!

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company