[HACKERS] Replacing lfirst() with lfirst_node() appropriately in planner.c

2017-07-12 Thread Ashutosh Bapat
Hi,

Happened to stumble across some instances of lfirst() which could use
lfirst_node() in planner.c. Here's a patch which replaces calls to
lfirst() that extract node pointers with lfirst_node() in planner.c. I
have covered all the occurrences of lfirst() except
1. those extracting generic pointers like Path, Plan, etc.
2. those extracting a node such as Aggref or Var and then checking it with IsA()
3. those extracting pointers other than nodes.
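
For illustration, the kind of replacement involved looks like this (a
sketch only; the function and target list below are made up, not taken
from planner.c):

#include "postgres.h"
#include "nodes/pg_list.h"
#include "nodes/primnodes.h"

/* Illustrative only: walk a target list and sum the resnos. */
static int
sum_resnos(List *tlist)
{
    ListCell   *lc;
    int         total = 0;

    foreach(lc, tlist)
    {
        /*
         * Old style: explicit cast, with no check that the cell really
         * holds a TargetEntry:
         *
         *     TargetEntry *tle = (TargetEntry *) lfirst(lc);
         *
         * New style: lfirst_node() casts via castNode(), which also
         * Assert()s the node tag in assert-enabled builds.
         */
        TargetEntry *tle = lfirst_node(TargetEntry, lc);

        total += tle->resno;
    }
    return total;
}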

"make check" runs without any failure.

Are we carrying out such replacements in master, or will this be
considered for v11?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


pg_lfirst_node_planner_c.patch
Description: Binary data



Re: [HACKERS] New partitioning - some feedback

2017-07-12 Thread Amit Langote
On 2017/07/12 23:58, Alvaro Herrera wrote:
> Amit Langote wrote:
>> On 2017/07/11 13:34, Alvaro Herrera wrote:
>>> Robert Haas wrote:
 On Mon, Jul 10, 2017 at 2:15 AM, Amit Langote
  wrote:
>>>
> Actually, if \d had shown RELKIND_PARTITIONED_TABLE tables as of Type
> "partitioned table", we wouldn't need a separate flag for marking a table
> as having partitions.

 I think that is false.  Whether something is partitioned and whether
 it is a partition are independent concerns.
>>>
>>> Maybe this discussion is easier if we differentiate "list tables" (\dt,
>>> or \d without a pattern) from "describe table" (\d with a name pattern).
>>
>> I think this discussion has mostly focused on "list tables" so far.
> 
> Yes, which I think is a mistake, because for some things you definitely
> need a list of partitions of the table in question.  And "describe
> table" can fulfill that role perfectly well, ISTM.

For a partitioned table, "describe table" (aka \d name_pattern) lists its
partitions, showing each partition's name and partition bound.
"list tables/views/indexes/..." (aka \d[tvi...]) shows information about
the listed objects that one might also want to see for partitions (such as
the schema, owner, size, and description), and "describe table" doesn't
provide that for partitions.  So it should be possible to list partitions
in some way.

>>> However, the "list tables"
>>> command \dt should definitely IMO not list partitions.
>>
>> Do you mean never?  Even if a modifier is specified?  In the patch I
>> proposed, \d! (or \d+ or \d++, if '!' turns out to be unpopular) will list
>> partitions, but \d or \dt won't.  That is, partitions are hidden by default.
> 
> I don't think there is any need for a single list of all partition of
> all tables -- is there?  I can't think of anything, but then I haven't
> been exposed very much to this feature yet.  For now, I lean towards "never".
> 
> (A different consideration is the use case of listing relation
> relfrozenxid/relminmxid ages, but that use case is already not fulfilled
> by psql metacommands so you still need custom catalog queries).

As I mentioned above, if we decide to hide partitions except when
"describing" the parent table, one would need custom queries even to see
schema, owner, etc. for partitions.

> I don't think \d! works terribly well as a mental model, but maybe
> that's just me.

It seems you're not alone.  Anyway, I'm starting to like Dean's advice [1]
on this matter.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/caezatcwcfftsbkycvyquzoosxkikqjpi_gdjz_vl6rcx8il...@mail.gmail.com





Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server

2017-07-12 Thread Masahiko Sawada
On Wed, Jul 12, 2017 at 5:06 PM, Michael Paquier
 wrote:
> On Tue, Jul 11, 2017 at 9:28 AM, Masahiko Sawada  
> wrote:
>> Attached updated version patch. Please review it.
>
> Cool, thanks.

Thank you for reviewing the patch.

>
> +useless. If the second parameter wait_for_archive is true 
> and
> +the backup is taken on a standby, pg_stop_backup waits for 
> WAL
> +to archived when archive_mode is always.  
> Enforcing
> +manually a WAL segment swtich to happen with for example
> 1) "waits for WAL to BE archived".
> 2) s/swtich/switch
>
> +to false will control the wait time, thought all the
> WAL segments
> s/thought/though/
>
> /*
>  * During recovery, since we don't use the end-of-backup WAL
>  * record and don't write the backup history file, the
>  * starting WAL location doesn't need to be unique.
> This means
>  * that two base backups started at the same time
> might use
>  * the same checkpoint as starting locations.
>  */
> This comment in xlog.c needs an update. Two base backups started at
> the same can generate a backup history file with the same offset, aka
> same file name. I don't think that it matters much for this file
> honestly. I think that this would become meaningful once such files
> play a more important role, in which case having a unique identifier
> would be way more interesting, but this day has IMO not come yet.
> Other thoughts are welcome.
>
> if (waitforarchive && XLogArchivingActive())
> {
> +   /* Don't wait if WAL archiving not enabled always in recovery */
> +   if (backup_started_in_recovery && !XLogArchivingAlways())
> +   return stoppoint;
> +
> This has the smell of breakage waiting to happen, I think that we
> should have just one single return point, which updates as well the
> stop TLI at the end of the routine. This can just be a single
> condition.
>
> +   if (stoptli_p)
> +   *stoptli_p = stoptli;
> +
> Not sure there is any point to move that around, on top of previous comment.
>
> +errhint("Backup has been taken from a
> standby, check if the WAL segment "
> +"needed for this backup have been
> completed, in which case a "
> +"foreced segment switch may can
> be needed on the primary. "
> +"If a segment swtich has already
> happend check that your "
> +"archive_command is executing properly."
> +"pg_stop_backup can be canceled
> safely, but the database backup "
> +"will not be usable without all
> the WAL segments.")));
> Some comments here:
> 1) s/foreced/forced/
> 2) s/may can/may/
> 3) s/swtich/switch/
> 4) s/happend/happened/
> 5) "Backup has been taken from a standby" should be a single sentence.
>
> This is beginning to shape.
>

Sorry, I missed lots of typos in the last patch. All of your comments
are incorporated into the attached latest patch, and I've checked
whether there are any other typos. Please review it.
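
For reference, the behavior being documented can be exercised like this
(a rough sketch; the label is arbitrary and this assumes a non-exclusive
backup taken on a standby with archive_mode = 'always'):

-- start a non-exclusive backup on the standby
SELECT pg_start_backup('base_backup', false, false);
-- ... copy the data directory ...
-- wait_for_archive = true: wait until the needed WAL has been archived
SELECT * FROM pg_stop_backup(false, true);
-- passing false for wait_for_archive would instead return immediately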

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


pg_stop_backup_on_standby_v4.patch
Description: Binary data



Re: [HACKERS] Typo in backend/storage/ipc/standby.c

2017-07-12 Thread Kyotaro HORIGUCHI
At Wed, 12 Jul 2017 17:11:12 +0300, Heikki Linnakangas  wrote 
in 
> On 07/11/2017 10:34 AM, Kyotaro HORIGUCHI wrote:
> > I noticed that a comment above StandbyAcquireAccessExclusiveLock
> > in backend/storage/ipc/standby.c using wrong names of a variable
> > and a type.
> >
> > The attached patch fixes it. The same mistake is found in older
> > versions back to 9.0.
> >
> Applied, thanks!

Thanks!

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] PostgreSQL10 beta2 with ICU - initdb fails on MacOS

2017-07-12 Thread Tom Lane
I wrote:
> I still think that this represents under-engineering by the ICU crew
> and not anything we're doing wrong.

After some more examination of their readme.html, I tried this:

$ ./runConfigureICU MacOSX --prefix=/usr/local/icu-57.1 --enable-rpath

and that gave me sane-looking libraries:

$ otool -L /usr/local/icu-57.1/lib/libicui18n.57.1.dylib 
/usr/local/icu-57.1/lib/libicui18n.57.1.dylib:
/usr/local/icu-57.1/lib/libicui18n.57.dylib (compatibility version 
57.0.0, current version 57.1.0)
/usr/local/icu-57.1/lib/libicuuc.57.dylib (compatibility version 
57.0.0, current version 57.1.0)
/usr/local/icu-57.1/lib/libicudata.57.dylib (compatibility version 
57.0.0, current version 57.1.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current 
version 1238.60.2)
/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 
307.5.0)

--- note that now libicudata is shown as a dependency.  And then
configuring Postgres as I showed before leads to executables that
build, and initdb, and pass regression tests, and have a bunch
of ICU entries in pg_collation, without needing to fool with
DYLD_LIBRARY_PATH.

Their configure --help claims

  --enable-rpath  use rpath when linking default is only if necessary

but apparently that's for small values of "necessary" :-(

regards, tom lane




Re: [HACKERS] PostgreSQL10 beta2 with ICU - initdb fails on MacOS

2017-07-12 Thread Sandeep Thakkar
On Thu, Jul 13, 2017 at 8:23 AM, Tom Lane  wrote:

> Sandeep Thakkar  writes:
> > On Thu, Jul 13, 2017 at 12:44 AM, Tom Lane  wrote:
> >> and this is evidently because the libraries themselves don't know where
> >> they live:
> >> $ otool -D /usr/local/icu-57.1/lib/libicui18n.57.dylib
> >> /usr/local/icu-57.1/lib/libicui18n.57.dylib:
> >> libicui18n.57.dylib
>
> > Right. I got that and I fixed the names and loader_paths for ICU libs and
> > postgres and that is why initdb in my case got going and didn't complain
> > about library not found.
>
> Uh, so what did you do *exactly*?
>
I changed the id and the loader_path entries for the ICU libs, and also
changed the install names of the ICU libs referenced by the postgres binary:
$ otool -L postgres  | grep icu
@loader_path/../lib/libicuuc.57.dylib (compatibility version 57.0.0,
current version 57.1.0)
@loader_path/../lib/libicui18n.57.dylib (compatibility version 57.0.0,
current version 57.1.0)
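
For illustration, the fixup amounts to something like this (a sketch;
the paths here are assumptions, not the exact commands that were run):

# give each ICU dylib an id that includes its install location
$ install_name_tool -id @loader_path/../lib/libicuuc.57.dylib lib/libicuuc.57.dylib
$ install_name_tool -id @loader_path/../lib/libicui18n.57.dylib lib/libicui18n.57.dylib
# and point the postgres binary at the relocated libraries
$ install_name_tool -change libicuuc.57.dylib @loader_path/../lib/libicuuc.57.dylib bin/postgres
$ install_name_tool -change libicui18n.57.dylib @loader_path/../lib/libicui18n.57.dylib bin/postgres
# libicudata is not referenced by the binary directly, so there is
# nothing to change for it here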


> >> I can make it work by setting DYLD_LIBRARY_PATH:
> >> $ DYLD_LIBRARY_PATH=/usr/local/icu-57.1/lib initdb
> >> ... goes through normally ...
>
> > You mean you are able to initialize cluster after this? Or you just
> > executed initdb and found that it doesn't complain about ICU lib
> location?
>
> initdb completed successfully.  I didn't try running any tests beyond
> that; I'm not sure that we have any regression tests that will exercise
> ICU locales.
>
OK. I'll check at my end then.


> > As mentioned above instead of using DYLD_LIBRARY_PATH, I fixed the
> > loader_paths and am able to execute initdb and postgres binaries:
> > But, initdb -D data fails with error code "U_FILE_ACCESS_ERROR".
>
> Yeah, but notice that only two of the three interesting ICU libraries
> are actually linked into the postgres executable so far as otool and
> the dynamic linker are concerned.  I suspect that the other one,
> libicudata, is dynamically loaded by the ICU code --- and in your
> configuration it fails to find that library.  The error message is
> not definitive that that's what's happening, but it's suggestive.
> If that's the right interpretation, it means that setting
> DYLD_LIBRARY_PATH allows that third library to be found, but whatever
> you did doesn't.
>
Yeah, I observed that otool -L lists only two of the three ICU libs. But I'm
not sure why it doesn't load the third one in my case. Here is the otool -D
and otool -L output for the ICU libs:
$ otool -D ../lib/libicuuc.dylib
../lib/libicuuc.dylib:
libicuuc.57.dylib

$ otool -L ../lib/libicuuc.dylib
../lib/libicuuc.dylib:
libicuuc.57.dylib (compatibility version 57.0.0, current version 57.1.0)
@loader_path/../lib/libicudata.57.dylib (compatibility version 57.0.0,
current version 57.1.0)

$ otool -D ../lib/libicui18n.dylib
../lib/libicui18n.dylib:
libicui18n.57.dylib

$ otool -L ../lib/libicui18n.dylib
../lib/libicui18n.dylib:
libicui18n.57.dylib (compatibility version 57.0.0, current version 57.1.0)
@loader_path/../lib/libicuuc.57.dylib (compatibility version 57.0.0,
current version 57.1.0)
@loader_path/../lib/libicudata.57.dylib (compatibility version 57.0.0,
current version 57.1.0)

$ otool -D ../lib/libicudata.dylib
../lib/libicudata.dylib:
libicudata.57.dylib

$ otool -L ../lib/libicudata.dylib
../lib/libicudata.dylib:
libicudata.57.dylib (compatibility version 57.0.0, current version 57.1.0)


> I still think that this represents under-engineering by the ICU crew
> and not anything we're doing wrong.
>
> BTW, when I skimmed the "readme.html" docs in the ICU sources this
> morning, I noted that there were multiple ways you could configure
> it to find the ICU data.  I did not read in detail, figuring that
> their default configuration would be sane, but maybe that was an
> overly charitable assumption.
>
Sure, thanks for checking this out. It seems the issue is with ICU and
something I did, which is why libicudata.dylib is not able to load. I'll
dig deeper. Thanks for your help.


> regards, tom lane
>



-- 
Sandeep Thakkar
EDB


Re: [HACKERS] building libpq.a static library

2017-07-12 Thread Craig Ringer
On 13 July 2017 at 10:58, Craig Ringer  wrote:

> On 12 July 2017 at 23:46, Jeroen Ooms  wrote:
>
>> On Wed, Jul 12, 2017 at 5:11 PM, Tom Lane  wrote:
>> > Jeroen Ooms  writes:
>> >> I maintain static libraries for libpq for the R programming language
>> >> (we need static linking to ship with the binary packages).
>> >
>> > How do you get that past vendor packaging policies?  When I worked at
>> > Red Hat, there was a very strong policy against allowing any package
>> > to statically embed parts of another one, because it creates serious
>> > management problems if e.g. the other one needs a security update.
>> > I'm sure Red Hat isn't the only distro that feels that way.
>>
>> We only use this on Windows. On platforms with a decent package
>> manager we indeed link to a shared library.
>
>
> You shouldn't ever need static libraries on Windows, though. Because it
> searches the CWD first on its linker search path
>

Er, sorry, binary location, not CWD.


>
> --
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] building libpq.a static library

2017-07-12 Thread Craig Ringer
On 12 July 2017 at 23:46, Jeroen Ooms  wrote:

> On Wed, Jul 12, 2017 at 5:11 PM, Tom Lane  wrote:
> > Jeroen Ooms  writes:
> >> I maintain static libraries for libpq for the R programming language
> >> (we need static linking to ship with the binary packages).
> >
> > How do you get that past vendor packaging policies?  When I worked at
> > Red Hat, there was a very strong policy against allowing any package
> > to statically embed parts of another one, because it creates serious
> > management problems if e.g. the other one needs a security update.
> > I'm sure Red Hat isn't the only distro that feels that way.
>
> We only use this on Windows. On platforms with a decent package
> manager we indeed link to a shared library.


You shouldn't ever need static libraries on Windows, though. Because it
searches the CWD first on its linker search path, you can just drop
libpq.dll in the same directory as your binary/library and link to the stub
libpq.lib.

On Mac OS X and Linux, you can use relative rpath linking. On OS X use
@executable_path either at link-time or later via install_name_tool. On
Linux, use $ORIGIN in your rpath. Beware of quoting issues with $ORIGIN
though.
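
For illustration, relative-rpath linking looks roughly like this (a
sketch; the paths and library versions are assumptions):

# Linux: embed $ORIGIN so the binary looks in ../lib relative to itself.
# Note the single quotes -- otherwise the shell (or make) may expand $ORIGIN.
$ cc -o myapp myapp.o -L/opt/myapp/lib -lpq -Wl,-rpath,'$ORIGIN/../lib'

# OS X: embed @executable_path at link time ...
$ cc -o myapp myapp.o -L/opt/myapp/lib -lpq -Wl,-rpath,@executable_path/../lib
# ... or rewrite the reference afterwards with install_name_tool
$ install_name_tool -change libpq.5.dylib @executable_path/../lib/libpq.5.dylib myapp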

I'm not trying to block support for a static libpq, I just don't see the
point.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] PostgreSQL10 beta2 with ICU - initdb fails on MacOS

2017-07-12 Thread Tom Lane
Sandeep Thakkar  writes:
> On Thu, Jul 13, 2017 at 12:44 AM, Tom Lane  wrote:
>> and this is evidently because the libraries themselves don't know where
>> they live:
>> $ otool -D /usr/local/icu-57.1/lib/libicui18n.57.dylib
>> /usr/local/icu-57.1/lib/libicui18n.57.dylib:
>> libicui18n.57.dylib

> Right. I got that and I fixed the names and loader_paths for ICU libs and
> postgres and that is why initdb in my case got going and didn't complain
> about library not found.

Uh, so what did you do *exactly*?

>> I can make it work by setting DYLD_LIBRARY_PATH:
>> $ DYLD_LIBRARY_PATH=/usr/local/icu-57.1/lib initdb
>> ... goes through normally ...

> You mean you are able to initialize cluster after this? Or you just
> executed initdb and found that it doesn't complain about ICU lib location?

initdb completed successfully.  I didn't try running any tests beyond
that; I'm not sure that we have any regression tests that will exercise
ICU locales.

> As mentioned above instead of using DYLD_LIBRARY_PATH, I fixed the
> loader_paths and am able to execute initdb and postgres binaries:
> But, initdb -D data fails with error code "U_FILE_ACCESS_ERROR".

Yeah, but notice that only two of the three interesting ICU libraries
are actually linked into the postgres executable so far as otool and
the dynamic linker are concerned.  I suspect that the other one,
libicudata, is dynamically loaded by the ICU code --- and in your
configuration it fails to find that library.  The error message is
not definitive that that's what's happening, but it's suggestive.
If that's the right interpretation, it means that setting
DYLD_LIBRARY_PATH allows that third library to be found, but whatever
you did doesn't.

I still think that this represents under-engineering by the ICU crew
and not anything we're doing wrong.

BTW, when I skimmed the "readme.html" docs in the ICU sources this
morning, I noted that there were multiple ways you could configure
it to find the ICU data.  I did not read in detail, figuring that
their default configuration would be sane, but maybe that was an
overly charitable assumption.

regards, tom lane




Re: [HACKERS] idea: custom log_line_prefix components besides application_name

2017-07-12 Thread Chapman Flack
On 07/12/17 08:38, Robert Haas wrote:

> another protocol message.  I feel like the usefulness of this for
> connection pooling software is pretty obvious: it's a lot easier for
> the pooler to disallow a certain protocol message than a certain SQL
> command.

I assume you mean easier than disallowing an SQL command that has to be
disallowed (with all the complexity of parsing and recognizing all the
forms it could take) or else the client could abuse it -- in other words,
the current state of affairs, without a cookie'd SQL command.

Once the comparison is not to the current state, but to a proposed
cookie mechanism for the SQL command, I don't think I see either idea
as strikingly easier or more effective.

But the protocol extension becomes another thing (like SASL channel
binding) that you can add to the server, but you don't really have it
until all of the protocol driver projects catch up. A mechanism in SQL
is ready for everybody as soon as it's there.

> variable can't later be changed from SQL.  So now you don't even need
> the cookie, and the client can't try to guess the cookie.

Again, the trouble of needing a cookie and that of supporting a special
protocol message don't seem that different to me, and with, say, a 64-bit
cookie built from a good source of randomness, the risk of a client
guessing it seems negligible.
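
For illustration, producing such a cookie in the backend could be as
simple as this (a sketch only; the function name is made up, and it
assumes pg_strong_random() as available in PG 10):

#include "postgres.h"
#include "port.h"           /* for pg_strong_random() */

static uint64
generate_set_cookie(void)
{
    uint64      cookie;

    if (!pg_strong_random(&cookie, sizeof(cookie)))
        elog(ERROR, "could not generate random cookie");
    return cookie;
}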

One other reason I'm slow to warm to a protocol extension is that
things done that way tend to make second-class citizens of code that
runs in the backend.

For an example, think of how warnings are handled. If client code uses
JDBC, it should be able to call getWarnings() on a ResultSet and find out
what warnings might have been raised. If you move the same code to the
backend, in PL/Java, it still uses JDBC and there's still a getWarnings()
method but it's (currently) useless. elog makes catching errors easy,
but warnings get shipped straight to libpq and out to the client. (For
this example, that's also an encapsulation breach and possible information
leak, the client getting warnings from internal SQL queries by backend
routines).

LISTEN/NOTIFY is another example of a mechanism that's not there for
backend code, because the notification mechanism is purely a message
over pq. A less interesting issue than the warnings, perhaps (once the
code is in the backend, why be notified by a trigger when it could simply
BE a trigger?) ... but it could be a bit surprising to someone accustomed
to having it available for client code.

So, even if I don't this instant have an example of why some backend
code or extension in an arbitrary PL might want to be able to lock down
a particular GUC, I can imagine it might happen, and if the mechanism is
just SQL with a cookie, all the PLs have it for free, but if it is
tied up with the fe/be protocol, it's hard.

-Chap




Re: [HACKERS] PostgreSQL10 beta2 with ICU - initdb fails on MacOS

2017-07-12 Thread Sandeep Thakkar
On Thu, Jul 13, 2017 at 12:44 AM, Tom Lane  wrote:

> Sandeep Thakkar  writes:
> > On Wed, Jul 12, 2017 at 8:13 PM, Tom Lane  wrote:
> >> Ugh.  Please provide details about the hand-built ICU --- what version,
> >> what configure options did you use for it, etc.
>
> > I tried with ICU 53.1 and 57.1 and the results is same.
> > There was no configure options other than --prefix.
>
> Well, when I try that on macOS, I get an ICU installation that's basically
> nonfunctional, apparently because they forgot to specify -install_name
> while linking the dylibs.  The symptom looks a bit different:
>
> $ initdb
> dyld: Library not loaded: libicui18n.57.dylib
>   Referenced from: /Users/tgl/testversion/bin/postgres
>   Reason: image not found
>
> but it's clear from otool that the dynamic linker doesn't know where
> to find the ICU libraries:
>
> $ otool -L ~/testversion/bin/postgres
> /Users/tgl/testversion/bin/postgres:
> /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current
> version 1238.60.2)
> libicui18n.57.dylib (compatibility version 57.0.0, current version
> 57.1.0)
> libicuuc.57.dylib (compatibility version 57.0.0, current version
> 57.1.0)
>
> and this is evidently because the libraries themselves don't know where
> they live:
>
> $ otool -D /usr/local/icu-57.1/lib/libicui18n.57.dylib
> /usr/local/icu-57.1/lib/libicui18n.57.dylib:
> libicui18n.57.dylib
>
Right. I got that and I fixed the names and loader_paths for ICU libs and
postgres and that is why initdb in my case got going and didn't complain
about library not found.


> Compare to what you get for a properly built dylib:
>
> $ otool -D /usr/local/ssl-1.1.0e/lib/libssl.1.1.dylib
> /usr/local/ssl-1.1.0e/lib/libssl.1.1.dylib:
> /usr/local/ssl-1.1.0e/lib/libssl.1.1.dylib
>
> I can make it work by setting DYLD_LIBRARY_PATH:
>
> $ DYLD_LIBRARY_PATH=/usr/local/icu-57.1/lib initdb
> ... goes through normally ...
>
You mean you are able to initialize cluster after this? Or you just
executed initdb and found that it doesn't complain about ICU lib location?

As mentioned above, instead of using DYLD_LIBRARY_PATH, I fixed the
loader_paths and am able to execute initdb and postgres binaries:
$ ./initdb -V
initdb (PostgreSQL) 10beta2

$ ./postgres -V
postgres (PostgreSQL) 10beta2

But, initdb -D data fails with error code "U_FILE_ACCESS_ERROR".


> but that hardly qualifies as a production-grade solution.
>
> I've not dug into the ICU sources to see what it would take to get the
> correct link option applied.  But the short answer seems to be that ICU
> is not up to speed for installation in non-default locations on macOS.
>
> For the record's sake, I configured icu4c-57_1-src.tgz like this:
>
> $ ./runConfigureICU MacOSX --prefix=/usr/local/icu-57.1
>
> and since there's no pkg-config on this machine, I did this to
> configure Postgres:
>
> $ ICU_CFLAGS=" " ICU_LIBS="-licui18n -licuuc -licudata" ./configure [usual
> options] --with-icu --with-includes=/usr/local/icu-57.1/include
> --with-libraries=/usr/local/icu-57.1/lib
>
> It's possible that if I did have pkg-config installed, it would have
> produced some different value for ICU_LIBS, but I rather doubt it.
> I'm not sure that it's possible to fix this on the consumer executable's
> link line anyway.
>
Yes, we use the same flags during configure.


> regards, tom lane
>



-- 
Sandeep Thakkar
EDB


Re: [HACKERS] why not parallel seq scan for slow functions

2017-07-12 Thread Amit Kapila
On Wed, Jul 12, 2017 at 11:20 PM, Jeff Janes  wrote:
> On Tue, Jul 11, 2017 at 10:25 PM, Amit Kapila 
> wrote:
>>
>> On Wed, Jul 12, 2017 at 1:50 AM, Jeff Janes  wrote:
>> > On Mon, Jul 10, 2017 at 9:51 PM, Dilip Kumar 
>> > wrote:
>> >>
>> >> So because of this high projection cost the seqpath and parallel path
>> >> both have fuzzily same cost but seqpath is winning because it's
>> >> parallel safe.
>> >
>> >
>> > I think you are correct.  However, unless parallel_tuple_cost is set
>> > very
>> > low, apply_projection_to_path never gets called with the Gather path as
>> > an
>> > argument.  It gets ruled out at some earlier stage, presumably because
>> > it
>> > assumes the projection step cannot make it win if it is already behind
>> > by
>> > enough.
>> >
>>
>> I think that is genuine because tuple communication cost is very high.
>
>
> Sorry, I don't know which you think is genuine, the early pruning or my
> complaint about the early pruning.
>

Early pruning.  See, currently we don't have a way to maintain both
parallel and non-parallel paths until a later stage and then decide which
one is better. If we want to maintain both parallel and non-parallel
paths, it can increase planning cost substantially in the case of
joins.  Now, surely it can have benefits in many cases, so it is a
worthwhile direction to pursue.

> I agree that the communication cost is high, which is why I don't want to
> have to set parellel_tuple_cost very low.  For example, to get the benefit
> of your patch, I have to set parellel_tuple_cost to 0.0049 or less (in my
> real-world case, not the dummy test case I posted, although the number are
> around the same for that one too).  But with a setting that low, all kinds
> of other things also start using parallel plans, even if they don't benefit
> from them and are harmed.
>
> I realize we need to do some aggressive pruning to avoid an exponential
> explosion in planning time, but in this case it has some rather unfortunate
> consequences.  I wanted to explore it, but I can't figure out where this
> particular pruning is taking place.
>
> By the time we get to planner.c line 1787, current_rel->pathlist already
> does not contain the parallel plan if parellel_tuple_cost >= 0.0050, so the
> pruning is happening earlier than that.
>

Check generate_gather_paths.

>
>>
>> If your table is reasonable large then you might want to try by
>> increasing parallel workers (Alter Table ... Set (parallel_workers =
>> ..))
>
>
>
> Setting parallel_workers to 8 changes the threshold for the parallel to even
> be considered from parellel_tuple_cost <= 0.0049 to <= 0.0076.  So it is
> going in the correct direction, but not by enough to matter.
>

You might want to play with cpu_tuple_cost and/or seq_page_cost.
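
For example, something along these lines (illustrative values only; the
table and function names are placeholders):

ALTER TABLE big_table SET (parallel_workers = 8);
SET parallel_tuple_cost = 0.005;   -- default 0.1
SET cpu_tuple_cost = 0.02;         -- default 0.01
SET seq_page_cost = 2.0;           -- default 1.0
EXPLAIN SELECT slow_fn(x) FROM big_table;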

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] New partitioning - some feedback

2017-07-12 Thread Dean Rasheed
On 12 July 2017 at 23:23, Dean Rasheed  wrote:
> I also agree that there probably isn't much need for a list that
> *only* includes partitions, but if someone comes up with a convincing
> use case, then we could add another 2-letter command for that.
>

Actually, I just thought of a round-about sort of use case:

The various 2-letter commands \dE, \dt, etc... are designed to work
together, so you can do things like \dEt or \dtE to list all local and
foreign tables, whilst excluding views, sequences, etc. So, if for the
sake of argument, \dP were made to list partitions, then you'd be able
to do things like \dEPt to list all the various kinds of tables,
including partitions, whilst excluding views, etc.

That seems somewhat more elegant and flexible than \d++ or \d! or whatever.

Of course, you'd have to decide whether a foreign partition came under
\dE, \dP, both or something else. I'm not sure that we should eat
another letter of the alphabet just for that case, because there
aren't many left, and I don't think any will be natural mnemonics like
the others.

Regards,
Dean




Re: [HACKERS] building libpq.a static library

2017-07-12 Thread Andres Freund
On 2017-07-12 23:55:56 +0100, Greg Stark wrote:
> Fwiw I think the real problem is that building static libraries
> "properly" requires different compiler options -- notably they're not
> normally built with -fPIC. So that means building every object twice
> which kind of breaks make's build model which has a simple dependency
> graph where each object appears once. Some packages do this by
> inventing a foo-shared.o and foo-static.o but that introduces its own
> weirdness.
> 
> I don't know what the downsides would be of creating a static library
> out of objects built with -fPIC. It might just be a small performance
> penalty which might be no big deal for libpq. That may be a good
> compromise.

FWIW, most linux distributions build everything with -fPIC/PIE anyway
these days, to allow address space randomization. So I don't think this
is a huge concern for modern platforms.

- Andres




Re: [HACKERS] building libpq.a static library

2017-07-12 Thread Greg Stark
On 12 July 2017 at 16:11, Tom Lane  wrote:
> Jeroen Ooms  writes:
>
>> This works but it's a bit of a pain to maintain. I was wondering if
>> this hack could be merged so that the standard 'configure
>> --enable-static' script would install a static library for libpq
>> alongside the shared one.
>
> FWIW, we used to have support for building static libpq, but
> we got rid of it a long time ago.  I couldn't find the exact
> spot in some desultory trawling of the commit history.

Fwiw I think the real problem is that building static libraries
"properly" requires different compiler options -- notably they're not
normally built with -fPIC. So that means building every object twice
which kind of breaks make's build model which has a simple dependency
graph where each object appears once. Some packages do this by
inventing a foo-shared.o and foo-static.o but that introduces its own
weirdness.

I don't know what the downsides would be of creating a static library
out of objects built with -fPIC. It might just be a small performance
penalty which might be no big deal for libpq. That may be a good
compromise.
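
For illustration, the two approaches amount to roughly this (a sketch;
the file names are just examples from libpq):

# 1) build every object twice: PIC objects for the shared library,
#    non-PIC objects for the static one
$ cc -c -fPIC -o fe-exec_shlib.o fe-exec.c
$ cc -c       -o fe-exec.o       fe-exec.c
$ cc -shared -o libpq.so fe-exec_shlib.o    # plus the other PIC objects
$ ar rcs libpq.a fe-exec.o                  # plus the other non-PIC objects

# 2) the compromise: build once with -fPIC and reuse the objects for both
$ cc -c -fPIC -o fe-exec.o fe-exec.c
$ cc -shared -o libpq.so fe-exec.o
$ ar rcs libpq.a fe-exec.o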

-- 
greg




Re: [HACKERS] More race conditions in logical replication

2017-07-12 Thread Alvaro Herrera
Petr Jelinek wrote:

> So best idea I could come up with is to make use of the new condition
> variable API. That lets us wait for variable which can be set per slot.

Here's a cleaned up version of that patch, which I intend to get in the
tree tomorrow.  I verified that this allows the subscription tests to
pass with Tom's sleep additions.

I noticed one point where we're reading the active_pid without locking;
marked it with an XXX comment.  Not yet sure whether this is a bug or
not.


I noticed something funny in CREATE_REPLICATION; apparently we first
create a slot and set it active, then we activate it by name.  With the
current coding it seems to work fine, because we're careful to make
activation idempotent, but it looks convoluted.  I don't think this is
new, though.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 55533aa3cdc2fecbf7b6b6c661649342a204e73b Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Wed, 12 Jul 2017 18:38:33 -0400
Subject: [PATCH v3 1/1] Wait for slot to become free in when dropping it

---
 src/backend/replication/logical/logicalfuncs.c |  2 +-
 src/backend/replication/slot.c | 79 +++---
 src/backend/replication/slotfuncs.c|  2 +-
 src/backend/replication/walsender.c|  6 +-
 src/include/replication/slot.h |  8 ++-
 5 files changed, 69 insertions(+), 28 deletions(-)

diff --git a/src/backend/replication/logical/logicalfuncs.c 
b/src/backend/replication/logical/logicalfuncs.c
index 363ca82cb0..a3ba2b1266 100644
--- a/src/backend/replication/logical/logicalfuncs.c
+++ b/src/backend/replication/logical/logicalfuncs.c
@@ -244,7 +244,7 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, 
bool confirm, bool bin
else
end_of_wal = GetXLogReplayRecPtr();
 
-   ReplicationSlotAcquire(NameStr(*name));
+   ReplicationSlotAcquire(NameStr(*name), true);
 
PG_TRY();
{
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index dc7de20e11..76198a627d 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -157,6 +157,7 @@ ReplicationSlotsShmemInit(void)
/* everything else is zeroed by the memset above */
SpinLockInit(>mutex);
LWLockInitialize(>io_in_progress_lock, 
LWTRANCHE_REPLICATION_SLOT_IO_IN_PROGRESS);
+   ConditionVariableInit(>active_cv);
}
}
 }
@@ -313,24 +314,27 @@ ReplicationSlotCreate(const char *name, bool db_specific,
LWLockRelease(ReplicationSlotControlLock);
 
/*
-* Now that the slot has been marked as in_use and in_active, it's safe 
to
+* Now that the slot has been marked as in_use and active, it's safe to
 * let somebody else try to allocate a slot.
 */
LWLockRelease(ReplicationSlotAllocationLock);
+
+   ConditionVariableBroadcast(>active_cv);
 }
 
 /*
  * Find a previously created slot and mark it as used by this backend.
  */
 void
-ReplicationSlotAcquire(const char *name)
+ReplicationSlotAcquire(const char *name, bool nowait)
 {
ReplicationSlot *slot = NULL;
+   int active_pid = 0;
int i;
-   int active_pid = 0; /* Keep compiler quiet */
 
Assert(MyReplicationSlot == NULL);
 
+retry:
/* Search for the named slot and mark it active if we find it. */
LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
for (i = 0; i < max_replication_slots; i++)
@@ -339,27 +343,52 @@ ReplicationSlotAcquire(const char *name)
 
if (s->in_use && strcmp(name, NameStr(s->data.name)) == 0)
{
+   /* Found the slot we want -- can we activate it? */
SpinLockAcquire(>mutex);
+
active_pid = s->active_pid;
if (active_pid == 0)
active_pid = s->active_pid = MyProcPid;
+
SpinLockRelease(>mutex);
slot = s;
+
break;
}
}
LWLockRelease(ReplicationSlotControlLock);
 
-   /* If we did not find the slot or it was already active, error out. */
+   /* If we did not find the slot, error out. */
if (slot == NULL)
ereport(ERROR,
(errcode(ERRCODE_UNDEFINED_OBJECT),
 errmsg("replication slot \"%s\" does not 
exist", name)));
+
+   /*
+* If we found the slot but it's already active in another backend, we
+* either error out or retry after a short wait, as caller specified.
+*/
if (active_pid != MyProcPid)
-   ereport(ERROR,
- 

Re: [HACKERS] New partitioning - some feedback

2017-07-12 Thread Dean Rasheed
On 12 July 2017 at 15:58, Alvaro Herrera  wrote:
> Amit Langote wrote:
>> On 2017/07/11 13:34, Alvaro Herrera wrote:
>> > However, the "list tables"
>> > command \dt should definitely IMO not list partitions.
>>
>> Do you mean never?  Even if a modifier is specified?  In the patch I
>> proposed, \d! (or \d+ or \d++, if '!' turns out to be unpopular) will list
>> partitions, but \d or \dt won't.  That is, partitions are hidden by default.
>
> I don't think there is any need for a single list of all partition of
> all tables -- is there?  I can't think of anything, but then I haven't
> been exposed very much to this feature yet.  For now, I lean towards "never".
>

So just focusing on the listing issue for now...

I tend to agree with some of the upstream comments that a bare \d
should list everything, including partitions, because partitions are
still tables that you might want to do DML or DDL on.

Also, if you look at what we already have, \d lists all types of
relations, and then there are 2-letter commands \dE, \di, \dm, \ds,
\dt and \dv that list just specific kinds of relations, for example
\dE lists foreign tables, and \dt lists local tables, specifically
excluding foreign tables.

So ISTM that the most logical extension of that is:

  \d - list all relations, including partitions

  \dt - list only tables that are not foreign tables or partitions
of other tables

(that's not quite an exact extension of the existing logic, because of
course it's partitioned tables that have the different relkind, not
the partitions, but the above seems like the most useful behaviour)

With this, I don't think there's any need for any additional
modifiers, like ! or ++.

I also agree that there probably isn't much need for a list that
*only* includes partitions, but if someone comes up with a convincing
use case, then we could add another 2-letter command for that.


> I don't think \d! works terribly well as a mental model, but maybe
> that's just me.
>

Yeah, I agree. It just looks ugly somehow.


>> So it seems most of us are in favor for showing partitioned tables as
>> "partitioned table" instead of "table" in the table listing.
>
> Yeah, that sounds good to me.
>

+1

Regards,
Dean




Re: [HACKERS] Causal reads take II

2017-07-12 Thread Thomas Munro
On Thu, Jul 13, 2017 at 2:51 AM, Dmitry Dolgov <9erthali...@gmail.com> wrote:
> Thank you. I started to play with it a little bit, since I think it's an
> interesting idea. And there are already few notes:

Thanks Dmitry.

> * I don't see a CF item for that, where is it?

https://commitfest.postgresql.org/14/951/

The latest version of the patch is here:

https://www.postgresql.org/message-id/CAEepm%3D0YigNQczAF-%3Dx_SxT6cJv77Yb0EO%2BcAFnqRyVu4%2BbKFw%40mail.gmail.com

I renamed it to "synchronous replay", because "causal reads" seemed a
bit too arcane.

> * Looks like there is a sort of sensitive typo in `postgresql.conf.sample`:
>
> ```
> +#causal_reads_standy_names = '*' # standby servers that can potentially
> become
> + # available for causal reads; '*' = all
> +
> ```
>
> it should be `causal_reads_standby_names`.

Fixed in latest version (while renaming).

> Also I hope in the nearest future
> I
> can provide a full review.

Great news, thanks!

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-12 Thread Peter Geoghegan
On Wed, Jul 12, 2017 at 2:17 PM, Peter Geoghegan  wrote:
> I'd be interested in seeing the difference it makes if Postgres is
> built with the call to _bt_check_unique() commented out within
> nbtinsert.c.

Actually, I mean that I wonder how much of a difference it would make
if this entire block was commented out within _bt_doinsert():

if (checkUnique != UNIQUE_CHECK_NO)
{
...
}


-- 
Peter Geoghegan




Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-12 Thread Peter Geoghegan
On Wed, Jul 12, 2017 at 1:55 PM, Alvaro Herrera
 wrote:
> Not to mention work done with a "buffer cleanup lock" held -- which is
> compounded by the fact that acquiring such a lock is prone to starvation
> if there are many scanners of that index.  I've seen a case where a very
> hot table is scanned so heavily that vacuum is starved for days waiting
> to acquire cleanup on a single page (vacuum was only able to finish
> because the app using the table was restarted).  I'm sure that a uniform
> distribution of keys, with a uniform distribution of values scanned,
> would give a completely different behavior than a highly skewed
> distribution where a single key receives a large fraction of the scans.

Good point.

I'd be interested in seeing the difference it makes if Postgres is
built with the call to _bt_check_unique() commented out within
nbtinsert.c. The UPDATE query doesn't ever change indexed columns, so
there should be no real need for the checking it does in this case.
We're seeing a lot of duplicates in at least part of the keyspace, and
yet the queries themselves should be as HOT-safe as possible.

Another possibly weird thing that I noticed is latency standard
deviation. This is from Alik's response to my request to run that SQL
query:

latency average = 1.375 ms
tps = 93085.250384 (including connections establishing)
tps = 93125.724773 (excluding connections establishing)
SQL script 1: /home/nglukhov/ycsb_read_zipf.sql
 - weight: 1 (targets 50.0% of total)
 - 2782999 transactions (49.8% of total, tps = 46364.447705)
 - latency average = 0.131 ms
 - latency stddev = 0.087 ms
SQL script 2: /home/nglukhov/ycsb_update_zipf.sql
 - weight: 1 (targets 50.0% of total)
 - 2780197 transactions (49.8% of total, tps = 46317.766703)
 - latency average = 2.630 ms
 - latency stddev = 14.092 ms

This is from the 128 client case -- the slow case.

Notice that the standard deviation is very high for
ycsb_update_zipf.sql. I wonder if this degrades because of some kind
of feedback loop, that reaches a kind of tipping point at higher
client counts.

-- 
Peter Geoghegan




Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-12 Thread Alvaro Herrera
Peter Geoghegan wrote:

> Now, that might not seem like that much of a difference, but if you
> consider how duplicates are handled in the B-Tree code, and how unique
> index enforcement works, I think it could be. It could lead to heavy
> buffer lock contention, because we sometimes do a lot of work with an
> exclusive buffer lock held.

Not to mention work done with a "buffer cleanup lock" held -- which is
compounded by the fact that acquiring such a lock is prone to starvation
if there are many scanners of that index.  I've seen a case where a very
hot table is scanned so heavily that vacuum is starved for days waiting
to acquire cleanup on a single page (vacuum was only able to finish
because the app using the table was restarted).  I'm sure that a uniform
distribution of keys, with a uniform distribution of values scanned,
would give a completely different behavior than a highly skewed
distribution where a single key receives a large fraction of the scans.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Adding support for Default partition in partitioning

2017-07-12 Thread Jeevan Ladhe
Hi,

I have tried to address your comments in the V22 set of patches[1].
Please find my feedback inline with your comments below.

On Fri, Jun 16, 2017 at 1:46 PM, Kyotaro HORIGUCHI <
horiguchi.kyot...@lab.ntt.co.jp> wrote:

> Hello, I'd like to review this but it doesn't fit the master, as
> Robert said. Especially the interface of predicate_implied_by is
> changed by the suggested commit.
>
> Anyway I have some comment on this patch with fresh eyes.  I
> believe the basic design so my comment below are from a rather
> micro viewpoint.
>
> - Considering on how canSkipPartConstraintValidation is called, I
>   *think* that RelationProvenValid() would be better.  (But this
>   would be disappear by rebasing..)
>
I think RelationProvenValid() is a bit confusing in the sense that it
does not imply that some constraint is being checked.


> - 0002 changes the interface of get_qual_for_list, but left
>   get_qual_for_range alone.  Anyway get_qual_for_range will have
>   to do the similar thing soon.
>
Yes, this will be taken care of along with default partition support for range.


> - In check_new_partition_bound, "overlap" and "with" is
>   completely correlated with each other. "with > -1" means
>   "overlap = true". So overlap is not useless. ("with" would be
>   better to be "overlap_with" or somehting if we remove
>   "overlap")
>
Agreed, probably this can be taken up as a separate refactoring patch.
Currently, for the default partition case I have got rid of "overlap" and
now use "with", and that is used just for code simplification.


> - The error message of check_default_allows_bound is below.
>
>   "updated partition constraint for default partition \"%s\"
>would be violated by some row"
>
>   This looks an analog of validateCheckConstraint but as my
>   understanding this function is called only when new partition
>   is added. This would be difficult to recognize in the
>   situation.
>
>   "the default partition contains rows that should be in
>the new partition: \"%s\""
>
>   or something?
>
I think the current error message is clearer. I agree that there might be
some confusion about whether it is due to adding or attaching a partition,
but we have already stretched the message quite long. I am open to
suggestions here.


> - In check_default_allows_bound, the iteration over partitions is
>   quite similar to what validateCheckConstraint does. Can we
>   somehow share validateCheckConstraint with this function?
>
Maybe we can, but again I think this can also be categorized as a
refactoring patch and done later. Your thoughts?


> - In the same function, skipping RELKIND_PARTITIONED_TABLE is
>   okay, but silently ignoring RELKIND_FOREIGN_TABLE doesn't seem
>   good. I think at least some warning should be emitted.
>
>   "Skipping foreign tables in the defalut partition. It might
>contain rows that should be in the new partition."  (Needs
>preventing multple warnings in single call, maybe)
>
Currently we do not emit any warning when attaching a foreign table as a
non-default partition having rows that do not match its partition
constraints, and we still let the partition be attached.
But I agree that we should emit such a warning; I have added code to do
this.


> - In the same function, the following condition seems somewhat
>   strange in comparison to validateCheckConstraint.
>
> > if (partqualstate && ExecCheck(partqualstate, econtext))
>
>   partqualstate won't be null as long as partition_constraint is
>   valid. Anyway (I'm believing that) an invalid constraint
>   results in error by ExecPrepareExpr. Therefore 'if
>   (partqualstate' is useless.
>
Removed the check for partqualstate.


> - In gram.y, the nonterminal for list spec clause is still
>   "ForValues". It seems somewhat strange. partition_spec or
>   something would be better.
>
Done. Thanks for catching this; I agree with you.
I have changed the name to PartitionBoundSpec.


> - This is not a part of this patch, but in ruleutils.c, the error
>   for unknown paritioning strategy is emitted as following.
>
> >   elog(ERROR, "unrecognized partition strategy: %d",
> >(int) strategy);
>
>   The cast is added because the strategy is a char. I suppose
>   this is because strategy can be an unprintable. I'd like to see
>   a comment if it is correct.
>
I think this should be taken up separately.

Thanks,
Jeevan Ladhe

Refs:
[1] https://www.postgresql.org/message-id/CAOgcT0OARciE2X%2BU0rjSKp9VuC279dYcCGkc3nCWKhHQ1_m2rw%40mail.gmail.com


Re: [HACKERS] Adding support for Default partition in partitioning

2017-07-12 Thread Jeevan Ladhe
Hi Ashutosh,

I have tried to address your comments in the V22 set of patches[1].
Please find my feedback inline with your comments below.

On Thu, Jun 15, 2017 at 10:24 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

> Some more comments on the latest set of patches.
>
> In heap_drop_with_catalog(), we heap_open() the parent table to get the
> default partition OID, if any. If the relcache doesn't have an entry for
> the
> parent, this means that the entry will be created, only to be invalidated
> at
> the end of the function. If there is no default partition, this all is
> completely unnecessary. We should avoid heap_open() in this case. This also
> means that get_default_partition_oid() should not rely on the relcache
> entry,
> but should growl through pg_inherit to find the default partition.
>

Instead of reading the defaultOid from the cache, as suggested by Amit
here[2], I have now stored this in pg_partitioned_table, and am reading it
from there.

> In get_qual_for_list(), if the table has only default partition, it won't
> have
> any boundinfo. In such a case the default partition's constraint would
> come out
> as (NOT ((a IS NOT NULL) AND (a = ANY (ARRAY[]::integer[]. The empty
> array
> looks odd and may be we spend a few CPU cycles executing ANY on an empty
> array.
> We have the same problem with a partition containing only NULL value. So,
> may
> be this one is not that bad.
>
Fixed.


> Please add a testcase to test addition of default partition as the first
> partition.
>
Added this in insert.sql rather than create_table.sql, as the purpose here
is to test whether the default partition, when it is the first partition,
accepts any values for the key, including null.
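
For illustration, the scenario that test covers is roughly this (a
sketch using the DEFAULT partition syntax proposed in this series; the
table names are made up):

CREATE TABLE t (a int) PARTITION BY LIST (a);
CREATE TABLE t_def PARTITION OF t DEFAULT;
-- with the default as the only (first) partition, any key value is accepted
INSERT INTO t VALUES (1), (NULL);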


> get_qual_for_list() allocates the constant expressions corresponding to the
> datums in CacheMemoryContext while constructing constraints for a default
> partition. We do not do this for other partitions. We may not be
> constructing
> the constraints for saving in the cache. For example, ATExecAttachPartition
> constructs the constraints for validation. In such a case, this code will
> unnecessarily clobber the cache memory. generate_partition_qual() copies
> the
> partition constraint in the CacheMemoryContext.
>

Removed CacheMemoryContext.
I thought that once the partition qual is generated it should remain in
the cache memory context, but when it is needed that is indirectly taken
care of by generate_partition_qual() in the following code:

/* Save a copy in the relcache */
oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
rel->rd_partcheck = copyObject(result);
MemoryContextSwitchTo(oldcxt);


>
> +   if (spec->is_default)
> +   {
> +   result = list_make1(make_ands_explicit(result));
> +   result = list_make1(makeBoolExpr(NOT_EXPR, result, -1));
> +   }
>
> If the "result" is an OR expression, calling make_ands_explicit() on it
> would
> create AND(OR(...)) expression, with an unnecessary AND. We want to avoid
> that?
>
>
Actually, the OR expression here is generated using a call to
makeBoolExpr(), which returns a single expression node, and if this is
passed to make_ands_explicit(), it checks whether the list length is one
and, if so, returns the initial node itself; hence an AND(OR(...)) kind of
expression is not generated here.


> +   if (cur_index < 0 && (partition_bound_has_default(
> partdesc->boundinfo)))
> +   cur_index = partdesc->boundinfo->default_index;
> +
> The partition_bound_has_default() check is unnecessary since we check for
> cur_index < 0 next anyway.
>
Done.


> + *
> + * Given the parent relation checks if it has default partition, and if it
> + * does exist returns its oid, otherwise returns InvalidOid.
> + */
> May be reworded as "If the given relation has a default partition, this
> function returns the OID of the default partition. Otherwise it returns
> InvalidOid."
>
I have reworded this to:
"If the given relation has a default partition return the OID of the default
partition, otherwise return InvalidOid."


> +Oid
> +get_default_partition_oid(Relation parent)
> +{
> +   PartitionDesc partdesc = RelationGetPartitionDesc(parent);
> +
> +   if (partdesc->boundinfo && partition_bound_has_default(
> partdesc->boundinfo))
> +   return partdesc->oids[partdesc->boundinfo->default_index];
> +
> +   return InvalidOid;
> +}
> An unpartitioned table would not have partdesc set set. So, this function
> will
> segfault if we pass an unpartitioned table. Either Assert that partdesc
> should
> exist or check for its NULL-ness.
>

Fixed.


>
>
> +defaultPartOid = get_default_partition_oid(rel);
> +if (OidIsValid(defaultPartOid))
> +ereport(ERROR,
> +(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
> + errmsg("there exists a default partition for table
> \"%s\", cannot attach a new partition",
> +RelationGetRelationName(rel;
> +
> Should be done before heap_open on the table being attached. If we are not
> going to attach the partition, there's no 

Re: [HACKERS] Double shared memory allocation for SLRU LWLocks

2017-07-12 Thread Alexander Korotkov
On Wed, Jul 12, 2017 at 2:03 PM, Teodor Sigaev  wrote:

>> It seems to me that we're allocating shared memory for SLRU lwlocks twice,
>> unless I'm missing something.
>>
> I think you are right.
>
> Did you check previous versions? i.e. should it be applyed to previous
> branches?? I suppose yes, to minimize code difference.
>

The problem was introduced by fe702a7b.  I think it would be good to
backpatch this to 9.6.  It doesn't cause any user-visible problem, but
nevertheless it's a bug.


> Also I'd like an idea to add Assert(offset <= SimpleLruShmemSize(nslots,
> nlsns)) at the end of SimpleLruInit()


Good point.  Please find it in the attached patch.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


fix-slru-lwlock-shmem-double-allocation-2.patch
Description: Binary data



Re: [HACKERS] Adding support for Default partition in partitioning

2017-07-12 Thread Jeevan Ladhe
Hi Robert,

I have tried to address your comments in the V22 set of patches[1].
Please find my feedback inline with your comments below.


On Thu, Jun 15, 2017 at 1:21 AM, Robert Haas  wrote:
>
> - Needs to be rebased over b08df9cab777427fdafe633ca7b8abf29817aa55.
>

Rebased on master latest commit: ca793c59a51e94cedf8cbea5c29668bf8fa298f3


> - Still no documentation.
>
Yes, this is long pending, and I will make it a priority to get it
included in my next set of patches.

> - Should probably be merged with the patch to add default partitioning
> for ranges.


Beena is already rebasing her patch on my latest patches, so I think
getting them merged here won't be an issue; mostly it will just be like
one more patch on top of my patches.


> Other stuff I noticed:
>
> - The regression tests don't seem to check that the scan-skipping
> logic works as expected.  We have regression tests for that case for
> attaching regular partitions, and it seems like it would be worth
> testing the default-partition case as well.
>

Added a test case for default in alter_table.sql.

> - check_default_allows_bound() assumes that if
> canSkipPartConstraintValidation() fails for the default partition, it
> will also fail for every subpartition of the default partition.  That
> is, once we commit to scanning one child partition, we're committed to
> scanning them all.  In practice, that's probably not a huge
> limitation, but if it's not too much code, we could keep the top-level
> check but also check each partitioning individually as we reach it,
> and skip the scan for any individual partitions for which the
> constraint can be proven.  For example, suppose the top-level table is
> list-partitioned with a partition for each of the most common values,
> and then we range-partition the default partition.
>

I have tried to address this in patch 0007, please let me know your views on
that patch.

> - The changes to the regression test results in 0004 make the error
> messages slightly worse.  The old message names the default partition,
> whereas the new one does not.  Maybe that's worth avoiding.


The only way I can think of to achieve this is to store the name of the
default relation in AlteredTableInfo; currently I am using a flag to detect
whether the scanned table is a default partition and throw the specific
error. But IMO storing a string in AlteredTableInfo just for error
reporting might be overkill. Your suggestions?


> Specific comments:
>
> + * Also, invalidate the parent's and a sibling default partition's
> relcache,
> + * so that the next rebuild will load the new partition's info into
> parent's
> + * partition descriptor and default partition constraints(which are
> dependent
> + * on other partition bounds) are built anew.
>
> I find this a bit unclear, and it also repeats the comment further
> down.  Maybe something like: Also, invalidate the parent's relcache
> entry, so that the next rebuild will load he new partition's info into
> its partition descriptor.  If there is a default partition, we must
> invalidate its relcache entry as well.
>
Done.


> +/*
> + * The default partition constraints depend upon the partition bounds
> of
> + * other partitions. Adding a new(or even removing existing) partition
> + * would invalidate the default partition constraints. Invalidate the
> + * default partition's relcache so that the constraints are built
> anew and
> + * any plans dependent on those constraints are invalidated as well.
> + */
>
> Here, I'd write: The partition constraint for the default partition
> depends on the partition bounds of every other partition, so we must
> invalidate the relcache entry for that partition every time a
> partition is added or removed.
>
Done.


> +/*
> + * Default partition cannot be added if there already
> + * exists one.
> + */
> +if (spec->is_default)
> +{
> +overlap = partition_bound_has_default(boundinfo);
> +with = boundinfo->default_index;
> +break;
> +}
>
> To support default partitioning for range, this is going to have to be
> moved above the switch rather than done inside of it.  And there's
> really no downside to putting it there.
>
Done.


> + * constraint, by *proving* that the existing constraints of the table
> + * *imply* the given constraints.  We include the table's check
> constraints and
>
> Both the comma and the asterisks are unnecessary.
>
Done.


> + * Check whether all rows in the given table (scanRel) obey given
> partition
>
> obey the given
>
> I think the larger comment block could be tightened up a bit, like
> this:  Check whether all rows in the given table obey the given
> partition constraint; if so, it can be attached as a partition.  We do
> this by scanning the table (or all of its 

Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-12 Thread Peter Geoghegan
On Wed, Jul 12, 2017 at 12:30 PM, Peter Geoghegan  wrote:
> On Wed, Jul 12, 2017 at 4:28 AM, Alik Khilazhev
>  wrote:
>> I am attaching results of query that you sent. It shows that there is
>> nothing have changed after executing tests.
>
> But something did change! In the case where performance was good, all
> internal pages on the level above the leaf level have exactly 285
> items, excluding the rightmost page, which unsurprisingly didn't fill.
> However, after increasing client count to get the slow case, the "hot"
> part of the keyspace (the leaf pages owned by the first/leftmost
> internal page on that level) has 353 items rather than 285.

Could you please run the query again for both cases, but this time
change it to get items from the leaf level too, and not just internal
levels? Just remove the "where type != 'l'" clause in one of the CTEs.
That should do it.

It's probably the case that the hot part of the B-tree is actually
significantly less than 353 items (or 285 items), and so the "before"
and "after" difference in how many pages are used for the hot part of
the keyspace could be quite a lot larger than my initial estimate. It
could be that the granularity that is shown on the internal pages is
insufficient to see just how bad the problem is.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-12 Thread Peter Geoghegan
On Wed, Jul 12, 2017 at 4:28 AM, Alik Khilazhev
 wrote:
> I am attaching results of query that you sent. It shows that there is
> nothing have changed after executing tests.

But something did change! In the case where performance was good, all
internal pages on the level above the leaf level have exactly 285
items, excluding the rightmost page, which unsurprisingly didn't fill.
However, after increasing client count to get the slow case, the "hot"
part of the keyspace (the leaf pages owned by the first/leftmost
internal page on that level) has 353 items rather than 285.

Now, that might not seem like that much of a difference, but if you
consider how duplicates are handled in the B-Tree code, and how unique
index enforcement works, I think it could be. It could lead to heavy
buffer lock contention, because we sometimes do a lot of work with an
exclusive buffer lock held.

This is something that I go into a lot of detail on in the Wiki page
on Key normalization:
https://wiki.postgresql.org/wiki/Key_normalization#Avoiding_unnecessary_unique_index_enforcement

Now, I'm not saying that I know for sure that that's what it is. It
seems like a good area to investigate, though. Even if it wasn't
buffer lock contention, we're still talking about the difference
between the hot part of the B-Tree being about 353 pages, versus 285.
Buffer lock contention could naturally limit the growth in size to
"only" 353, by slowing everything down.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Function Volatility and Views Unexpected Behavior

2017-07-12 Thread Tom Lane
David Kohn  writes:
> I encountered some unexpected behavior when debugging a query that was
> taking longer than expected, basically, a volatile function that makes a
> column in a view is called even when that column is not selected in the
> query, making it so that the function is called for every row in the view,
> I'm not sure that that would necessarily be the expected behavior, as it
> was my understanding that columns that are not selected are not evaluated,
> for instance if there was a join in a view that produced some columns and
> said columns were not selected, I would expect it to be optimized away.

No, this is the expected behavior; we don't like optimization to change
the number of calls of a volatile function from what would occur in naive
evaluation of the query.  If that prospect doesn't bother you, it's
likely because your function isn't really volatile ...

> The other problem is that the function call does not appear in the query
> plan.

I think "explain verbose" will fix that for you.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL10 beta2 with ICU - initdb fails on MacOS

2017-07-12 Thread Tom Lane
Sandeep Thakkar  writes:
> On Wed, Jul 12, 2017 at 8:13 PM, Tom Lane  wrote:
>> Ugh.  Please provide details about the hand-built ICU --- what version,
>> what configure options did you use for it, etc.

> I tried with ICU 53.1 and 57.1 and the results is same.
> There was no configure options other than --prefix.

Well, when I try that on macOS, I get an ICU installation that's basically
nonfunctional, apparently because they forgot to specify -install_name
while linking the dylibs.  The symptom looks a bit different:

$ initdb
dyld: Library not loaded: libicui18n.57.dylib
  Referenced from: /Users/tgl/testversion/bin/postgres
  Reason: image not found

but it's clear from otool that the dynamic linker doesn't know where
to find the ICU libraries:

$ otool -L ~/testversion/bin/postgres
/Users/tgl/testversion/bin/postgres:
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current 
version 1238.60.2)
libicui18n.57.dylib (compatibility version 57.0.0, current version 
57.1.0)
libicuuc.57.dylib (compatibility version 57.0.0, current version 57.1.0)

and this is evidently because the libraries themselves don't know where
they live:

$ otool -D /usr/local/icu-57.1/lib/libicui18n.57.dylib
/usr/local/icu-57.1/lib/libicui18n.57.dylib:
libicui18n.57.dylib

Compare to what you get for a properly built dylib:

$ otool -D /usr/local/ssl-1.1.0e/lib/libssl.1.1.dylib 
/usr/local/ssl-1.1.0e/lib/libssl.1.1.dylib:
/usr/local/ssl-1.1.0e/lib/libssl.1.1.dylib

I can make it work by setting DYLD_LIBRARY_PATH:

$ DYLD_LIBRARY_PATH=/usr/local/icu-57.1/lib initdb
... goes through normally ...

but that hardly qualifies as a production-grade solution.

I've not dug into the ICU sources to see what it would take to get the
correct link option applied.  But the short answer seems to be that ICU
is not up to speed for installation in non-default locations on macOS.

For the record's sake, I configured icu4c-57_1-src.tgz like this:

$ ./runConfigureICU MacOSX --prefix=/usr/local/icu-57.1

and since there's no pkg-config on this machine, I did this to
configure Postgres:

$ ICU_CFLAGS=" " ICU_LIBS="-licui18n -licuuc -licudata" ./configure [usual 
options] --with-icu --with-includes=/usr/local/icu-57.1/include 
--with-libraries=/usr/local/icu-57.1/lib

It's possible that if I did have pkg-config installed, it would have
produced some different value for ICU_LIBS, but I rather doubt it.
I'm not sure that it's possible to fix this on the consumer executable's
link line anyway.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] idea: custom log_line_prefix components besides application_name

2017-07-12 Thread David Fetter
On Wed, Jul 12, 2017 at 07:38:56AM -0500, Robert Haas wrote:
> On Tue, May 9, 2017 at 9:43 PM, Chapman Flack  wrote:
> > That's where the appident.cookie() function comes in. You just
> > query it once at session establishment and remember the cookie.
> > That allows your code to say:
> >
> > SET SESSION ON BEHALF OF 'joe user' BECAUSE I HAVE :cookie AND I SAY SO;
> >
> > and Mallory can't inject that because he doesn't have :cookie and the
> > appident.cookie() function only succeeds the first time.
> 
> I have for a long time been interested in having a protocol-level
> method for setting the session identity.  That is, instead of sending
> "SET SESSION AUTHORIZATION 'rhaas'" wrapped in a Query message, you'd
> send some new protocol message that we invent that nails down the
> session authorization and doesn't allow it to be changed except by
> another protocol message.  I feel like the usefulness of this for
> connection pooling software is pretty obvious: it's a lot easier for
> the pooler to disallow a certain protocol message than a certain SQL
> command.

Neat idea.

> But maybe we could generalize it a bit, so that it can work for any
> GUC.  For example, let the client send some new SetVariableViaProtocol
> message with two arguments, a GUC name and a value.  The server treats
> this just like a SET command, except that once you've done this, the
> variable can't later be changed from SQL.  So now you don't even need
> the cookie, and the client can't try to guess the cookie.  You just
> tell the server - via this protocol message - to nail down
> session_authorization or application_name or appuser.thunk to some
> value of your choice, and it's invulnerable to SQL injection
> thereafter.

Likely to SQL injection.  I guess that would direct attackers to the
protocol, which I suppose is already a harder target, assuming SSL or
similar.

> Whaddaya think?

I think I'm not alone in wanting some way to set parameters and hold
them steady.  I confess I'd been thinking more in terms of a DCL for
these, but that makes it a lot less flexible than what you're
proposing in the sense that it's set on connect rather than via GRANT
and REVOKE.

One thing I'm not seeing how to do via your proposal is to hold these
things for local (not localhost) users.  Is there some way to handle
them, too, or would that be over-engineering this, given what a local
user can already accomplish?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pl/perl extension fails on Windows

2017-07-12 Thread Andrew Dunstan


On 07/12/2017 11:49 AM, Dave Page wrote:
>
>
> On Wed, Jul 12, 2017 at 4:35 PM, Tom Lane  wrote:
>
> Sandeep Thakkar  writes:
> > I compiled PG 10 beta1/beta2 with "--with-perl" option on
> Windows and the
> > extension crashes the database.
> > *src/pl/plperl/Util.c: loadable library and perl binaries are
> mismatched
> > (got handshake key 0A900080, needed 0AC80080)*
>
> > This is seen with Perl 5.24 but not with 5.20, 5.16. What I
> found is that
> > the handshake function is added in Perl 5.21.x and probably that
> is why we
> > don't see this issue in earlier versions.
>
> Well, we have various buildfarm machines running perls newer than
> that,
> eg, crake, with 5.24.1.  So I'd say there is something busted
> about your
> perl installation.  Perhaps leftover bits of an older version
> somewhere?
>
>
> Well crake is a Fedora box - and we have no problems on Linux, only on
> Windows. 
>
>


Yeah, I have this on one of my Windows boxes, and haven't had time to
get to the bottom of it yet ;-(

Latest versions of ActivePerl don't ship with library descriptor files,
either, which is unpleasant.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] why not parallel seq scan for slow functions

2017-07-12 Thread Jeff Janes
On Tue, Jul 11, 2017 at 10:25 PM, Amit Kapila 
wrote:

> On Wed, Jul 12, 2017 at 1:50 AM, Jeff Janes  wrote:
> > On Mon, Jul 10, 2017 at 9:51 PM, Dilip Kumar 
> wrote:
> >>
> >> So because of this high projection cost the seqpath and parallel path
> >> both have fuzzily same cost but seqpath is winning because it's
> >> parallel safe.
> >
> >
> > I think you are correct.  However, unless parallel_tuple_cost is set very
> > low, apply_projection_to_path never gets called with the Gather path as
> an
> > argument.  It gets ruled out at some earlier stage, presumably because it
> > assumes the projection step cannot make it win if it is already behind by
> > enough.
> >
>
> I think that is genuine because tuple communication cost is very high.
>

Sorry, I don't know which you think is genuine, the early pruning or my
complaint about the early pruning.

I agree that the communication cost is high, which is why I don't want to
have to set parallel_tuple_cost very low.  For example, to get the benefit
of your patch, I have to set parallel_tuple_cost to 0.0049 or less (in my
real-world case, not the dummy test case I posted, although the numbers are
around the same for that one too).  But with a setting that low, all kinds
of other things also start using parallel plans, even if they don't benefit
from them and are harmed.

I realize we need to do some aggressive pruning to avoid an exponential
explosion in planning time, but in this case it has some rather unfortunate
consequences.  I wanted to explore it, but I can't figure out where this
particular pruning is taking place.

By the time we get to planner.c line 1787, current_rel->pathlist already
does not contain the parallel plan if parallel_tuple_cost >= 0.0050, so the
pruning is happening earlier than that.



> If your table is reasonable large then you might want to try by
> increasing parallel workers (Alter Table ... Set (parallel_workers =
> ..))
>


Setting parallel_workers to 8 changes the threshold for the parallel plan to
even be considered from parallel_tuple_cost <= 0.0049 to <= 0.0076.  So it
is going in the correct direction, but not by enough to matter.

Cheers,

Jeff


Re: Fwd: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2017-07-12 Thread Claudio Freire
On Wed, Jul 12, 2017 at 1:08 PM, Claudio Freire  wrote:
> On Wed, Jul 12, 2017 at 11:48 AM, Alexey Chernyshov
>  wrote:
>> Thank you for the patch and benchmark results, I have a couple remarks.
>> Firstly, padding in DeadTuplesSegment
>>
>> typedef struct DeadTuplesSegment
>> {
>>     ItemPointerData last_dead_tuple;    /* Copy of the last dead tuple (unset
>>                                          * until the segment is fully
>>                                          * populated). Keep it first to
>>                                          * simplify binary searches */
>>     unsigned short  padding;            /* Align dt_tids to 32-bits,
>>                                          * sizeof(ItemPointerData) is aligned
>>                                          * to short, so add a padding short,
>>                                          * to make the size of
>>                                          * DeadTuplesSegment a multiple of
>>                                          * 32-bits and align integer
>>                                          * components for better performance
>>                                          * during lookups into the multiarray */
>>     int             num_dead_tuples;    /* # of entries in the segment */
>>     int             max_dead_tuples;    /* # of entries allocated in the segment */
>>     ItemPointer     dt_tids;            /* Array of dead tuples */
>> } DeadTuplesSegment;
>>
>> In the comments to ItemPointerData it is written that it is 6 bytes long, but
>> can be padded to 8 bytes by some compilers, so if we add padding in the
>> current way, there is no guarantee that it will be done as expected. The
>> other way is to do it with pg_attribute_aligned. But in my opinion, there
>> is no need to do it manually, because the compiler will do this
>> optimization itself.
>
> I'll look into it. But my experience is that compilers won't align
> struct size like this, only attributes, and this attribute is composed
> of 16-bit attributes so it doesn't get aligned by default.

Doing sizeof(DeadTuplesSegment) suggests you were indeed right, at
least in GCC. I'll remove the padding.

Seems I just got the wrong impression at some point.
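
For reference, a quick standalone check along these lines (ItemPointerData
modeled as three uint16 fields, as in itemptr.h; exact numbers depend on
compiler and ABI) shows that the int and pointer members already force the
struct's alignment, so the compiler inserts its own padding after the 6-byte
ItemPointerData and rounds the total size up without the explicit padding
short:

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

typedef struct ItemPointerData { uint16_t blk_hi, blk_lo, off; } ItemPointerData;

/* Model of the segment struct with the explicit padding member removed. */
typedef struct DeadTuplesSegmentNoPad
{
    ItemPointerData last_dead_tuple;
    int             num_dead_tuples;
    int             max_dead_tuples;
    ItemPointerData *dt_tids;
} DeadTuplesSegmentNoPad;

int
main(void)
{
    printf("sizeof = %zu, num_dead_tuples lands at offset %zu\n",
           sizeof(DeadTuplesSegmentNoPad),
           offsetof(DeadTuplesSegmentNoPad, num_dead_tuples));
    return 0;
}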


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Fwd: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2017-07-12 Thread Claudio Freire
On Wed, Jul 12, 2017 at 11:48 AM, Alexey Chernyshov
 wrote:
> Thank you for the patch and benchmark results, I have a couple remarks.
> Firstly, padding in DeadTuplesSegment
>
> typedef struct DeadTuplesSegment
> {
>     ItemPointerData last_dead_tuple;    /* Copy of the last dead tuple (unset
>                                          * until the segment is fully
>                                          * populated). Keep it first to
>                                          * simplify binary searches */
>     unsigned short  padding;            /* Align dt_tids to 32-bits,
>                                          * sizeof(ItemPointerData) is aligned
>                                          * to short, so add a padding short,
>                                          * to make the size of
>                                          * DeadTuplesSegment a multiple of
>                                          * 32-bits and align integer
>                                          * components for better performance
>                                          * during lookups into the multiarray */
>     int             num_dead_tuples;    /* # of entries in the segment */
>     int             max_dead_tuples;    /* # of entries allocated in the segment */
>     ItemPointer     dt_tids;            /* Array of dead tuples */
> } DeadTuplesSegment;
>
> In the comments to ItemPointerData it is written that it is 6 bytes long, but
> can be padded to 8 bytes by some compilers, so if we add padding in the
> current way, there is no guarantee that it will be done as expected. The
> other way is to do it with pg_attribute_aligned. But in my opinion, there
> is no need to do it manually, because the compiler will do this
> optimization itself.

I'll look into it. But my experience is that compilers won't align
struct size like this, only attributes, and this attribute is composed
of 16-bit attributes so it doesn't get aligned by default.

> On 11.07.2017 19:51, Claudio Freire wrote:
>>>
>>> -
>>>
>>> +   /* Search for the segment likely to contain the item pointer */
>>> +   iseg = vac_itemptr_binsrch(
>>> +   (void *) itemptr,
>>> +   (void *)
>>> &(vacrelstats->dead_tuples.dt_segments->last_dead_tuple),
>>> +   vacrelstats->dead_tuples.last_seg + 1,
>>> +   sizeof(DeadTuplesSegment));
>>> +
>>>
>>> I think that we can change the above to;
>>>
>>> +   /* Search for the segment likely to contain the item pointer */
>>> +   iseg = vac_itemptr_binsrch(
>>> +   (void *) itemptr,
>>> +   (void *) &(seg->last_dead_tuple),
>>> +   vacrelstats->dead_tuples.last_seg + 1,
>>> +   sizeof(DeadTuplesSegment));
>>>
>>> We set "seg = vacrelstats->dead_tuples.dt_segments" just before this.
>>
>> Right
>
> In my mind, if you change vacrelstats->dead_tuples.last_seg + 1 with
> GetNumDeadTuplesSegments(vacrelstats), it would be more meaningful.

It's not the same thing. The first run it might, but after a reset of
the multiarray, num segments is the allocated size, while last_seg is
the last one filled with data.

> Besides, you can change the vac_itemptr_binsrch within the segment with
> stdlib bsearch, like:
>
> res = (ItemPointer) bsearch((void *) itemptr,
>
> (void *) seg->dt_tids,
>
> seg->num_dead_tuples,
>
> sizeof(ItemPointerData),
>
> vac_cmp_itemptr);
>
> return (res != NULL);

The stdlib's bsearch is quite a bit slower. The custom bsearch inlines the
comparison, which lets the compiler factor quite a bit of logic out of the
loop and in general generate far more specialized assembly.

For the compiler to optimize the stdlib's bsearch call, whole-program
optimization would have to be enabled, which is unlikely. Even then, it
may not be able to, due to aliasing rules.

This is what I came up with to make the new approach's performance on par
with or better than the old one, in CPU cycles. In fact, benchmarks show
that time spent on the CPU is lower now, in large part due to this.

It's not like it's the first custom binary search in Postgres, either.
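
To make that concrete, here is a small self-contained sketch of the trade-off
(generic unsigned keys rather than ItemPointerData, and not the patch's
vac_itemptr_binsrch itself): the hand-rolled lower-bound search has the stride
and comparison fully visible to the compiler, while stdlib bsearch() pays an
indirect call per probe and only reports exact matches:

#include <stdio.h>
#include <stdlib.h>

typedef struct Segment
{
    unsigned    last_key;       /* kept first, like last_dead_tuple */
    unsigned    nitems;
} Segment;

/* Find the first segment whose last_key is >= key (lower-bound search). */
static int
segment_binsrch(const Segment *segs, int nsegs, unsigned key)
{
    int     lo = 0, hi = nsegs;

    while (lo < hi)
    {
        int     mid = lo + (hi - lo) / 2;

        if (segs[mid].last_key < key)   /* comparison inlined right here */
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

static int
cmp_key(const void *a, const void *b)
{
    unsigned    ka = *(const unsigned *) a;
    unsigned    kb = ((const Segment *) b)->last_key;

    return (ka > kb) - (ka < kb);
}

int
main(void)
{
    Segment     segs[] = {{100, 10}, {200, 10}, {300, 10}, {400, 10}};
    unsigned    key = 250;

    printf("custom lower-bound search: segment %d\n",
           segment_binsrch(segs, 4, key));

    /* stdlib equivalent: indirect call per comparison, exact match only */
    Segment    *hit = bsearch(&key, segs, 4, sizeof(Segment), cmp_key);

    printf("stdlib bsearch: %s\n",
           hit ? "found" : "not found (needs an exact match)");
    return 0;
}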


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Function Volatility and Views Unexpected Behavior

2017-07-12 Thread David Kohn
(Not sure whether I should be sending this to hackers or somewhere else,
let me know if I should move, feel like I should say long time listener
first time caller or something)

I encountered some unexpected behavior when debugging a query that was
taking longer than expected. Basically, a volatile function that produces a
column in a view is called even when that column is not selected in the
query, so the function is called for every row in the view. I'm not sure
that that is necessarily the expected behavior, as it was my understanding
that columns that are not selected are not evaluated; for instance, if a
join in a view produced some columns and those columns were not selected, I
would expect it to be optimized away. I understand that for volatile
functions that might not be possible, but it might still be unexpected. See
the example below:
```
--setup
create table table1 (id int primary key, data1 float);
create table table2 (id int primary key, table1_id int references table1
(id), data2 float);
insert into table1 select generate_series(1, 1), random();
insert into table2 select generate_series(1,10),
floor(random()*(1-2))+1, random();

create view table1_view as select * from table1;
create or replace function something_silly(int) returns float as $$select
0.002342::float from (select pg_sleep(0.001)) b $$ language SQL;
create view table1_silly_view as select id, data1, something_silly(id) from
table1;
```

`explain analyze select t1.data1 from table2 t2 left join table1_silly_view
t1 on t2.table1_id = t1.id where t2.data2 >0.5 and t2.data2<0.52; `
Hash Left Join  (cost=2880.00..4949.79 rows=2094 width=8) (actual
time=22813.018..22841.189 rows=1995 loops=1)
  Hash Cond: (t2.table1_id = t1.id)
  ->  Seq Scan on table2 t2  (cost=0.00..2041.00 rows=2094 width=4) (actual
time=0.018..26.836 rows=1995 loops=1)
Filter: ((data2 > '0.5'::double precision) AND (data2 <
'0.52'::double precision))
Rows Removed by Filter: 98005
  ->  Hash  (cost=2755.00..2755.00 rows=1 width=12) (actual
time=22812.977..22812.977 rows=1 loops=1)
Buckets: 16384  Batches: 1  Memory Usage: 558kB
->  Subquery Scan on t1  (cost=0.00..2755.00 rows=1 width=12)
(actual time=2.447..22768.199 rows=1 loops=1)
  ->  Seq Scan on table1  (cost=0.00..2655.00 rows=1
width=20) (actual time=2.446..22751.809 rows=1 loops=1)
Planning time: 0.476 ms
Execution time: 22841.362 ms

```
create or replace function something_less_silly(int) returns float as
$$select 0.002342::float from (select pg_sleep(0.001)) b $$ language SQL
stable;
create view table1_slightly_less_silly_view as select id, data1,
something_less_silly(id) from table1;
```
`explain analyze select data1 from table2 t2 left join
table1_slightly_less_silly_view t1 on t2.table1_id = t1.id where t2.data2
>0.5 and t2.data2<0.52;`
Hash Left Join  (cost=280.00..2349.79 rows=2094 width=8) (actual
time=8.634..48.834 rows=1995 loops=1)
  Hash Cond: (t2.table1_id = table1.id)
  ->  Seq Scan on table2 t2  (cost=0.00..2041.00 rows=2094 width=4) (actual
time=0.027..38.253 rows=1995 loops=1)
Filter: ((data2 > '0.5'::double precision) AND (data2 <
'0.52'::double precision))
Rows Removed by Filter: 98005
  ->  Hash  (cost=155.00..155.00 rows=1 width=12) (actual
time=8.579..8.579 rows=1 loops=1)
Buckets: 16384  Batches: 1  Memory Usage: 558kB
->  Seq Scan on table1  (cost=0.00..155.00 rows=1 width=12)
(actual time=0.012..3.984 rows=1 loops=1)
Planning time: 0.468 ms
Execution time: 49.068 ms

The other problem is that the function call does not appear in the query
plan. You see lots of extra time in the sequential scan part, but no idea
of why that might be occurring, so it can be hard to determine that the
culprit is the function call in the view in a column that you're not
selecting. It is also unclear to me whether the function would be evaluated
for every row in the view even when a where clause restricts what you are
retrieving from the view. Anyway, I'm not sure whether this is a bug or a
feature, but I think it might be good to document it somewhere. I had a
real case where someone had made a function to do some calculations on a
json blob and hadn't marked it stable/immutable, which it should have
been. They weren't selecting that column from the view, but the query was
taking 700ms when a query to the underlying table took 1.5ms. We marked the
function stable and the view started working fine; it was just really hard
to figure out that that was what was going on.

Would it be possible to have the explain output show that function call? It
appears that the planner knows about it because it estimates a
significantly higher cost for the seq scan on table1, but just doesn't
mention that that's why.

That might be a longer term fix, in the meantime, given that the default
volatility category is volatile, I can imagine a newer user being truly

Re: [HACKERS] pl/perl extension fails on Windows

2017-07-12 Thread Dave Page
On Wed, Jul 12, 2017 at 4:35 PM, Tom Lane  wrote:

> Sandeep Thakkar  writes:
> > I compiled PG 10 beta1/beta2 with "--with-perl" option on Windows and the
> > extension crashes the database.
> > *src/pl/plperl/Util.c: loadable library and perl binaries are mismatched
> > (got handshake key 0A900080, needed 0AC80080)*
>
> > This is seen with Perl 5.24 but not with 5.20, 5.16. What I found is that
> > the handshake function is added in Perl 5.21.x and probably that is why
> we
> > don't see this issue in earlier versions.
>
> Well, we have various buildfarm machines running perls newer than that,
> eg, crake, with 5.24.1.  So I'd say there is something busted about your
> perl installation.  Perhaps leftover bits of an older version somewhere?
>

Well crake is a Fedora box - and we have no problems on Linux, only on
Windows.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] building libpq.a static library

2017-07-12 Thread Jeroen Ooms
On Wed, Jul 12, 2017 at 5:11 PM, Tom Lane  wrote:
> Jeroen Ooms  writes:
>> I maintain static libraries for libpq for the R programming language
>> (we need static linking to ship with the binary packages).
>
> How do you get that past vendor packaging policies?  When I worked at
> Red Hat, there was a very strong policy against allowing any package
> to statically embed parts of another one, because it creates serious
> management problems if e.g. the other one needs a security update.
> I'm sure Red Hat isn't the only distro that feels that way.

We only use this on Windows. On platforms with a decent package
manager we indeed link to a shared library.

> FWIW, we used to have support for building static libpq, but
> we got rid of it a long time ago.

OK, that's too bad. I agree that shared libs are often preferable, but
in some environments dynamic linking is simply not possible and you
need to link the library into the application statically. Most C/C++
libraries do support --enable-static, and even for most Linux distros
the static libs are included with the lib-dev / lib-devel package.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pl/perl extension fails on Windows

2017-07-12 Thread Tom Lane
Sandeep Thakkar  writes:
> I compiled PG 10 beta1/beta2 with "--with-perl" option on Windows and the
> extension crashes the database.
> *src/pl/plperl/Util.c: loadable library and perl binaries are mismatched
> (got handshake key 0A900080, needed 0AC80080)*

> This is seen with Perl 5.24 but not with 5.20, 5.16. What I found is that
> the handshake function is added in Perl 5.21.x and probably that is why we
> don't see this issue in earlier versions.

Well, we have various buildfarm machines running perls newer than that,
eg, crake, with 5.24.1.  So I'd say there is something busted about your
perl installation.  Perhaps leftover bits of an older version somewhere?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL10 beta2 with ICU - initdb fails on MacOS

2017-07-12 Thread Sandeep Thakkar
On Wed, Jul 12, 2017 at 8:13 PM, Tom Lane  wrote:

> Sandeep Thakkar  writes:
> > On MacOS, I configured PG10 beta2 sources with '--with-icu" option along
> > with ICU_LIBS and ICU_CFLAGS env variables to define the location of the
> > ICU libs that I built. Once the staging is ready, I executed initdb but
> it
> > fails with the following error:
> > [53618] FATAL:  could not open collator for locale "und":
> > U_FILE_ACCESS_ERROR*
> > *2017-07-12 11:18:21.187 BST [53618] STATEMENT:  SELECT
> > pg_import_system_collations('pg_catalog');*
>
> Ugh.  Please provide details about the hand-built ICU --- what version,
> what configure options did you use for it, etc.
>
I tried with ICU 53.1 and 57.1 and the result is the same.
There were no configure options other than --prefix. In the environment,
CXXFLAGS was set to "-isysroot
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.8.sdk
-mmacosx-version-min=10.7 -arch i386 -arch x86_64"


> regards, tom lane



-- 
Sandeep Thakkar
EDB


Re: [HACKERS] building libpq.a static library

2017-07-12 Thread Tom Lane
Jeroen Ooms  writes:
> I maintain static libraries for libpq for the R programming language
> (we need static linking to ship with the binary packages).

How do you get that past vendor packaging policies?  When I worked at
Red Hat, there was a very strong policy against allowing any package
to statically embed parts of another one, because it creates serious
management problems if e.g. the other one needs a security update.
I'm sure Red Hat isn't the only distro that feels that way.

I think you'd be better advised to fix things so you can link with
the standard shared-library version of libpq (and whatever else
you're doing this with).

> This works but it's a bit of a pain to maintain. I was wondering if
> this hack could be merged so that the standard 'configure
> --enable-static' script would install a static library for libpq
> alongside the shared one.

FWIW, we used to have support for building static libpq, but
we got rid of it a long time ago.  I couldn't find the exact
spot in some desultory trawling of the commit history.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Postgres process invoking exit resulting in sh-QUIT core

2017-07-12 Thread K S, Sandhya (Nokia - IN/Bangalore)
Hi Craig,

Here is bt after installing all the missing debuginfo packages.

(gdb) bt
#0  0x00fff7682f18 in do_lookup_x (undef_name=undef_name@entry=0xfff75cece5 
"_Jv_RegisterClasses", new_hash=new_hash@entry=2681263574,
old_hash=old_hash@entry=0xa159b8, ref=0xfff75ceac8, 
result=result@entry=0xa159a0, scope=, i=1, 
version=version@entry=0x0,
flags=flags@entry=1, skip=skip@entry=0x0, type_class=type_class@entry=0, 
undef_map=undef_map@entry=0xfff76a9478) at dl-lookup.c:444
#1  0x00fff76839a0 in _dl_lookup_symbol_x (undef_name=0xfff75cece5 
"_Jv_RegisterClasses", undef_map=0xfff76a9478, ref=0xa15a90,
symbol_scope=0xfff76a9980, version=0x0, type_class=, 
flags=, skip_map=0x0) at dl-lookup.c:833
#2  0x00fff7685730 in elf_machine_got_rel (lazy=1, map=0xfff76a9478) at 
../sysdeps/mips/dl-machine.h:870
#3  elf_machine_runtime_setup (profile=, lazy=1, l=0xfff76a9478) 
at ../sysdeps/mips/dl-machine.h:916
#4  _dl_relocate_object (scope=0xfff76a9980, reloc_mode=, 
consider_profiling=0) at dl-reloc.c:259
#5  0x00fff767ba10 in dl_main (phdr=, 
phdr@entry=0x12040, phnum=, phnum@entry=8,
user_entry=user_entry@entry=0xa15cf0, auxv=) at 
rtld.c:2070
#6  0x00fff7692e3c in _dl_sysdep_start (start_argptr=, 
dl_main=0xfff7679a98 ) at ../elf/dl-sysdep.c:249
#7  0x00fff767d0d8 in _dl_start_final (arg=arg@entry=0xa16410, 
info=info@entry=0xa15d80) at rtld.c:307
#8  0x00fff767d3d8 in _dl_start (arg=0xa16410) at rtld.c:415
#9  0x00fff7679380 in __start () from /lib64/ld.so.1

Please see if this could help in analysing the issue.

Regards,
Sandhya

From: Craig Ringer [mailto:cr...@2ndquadrant.com]
Sent: Friday, July 07, 2017 1:53 PM
To: K S, Sandhya (Nokia - IN/Bangalore) 
Cc: pgsql-bugs ; PostgreSQL Hackers 
; T, Rasna (Nokia - IN/Bangalore) 
; Itnal, Prakash (Nokia - IN/Bangalore) 

Subject: Re: [HACKERS] Postgres process invoking exit resulting in sh-QUIT core

On 7 July 2017 at 15:41, K S, Sandhya (Nokia - IN/Bangalore) wrote:
Hi Craig,

The scenario is lock and unlock of the system for 30 times. During this 
scenario 5 sh-QUIT core is generated. GDB of 5 core is pointing to different 
locations.
I have attached output for 2 such instance.


You seem to be missing debug symbols. Install appropriate debuginfo packages.


--
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] New partitioning - some feedback

2017-07-12 Thread Alvaro Herrera
Amit Langote wrote:
> On 2017/07/11 13:34, Alvaro Herrera wrote:
> > Robert Haas wrote:
> >> On Mon, Jul 10, 2017 at 2:15 AM, Amit Langote
> >>  wrote:
> > 
> >>> Actually, if \d had shown RELKIND_PARTITIONED_TABLE tables as of Type
> >>> "partitioned table", we wouldn't need a separate flag for marking a table
> >>> as having partitions.
> >>
> >> I think that is false.  Whether something is partitioned and whether
> >> it is a partition are independent concerns.
> > 
> > Maybe this discussion is easier if we differentiate "list tables" (\dt,
> > or \d without a pattern) from "describe table" (\d with a name pattern).
> 
> I think this discussion has mostly focused on "list tables" so far.

Yes, which I think is a mistake, because for some things you definitely
need a list of partitions of the table in question.  And "describe
table" can fulfill that role perfectly well, ISTM.

> > It seems to me that the "describe" command should list partitions --
> > perhaps only when the + flag is given.
> 
> That's what happens today.

So no further changes needed there -- good.

> > However, the "list tables"
> > command \dt should definitely IMO not list partitions.
> 
> Do you mean never?  Even if a modifier is specified?  In the patch I
> proposed, \d! (or \d+ or \d++, if '!' turns out to be unpopular) will list
> partitions, but \d or \dt won't.  That is, partitions are hidden by default.

I don't think there is any need for a single list of all partitions of
all tables -- is there?  I can't think of anything, but then I haven't
been exposed very much to this feature yet.  For now, I lean towards "never".

(A different consideration is the use case of listing relation
relfrozenxid/relminmxid ages, but that use case is already not fulfilled
by psql metacommands so you still need custom catalog queries).

I don't think \d! works terribly well as a mental model, but maybe
that's just me.

> > Maybe \dt should
> > have some flag indicating whether each table is partitioned.
> 
> So it seems most of us are in favor for showing partitioned tables as
> "partitioned table" instead of "table" in the table listing.

Yeah, that sounds good to me.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Causal reads take II

2017-07-12 Thread Dmitry Dolgov
> On 23 June 2017 at 13:48, Thomas Munro 
wrote:
>
> Apologies for the extended delay.  Here is the rebased patch, now with a
> couple of improvements (see below).

Thank you. I started to play with it a little bit, since I think it's an
interesting idea. And there are already a few notes:

* I don't see a CF item for that, where is it?

* Looks like there is a sort of sensitive typo in `postgresql.conf.sample`:

```
+#causal_reads_standy_names = '*' # standby servers that can potentially
become
+ # available for causal reads; '*' = all
+
```

it should be `causal_reads_standby_names`. Also, I hope I can provide a full
review in the near future.


Re: Fwd: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2017-07-12 Thread Alexey Chernyshov

Thank you for the patch and benchmark results, I have a couple remarks.
Firstly, padding in DeadTuplesSegment

typedef struct DeadTuplesSegment
{
    ItemPointerData last_dead_tuple;    /* Copy of the last dead tuple (unset
                                         * until the segment is fully
                                         * populated). Keep it first to
                                         * simplify binary searches */
    unsigned short  padding;            /* Align dt_tids to 32-bits,
                                         * sizeof(ItemPointerData) is aligned
                                         * to short, so add a padding short,
                                         * to make the size of
                                         * DeadTuplesSegment a multiple of
                                         * 32-bits and align integer
                                         * components for better performance
                                         * during lookups into the multiarray */
    int             num_dead_tuples;    /* # of entries in the segment */
    int             max_dead_tuples;    /* # of entries allocated in the segment */
    ItemPointer     dt_tids;            /* Array of dead tuples */
} DeadTuplesSegment;

In the comments to ItemPointerData it is written that it is 6 bytes long,
but can be padded to 8 bytes by some compilers, so if we add padding in
the current way, there is no guarantee that it will be done as expected.
The other way is to do it with pg_attribute_aligned. But in my opinion,
there is no need to do it manually, because the compiler will do this
optimization itself.


On 11.07.2017 19:51, Claudio Freire wrote:

-

+   /* Search for the segment likely to contain the item pointer */
+   iseg = vac_itemptr_binsrch(
+   (void *) itemptr,
+   (void *)
&(vacrelstats->dead_tuples.dt_segments->last_dead_tuple),
+   vacrelstats->dead_tuples.last_seg + 1,
+   sizeof(DeadTuplesSegment));
+

I think that we can change the above to;

+   /* Search for the segment likely to contain the item pointer */
+   iseg = vac_itemptr_binsrch(
+   (void *) itemptr,
+   (void *) &(seg->last_dead_tuple),
+   vacrelstats->dead_tuples.last_seg + 1,
+   sizeof(DeadTuplesSegment));

We set "seg = vacrelstats->dead_tuples.dt_segments" just before this.

Right
In my mind, if you change vacrelstats->dead_tuples.last_seg + 1 with 
GetNumDeadTuplesSegments(vacrelstats), it would be more meaningful.
Besides, you can change the vac_itemptr_binsrch within the segment with 
stdlib bsearch, like:


res = (ItemPointer) bsearch((void *) itemptr,

(void *) seg->dt_tids,

seg->num_dead_tuples,

sizeof(ItemPointerData),

vac_cmp_itemptr);

return (res != NULL);


Those are before the changes in the review. While I don't expect any
change, I'll re-run some of them just in case, and try to investigate
the slowdown. But that will take forever. Each run takes about a week
on my test rig, and I don't have enough hardware to parallelize the
tests. I will run a test on a snapshot of a particularly troublesome
production database we have, that should be interesting.

Very interesting, waiting for the results.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL10 beta2 with ICU - initdb fails on MacOS

2017-07-12 Thread Tom Lane
Sandeep Thakkar  writes:
> On MacOS, I configured PG10 beta2 sources with '--with-icu" option along
> with ICU_LIBS and ICU_CFLAGS env variables to define the location of the
> ICU libs that I built. Once the staging is ready, I executed initdb but it
> fails with the following error:
> [53618] FATAL:  could not open collator for locale "und":
> U_FILE_ACCESS_ERROR*
> *2017-07-12 11:18:21.187 BST [53618] STATEMENT:  SELECT
> pg_import_system_collations('pg_catalog');*

Ugh.  Please provide details about the hand-built ICU --- what version,
what configure options did you use for it, etc.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] building libpq.a static library

2017-07-12 Thread Jan de Visser
On Wednesday, July 12, 2017 6:31:09 AM EDT Jeroen Ooms wrote:
> I maintain static libraries for libpq for the R programming language
> (we need static linking to ship with the binary packages).
> Unfortunately currently the standard postgres makefile only generates
> a shared library for libpq, not a static one.
> 
> In order to make a static library I always manually edit
> ./src/interfaces/libpq/Makefile and add a target:
> 
> libpq.a: $(OBJS)
>   ar rcs $@ $^
> 
> And then run 'make libpq.a' in the libpq directory.
> 
> This works but it's a bit of a pain to maintain. I was wondering if
> this hack could be merged so that the standard 'configure
> --enable-static' script would install a static library for libpq
> alongside the shared one.

I have no idea what the consensus about this is, but providing a patch file and 
registering it at https://commitfest.postgresql.org/ increases your chances.

I wish you good luck hacking autoconf hell (which is what you'll need to do to 
get the command line switch) though (*shudder*).



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Typo in backend/storage/ipc/standby.c

2017-07-12 Thread Heikki Linnakangas

On 07/11/2017 10:34 AM, Kyotaro HORIGUCHI wrote:

Hello.

I noticed that a comment above StandbyAcquireAccessExclusiveLock
in backend/storage/ipc/standby.c using wrong names of a variable
and a type.

The attached patch fixes it. The same mistake is found in older
versions back to 9.0.

fix_typo_of_standby_c_10_master.patch is for 10 and master and
fix_typo_of_standby_c_96_and_before.patch for 9.6 and before.


Applied, thanks!

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DSA on 32 bit systems

2017-07-12 Thread Robert Haas
On Wed, Jul 12, 2017 at 7:34 AM, Thomas Munro
 wrote:
> That raises the question of whether we should go further and use a 64
> bit dsa_pointer even on 32 bit systems.  In a 32 bit dsa_pointer
> build, a single dsa_area is limited to 32 segments of up to 128MB in
> size (actually the largest power of 2 size it can allocate at once is
> 64MB due to free page management overhead).  The 64 bit version can
> handle 1024 segments of up to 1TB, which ought to be enough for
> anybody.  Having just one build variant without any limits you're
> likely to hit is appealing, but using oversized pointers will
> presumably slow 32 bit systems down, perhaps especially for client
> code that uses dsa_pointer_atomic (though I haven't tested that).

Hmm.  What we do we think the largest single allocation that has a
chance of working on a 32-bit system is in general?  My guess is that
1GB is not likely to work unless things break very well for you, but
256MB and 512MB might, absent implementation limitations.

> If we keep 32 bit dsa_pointer, I now see that parallel hash should
> probably know about this limit so it can abandon its load factor goal
> rather than making a doomed allocation request.

My overall feeling here is that this *probably* doesn't matter much
one way or the other, and we don't really have enough information
right now to know whether wider dsa_pointers are a bigger problem than
not being able to make large allocations, or vice versa.  But count
me as +0 for widening them to 64-bits.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] idea: custom log_line_prefix components besides application_name

2017-07-12 Thread Robert Haas
On Tue, May 9, 2017 at 9:43 PM, Chapman Flack  wrote:
> That's where the appident.cookie() function comes in. You just
> query it once at session establishment and remember the cookie.
> That allows your code to say:
>
> SET SESSION ON BEHALF OF 'joe user' BECAUSE I HAVE :cookie AND I SAY SO;
>
> and Mallory can't inject that because he doesn't have :cookie and the
> appident.cookie() function only succeeds the first time.

I have for a long time been interested in having a protocol-level
method for setting the session identity.  That is, instead of sending
"SET SESSION AUTHORIZATION 'rhaas'" wrapped in a Query message, you'd
send some new protocol message that we invent that nails down the
session authorization and doesn't allow it to be changed except by
another protocol message.  I feel like the usefulness of this for
connection pooling software is pretty obvious: it's a lot easier for
the pooler to disallow a certain protocol message than a certain SQL
command.

But maybe we could generalize it a bit, so that it can work for any
GUC.  For example, let the client send some new SetVariableViaProtocol
message with two arguments, a GUC name and a value.  The server treats
this just like a SET command, except that once you've done this, the
variable can't later be changed from SQL.  So now you don't even need
the cookie, and the client can't try to guess the cookie.  You just
tell the server - via this protocol message - to nail down
session_authorization or application_name or appuser.thunk to some
value of your choice, and it's invulnerable to SQL injection
thereafter.

Whaddaya think?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] DSA on 32 bit systems

2017-07-12 Thread Thomas Munro
Hi hackers,

Commit e8fdbd58fe564a29977f4331cd26f9697d76fc40 introduced fallback
atomic u64 support.  PG_HAVE_ATOMIC_U64_SUPPORT now always finishes up
defined so it is pointless to test for it in dsa.h.  Here's a patch to
remove the obsolete test and comment.

That raises the question of whether we should go further and use a 64
bit dsa_pointer even on 32 bit systems.  In a 32 bit dsa_pointer
build, a single dsa_area is limited to 32 segments of up to 128MB in
size (actually the largest power of 2 size it can allocate at once is
64MB due to free page management overhead).  The 64 bit version can
handle 1024 segments of up to 1TB, which ought to be enough for
anybody.  Having just one build variant without any limits you're
likely to hit is appealing, but using oversized pointers will
presumably slow 32 bit systems down, perhaps especially for client
code that uses dsa_pointer_atomic (though I haven't tested that).

Thought experiment:  To allow allocations of 256MB without reducing
the total size available with a pattern of small allocations (~1GB),
we'd have to start larger (4MB) and grow faster (double every time)
due to lack of bits to represent the segment number.  That doesn't
seem very friendly to small systems, so maybe it's not a great idea to
change it.  Thoughts?

If we keep 32 bit dsa_pointer, I now see that parallel hash should
probably know about this limit so it can abandon its load factor goal
rather than making a doomed allocation request.

-- 
Thomas Munro
http://www.enterprisedb.com


tell-dsa-that-atomic-u64-is-always-available.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-07-12 Thread Mark Rofail
On Wed, Jul 12, 2017 at 12:53 AM, Alvaro Herrera 
wrote:
>
> We have one opclass for each type combination -- int4 to int2, int4 to
> int4, int4 to int8, etc.  You just need to add the new strategy to all
> the opclasses.
>

Can you clarify this solution? I think another solution would be external
casting.

BTW now that we've gone through this a little further, it's starting to
> look like a mistake to me to use the same @> operator for (anyarray,
> anyelement) than we use for (anyarray, anyarray).


I agree. Changed to @>>

Best Regards,
Mark Rofail


[HACKERS] pl/perl extension fails on Windows

2017-07-12 Thread Sandeep Thakkar
Hi,

I compiled PG 10 beta1/beta2 with "--with-perl" option on Windows and the
extension crashes the database.
--
postgres=# create extension plperl;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
postgres=#

It doesn't produce crashdump (in $DATA/crashdumps) but the log contains the
following error:

*src/pl/plperl/Util.c: loadable library and perl binaries are mismatched
(got handshake key 0A900080, needed 0AC80080)*
--

This is seen with Perl 5.24 but not with 5.20, 5.16. What I found is that
the handshake function is added in Perl 5.21.x and probably that is why we
don't see this issue in earlier versions.

The Perl that is used during compilation and on the target machine is the same.
So probably plperl is not able to load the perl library. It works fine on
Linux and MacOS.

-- 
Sandeep Thakkar
EDB


Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2017-07-12 Thread Fabien COELHO


LastBeginState -> RetryState? I'm not sure why this state is a pointer in 
CState. Putting the struct would avoid malloc/free cycles. Index "-1" may be 
used to tell it is not set if necessary.


Another detail I forgot about this point: there may be a memory leak on 
variables copies, ISTM that the "variables" array is never freed.


I was not convinced by the overall memory management around variables to 
begin with, and it is even less so with their new copy management. Maybe 
having a clean "Variables" data structure could help improve the 
situation.
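
As a rough illustration (hypothetical names, not existing pgbench code), such
a structure could simply own the array, its counts and its teardown, so copies
made for retries have an obvious owner:

#include <stdlib.h>
#include <string.h>

typedef struct Variable
{
    char       *name;           /* malloc'd */
    char       *svalue;         /* malloc'd, may be NULL */
} Variable;

typedef struct Variables
{
    Variable   *vars;           /* array of nvars elements, sorted by name */
    int         nvars;          /* number of variables in use */
    int         max_vars;       /* allocated size of the array */
} Variables;

/* Free everything owned by a Variables container. */
static void
freeVariables(Variables *vs)
{
    int         i;

    for (i = 0; i < vs->nvars; i++)
    {
        free(vs->vars[i].name);
        free(vs->vars[i].svalue);
    }
    free(vs->vars);
    vs->vars = NULL;
    vs->nvars = vs->max_vars = 0;
}

int
main(void)
{
    Variables   vs = {NULL, 0, 0};

    vs.vars = malloc(4 * sizeof(Variable));
    vs.max_vars = 4;
    vs.vars[0].name = strdup("scale");
    vs.vars[0].svalue = strdup("10");
    vs.nvars = 1;

    freeVariables(&vs);
    return 0;
}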


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-12 Thread Alik Khilazhev
Hello!

I want to say that our company is already investigating the causes of the
problem and their solution. We also have a few experimental patches that
increase performance for 1000 clients by several times.

In addition, I have fixed thread-safety issues and implemented a per-thread
cache for zeta values. See the attached patch.
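
For readers who have not opened the patch, the general idea of a per-thread
zeta cache can be sketched as follows (illustrative only, with made-up names
and a simplified cache; the actual patch may differ). The generalized harmonic
number zeta(n, s) that Zipfian sampling needs is expensive to compute, so each
thread keeps a small cache keyed by (n, s); because it is thread-local, no
locking is required:

#include <math.h>
#include <stdio.h>

#define ZETA_CACHE_SIZE 8

typedef struct ZetaCacheEntry
{
    long        n;
    double      s;
    double      zeta;
} ZetaCacheEntry;

typedef struct ZetaCache        /* one of these per pgbench thread */
{
    int         nentries;
    ZetaCacheEntry entries[ZETA_CACHE_SIZE];
} ZetaCache;

/* zeta(n, s) = sum over k = 1..n of k^(-s) */
static double
compute_zeta(long n, double s)
{
    double      sum = 0.0;

    for (long k = 1; k <= n; k++)
        sum += pow(k, -s);
    return sum;
}

static double
cached_zeta(ZetaCache *cache, long n, double s)
{
    for (int i = 0; i < cache->nentries; i++)
        if (cache->entries[i].n == n && cache->entries[i].s == s)
            return cache->entries[i].zeta;

    double      z = compute_zeta(n, s);

    if (cache->nentries < ZETA_CACHE_SIZE)
    {
        cache->entries[cache->nentries].n = n;
        cache->entries[cache->nentries].s = s;
        cache->entries[cache->nentries].zeta = z;
        cache->nentries++;
    }
    return z;
}

int
main(void)
{
    ZetaCache   cache = {0};
    double      z1 = cached_zeta(&cache, 1000, 1.07);   /* computed */
    double      z2 = cached_zeta(&cache, 1000, 1.07);   /* served from the cache */

    printf("zeta(1000, 1.07) = %f = %f\n", z1, z2);
    return 0;
}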


pgbench-zipf-02v.patch
Description: Binary data
—
Thanks and Regards,
Alik Khilazhev
Postgres Professional:
http://www.postgrespro.com
The Russian Postgres Company
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-07-12 Thread Alik Khilazhev
On 7 Jul 2017, at 21:53, Peter Geoghegan  wrote:
> Is it possible for you to instrument the number of B-Tree page
> accesses using custom instrumentation for pgbench_accounts_pkey?
> If that seems like too much work, then it would still be interesting
> to see what the B-Tree keyspace looks like before and after varying
> the "nclient" count from, say, 32 to 128. Maybe there is a significant
> difference in how balanced or skewed it is in each case. Or, the index
> could simply be more bloated.
>
> There is a query that I sometimes use, that itself uses pageinspect,
> to summarize the keyspace quickly. It shows you the highkey for every
> internal page, starting from the root and working down to the lowest
> internal page level (the one just before the leaf level -- level 1),
> in logical/keyspace order. You can use it to visualize the
> distribution of values. It could easily include the leaf level, too,
> but that's less interesting and tends to make the query take ages. I
> wonder what the query will show here.
>
> Here is the query:
> …

I am attaching the results of the query that you sent. It shows that nothing
has changed after executing the tests.

before 128

          idx          | level | l_item | blkno | btpo_prev | btpo_next | btpo_flags | type | live_items | dead_items | avg_item_size | page_size | free_size |         highkey
-----------------------+-------+--------+-------+-----------+-----------+------------+------+------------+------------+---------------+-----------+-----------+-------------------------
 pgbench_accounts_pkey |     2 |      1 |   290 |         0 |         0 |          2 | r    |         10 |          0 |            15 |      8192 |      7956 |
 pgbench_accounts_pkey |     1 |      1 |     3 |         0 |       289 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 | 09 96 01 00 00 00 00 00
 pgbench_accounts_pkey |     1 |      2 |   289 |         3 |       575 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 | 11 2c 03 00 00 00 00 00
 pgbench_accounts_pkey |     1 |      3 |   575 |       289 |       860 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 | 19 c2 04 00 00 00 00 00
 pgbench_accounts_pkey |     1 |      4 |   860 |       575 |      1145 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 | 21 58 06 00 00 00 00 00
 pgbench_accounts_pkey |     1 |      5 |  1145 |       860 |      1430 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 | 29 ee 07 00 00 00 00 00
 pgbench_accounts_pkey |     1 |      6 |  1430 |      1145 |      1715 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 | 31 84 09 00 00 00 00 00
 pgbench_accounts_pkey |     1 |      7 |  1715 |      1430 |      2000 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 | 39 1a 0b 00 00 00 00 00
 pgbench_accounts_pkey |     1 |      8 |  2000 |      1715 |      2285 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 | 41 b0 0c 00 00 00 00 00
 pgbench_accounts_pkey |     1 |      9 |  2285 |      2000 |      2570 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 | 49 46 0e 00 00 00 00 00
 pgbench_accounts_pkey |     1 |     10 |  2570 |      2285 |         0 |          0 | i    |        177 |          0 |            15 |      8192 |      4616 |
(11 rows)

latency average = 1.375 ms
tps = 93085.250384 (including connections establishing)
tps = 93125.724773 (excluding connections establishing)
SQL script 1: /home/nglukhov/ycsb_read_zipf.sql
 - weight: 1 (targets 50.0% of total)
 - 2782999 transactions (49.8% of total, tps = 46364.447705)
 - latency average = 0.131 ms
 - latency stddev = 0.087 ms
SQL script 2: /home/nglukhov/ycsb_update_zipf.sql
 - weight: 1 (targets 50.0% of total)
 - 2780197 transactions (49.8% of total, tps = 46317.766703)
 - latency average = 2.630 ms
 - latency stddev = 14.092 ms

after 128

          idx           | level | l_item | blkno | btpo_prev | btpo_next | btpo_flags | type | live_items | dead_items | avg_item_size | page_size | free_size |         highkey
------------------------+-------+--------+-------+-----------+-----------+------------+------+------------+------------+---------------+-----------+-----------+-------------------------
 pgbench_accounts_pkey  |     2 |      1 |   290 |         0 |         0 |          2 | r    |         10 |          0 |            15 |      8192 |      7956 |
 pgbench_accounts_pkey  |     1 |      1 |     3 |         0 |       289 |          0 | i    |        353 |          0 |            15 |      8192 |      1096 | 09 96 01 00 00 00 00 00
 pgbench_accounts_pkey  |     1 |      2 |   289 |         3 |       575 |          0 | i    |        285 |          0 |            15 |      8192 |      2456 | 11 2c 03 00 00 00 00

Re: [HACKERS] Double shared memory allocation for SLRU LWLocks

2017-07-12 Thread Teodor Sigaev

It seems to me that we're allocating shared memory for SLRU lwlocks twice,
unless I'm missing something.

I think you are right.

Did you check previous versions? I.e., should it be applied to previous
branches? I suppose yes, to minimize code differences.


Also, I like the idea of adding Assert(offset <= SimpleLruShmemSize(nslots, nlsns))
at the end of SimpleLruInit().
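
For illustration, the fix could be shaped roughly like this -- carving the
lock array out of the single ShmemInitStruct() allocation instead of calling
ShmemAlloc() a second time, with the suggested Assert at the end. This is
only a sketch; the surrounding offset bookkeeping is an assumption, not the
attached patch:

	char	   *ptr = (char *) shared;
	Size		offset = MAXALIGN(sizeof(SlruSharedData));

	/* ... page_buffer, page_status, etc. are carved out of ptr here ... */

	/* place the per-buffer LWLocks inside the same allocation */
	shared->buffer_locks = (LWLockPadded *) (ptr + offset);
	offset += MAXALIGN(nslots * sizeof(LWLockPadded));

	Assert(offset <= SimpleLruShmemSize(nslots, nlsns));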




SimpleLruShmemSize() calculates the total SLRU shared memory size, including
the size of the LWLocks.

SimpleLruInit() starts with the line

shared = (SlruShared) ShmemInitStruct(name,
                                      SimpleLruShmemSize(nslots, nlsns),
                                      &found);

which allocates SLRU shared memory (LWLocks size is included because
SimpleLruShmemSize() is used for size computation).

The following line then allocates shared memory for the LWLocks a second time:

shared->buffer_locks = (LWLockPadded *) ShmemAlloc(sizeof(LWLockPadded) * nslots);

The attached patch fixes that by removing the extra ShmemAlloc() call for SLRU.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




[HACKERS] PostgreSQL10 beta2 with ICU - initdb fails on MacOS

2017-07-12 Thread Sandeep Thakkar
Hi

On macOS, I configured the PG10 beta2 sources with the "--with-icu" option,
along with the ICU_LIBS and ICU_CFLAGS environment variables to point at the
ICU libraries that I built. Once the staging was ready, I executed initdb,
but it fails with the following error:
--
$./initdb -D /tmp/data
The files belonging to this database system will be owned by user
"buildfarm".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

creating directory /tmp/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... 2017-07-12 11:18:21.187 BST
[53618] FATAL:  could not open collator for locale "und": U_FILE_ACCESS_ERROR
2017-07-12 11:18:21.187 BST [53618] STATEMENT:  SELECT
pg_import_system_collations('pg_catalog');
child process exited with exit code 1
initdb: removing data directory "/tmp/data"
--

Has anyone come across this? This error is not seen on Linux or Windows,
though.
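
For reference, the build was configured roughly like this; the install
prefixes below are placeholders, not the actual paths used:

    # placeholder prefix for the hand-built ICU
    ICU_PREFIX=/opt/icu

    export ICU_CFLAGS="-I$ICU_PREFIX/include"
    export ICU_LIBS="-L$ICU_PREFIX/lib -licui18n -licuuc -licudata"

    ./configure --with-icu --prefix=/opt/pg10beta2
    make && make install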

-- 
Sandeep Thakkar
EDB


[HACKERS] building libpq.a static library

2017-07-12 Thread Jeroen Ooms
I maintain static libraries for libpq for the R programming language
(we need static linking to ship with the binary packages).
Unfortunately, the standard Postgres makefile currently only generates
a shared library for libpq, not a static one.

In order to make a static library I always manually edit
./src/interfaces/libpq/Makefile and add a target:

libpq.a: $(OBJS)
	ar rcs $@ $^

And then run 'make libpq.a' in the libpq directory.
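
For what it's worth, the full manual workflow then looks roughly like this
(the destination directory is only an example):

    # after adding the libpq.a target to src/interfaces/libpq/Makefile
    ./configure
    make -C src/interfaces/libpq              # objects + shared library
    make -C src/interfaces/libpq libpq.a      # the hand-added static target
    cp src/interfaces/libpq/libpq.a /usr/local/lib/   # example destination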

This works but it's a bit of a pain to maintain. I was wondering if
this hack could be merged so that the standard 'configure
--enable-static' script would install a static library for libpq
alongside the shared one.




Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2017-07-12 Thread Fabien COELHO


Hello Marina,

Here is the second version of my patch for pgbench. Transactions with
serialization and deadlock failures are now rolled back and retried until
they end successfully or their number of attempts reaches the maximum.



In detail:
 - You can set the maximum number of attempts with the appropriate
benchmarking option (--max-attempts-number). Its default value is 1, partly
because retrying shell commands can produce new errors.


 - Statistics of attempts and failures are printed in the progress output,
in the transaction/aggregation logs, and at the end with the other results
(overall and for each script). A transaction failure is reported here only
if the last retry of that transaction fails.


- Failures and the average number of transaction attempts are also printed
per command, together with average latencies, if you use the appropriate
benchmarking option (--report-per-command, -r); it replaces the option
--report-latencies, as I was advised here [1]. The average number of
transaction attempts is printed only for commands which start transactions.


As usual: TAP tests for new functionality and changed documentation with 
new examples.


Here is a round of comments on the current version of the patch:

* About the feature

There is a latent issue about what a transaction is. For pgbench, a
transaction is a full script execution. For PostgreSQL, it is a statement or
a BEGIN/END block, several of which may appear in a script. From a retry
perspective, you may retry from a SAVEPOINT within a BEGIN/END block... I'm
not sure how to make general sense of all this, so this is just a comment
without an attached action for now.

As the default is not to retry, which is the upward-compatible behavior, I
think that the changes should not alter the current output much, apart from
counting the number of failures.

I would consider using "try/tries" instead of "attempt/attempts" as it is 
shorter. An English native speaker
opinion would be welcome on that point.

* About the code

ISTM that the code interacts significantly with various patches under review
or ready for committers. I'm not sure how to deal with that; there will be
some rebasing work...

I'm fine with renaming "is_latencies" to "report_per_command", which is more 
logical & generic.

"max_attempt_number": I'm against typing fields again in their name, aka "hungarian 
naming". I'd suggest
"max_tries" or "max_attempts".

"SimpleStats attempts": I disagree with using this floating poiunt oriented 
structures to count integers.
I would suggest "int64 tries" instead, which should be enough for the 
purpose.


LastBeginState -> RetryState? I'm not sure why this state is a pointer in
CState. Embedding the struct directly would avoid malloc/free cycles. An
index of "-1" may be used to tell that it is not set, if necessary.


"CSTATE_RETRY_FAILED_TRANSACTION" -> "CSTATE_RETRY" is simpler and clear enough.

In CState and some of the code, a failure is a failure; maybe one boolean
would be enough. It only needs to be differentiated when counting, and you
currently have (deadlock_failure || serialization_failure) everywhere.


Some variables, such as "int attempt_number", should be in the client 
structure, not in the client? Generally, try to use block variables if 
possible to keep the state clearly disjoints. If there could be NO new 
variable at the doCustom level that would be great, because that would 
ensure that there is no machine state mixup hidden in these variables.


I am wondering whether the RETRY & FAILURE states could/should be merged:

  on RETRY:
-> count retry
-> actually retry if < max_tries (reset client state, jump to command)
-> else count failure and skip to end of script
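
In code, a merged state could look roughly like this. This is only a sketch:
the field names (st->retries, st->failures, st->retry_state) and the exact
target states are assumptions, not the patch's actual variables:

	case CSTATE_RETRY:
		if (++st->retries < max_tries)
		{
			/* reset the client to where the failed transaction started */
			st->random_state = st->retry_state.random_state;
			st->command = st->retry_state.command;
			st->state = CSTATE_START_COMMAND;
		}
		else
		{
			/* give up: count a failure and skip to the end of the script */
			st->failures++;
			st->state = CSTATE_END_TX;
		}
		break;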

The start- and end-of-transaction detection seems expensive (malloc, ...)
and assumes one statement per command, which is not necessarily the case
(what about "BEGIN \; ... \; COMMIT;"?); this limitation should be
documented. ISTM that the space normalization should be avoided, and
something simpler/lighter should be devised. Possibly it should also
consider handling SAVEPOINT.


I disagree with exiting in ParseScript if the transaction block is not
completed, especially as it misses combined statements/queries
("BEGIN \; stuff... \; COMMIT") and would break an existing feature.


There are strange characters in some comments, e.g. "??ontinuous".

Option "max-attempt-number" -> "max-tries"

I would put the client random state initialization with the state
initialization, not with the connection.


* About tracing

The progress output is expected to be short, not detailed. Only add the
number of failures and retries if the maximum number of tries is not 1.


* About reporting

I think that too much is reported. I advised doing that, but nevertheless it
is a little bit steep.


At least, it should not report the number of tries/attempts when the maximum
number is one. Simple counts should be reported for failures, not floats...


I would suggest a more compact one-line report about failures:

  "number of failures: 12 (0.001%, deadlock: 7, 

Re: [HACKERS] Multi column range partition table

2017-07-12 Thread Ashutosh Bapat
On Wed, Jul 12, 2017 at 12:54 AM, Dean Rasheed  wrote:
> On 11 July 2017 at 13:29, Ashutosh Bapat
>  wrote:
>> + 
>> +  Also note that some element types, such as timestamp,
>> +  have a notion of "infinity", which is just another value that can
>> +  be stored. This is different from MINVALUE and
>> +  MAXVALUE, which are not real values that can be stored,
>> +  but rather they are ways of saying the value is unbounded.
>> +  MAXVALUE can be thought of as being greater than any
>> +  other value, including "infinity" and MINVALUE as being
>> +  less than any other value, including "minus infinity". Thus the range
>> +  FROM ('infinity') TO (MAXVALUE) is not an empty range; it
>> +  allows precisely one value to be stored -- the timestamp
>> +  "infinity".
>>   
>>
>> The description in this paragraph seems to be attaching the intuitive
>> meaning of the word "unbounded" to MAXVALUE and MINVALUE, which have
>> different intuitive meanings of their own. I'm not sure that's how we
>> should describe MAXVALUE/MINVALUE.
>>
>
> I'm not sure I understand your point. MINVALUE and MAXVALUE do mean
> unbounded below and above respectively. This paragraph is just making
> the point that that isn't the same as -/+ infinity.
>

What confuses me, and probably users, is that something named min/max
"value" is not a value, but something less than or greater than any other
value. The paragraph above explains that FROM ('infinity') TO
(MAXVALUE) implies a partition holding only the infinity value. What would
be the meaning of FROM (MINVALUE) TO ('minus infinity')? Would that be
allowed? What would it contain, especially when the upper bounds are always
exclusive?
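
For concreteness, the case under discussion would be declared like this,
assuming the MINVALUE/MAXVALUE syntax proposed in this thread (the table and
partition names are made up for illustration):

    CREATE TABLE measurement (logdate timestamp) PARTITION BY RANGE (logdate);

    -- Not an empty range: it admits exactly one storable value, the special
    -- timestamp 'infinity' (MAXVALUE itself is not a storable value).
    CREATE TABLE measurement_inf PARTITION OF measurement
        FOR VALUES FROM ('infinity') TO (MAXVALUE);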

>
>> Most of the patch seems to be replacing "content" with "kind",
>> RangeDatumContent with PartitionRangeDatumKind  and RANGE_DATUM_FINITE
>> with PARTITION_RANGE_DATUM_VALUE. But those changes in name don't seem
>> to be adding much value to the patch. Replacing RANGE_DATUM_NEG_INF
>> and RANGE_DATUM_POS_INF with PARTITION_RANGE_DATUM_MINVALUE and
>> PARTITION_RANGE_DATUM_MAXVALUE looks like a good change in line with
>> the MINVALUE/MAXVALUE change. Maybe we should reuse the previous
>> variables and enum type name, except for those two, so that the total
>> change introduced by the patch is minimal.
>>
>
> No, this isn't just renaming that other enum. It's about replacing the
> boolean "infinite" flag on PartitionRangeDatum with something that can
> properly enumerate the 3 kinds of PartitionRangeDatum that are allowed
> (and, as noted above, "finite"/"infinite" isn't the right terminology
> either).

Right. I think we need that change.

> Putting that new enum in parsenodes.h makes it globally
> available, wherever the PartitionRangeDatum structure is used. A
> side-effect of that change is that the old RangeDatumContent enum that
> was local to partition.c is no longer needed.

Hmm, I failed to notice the changes in the _out, _equal, and _read functions.
The downside is that the enum cannot be used for anything other than
partitioning, but I cannot imagine where else we would use it anyway.

>
> RangeDatumContent wouldn't be a good name for a globally visible enum
> of this kind because that name fails to link it to partitioning in any
> way, and could easily be confused as having something to do with RTEs
> or range types. Also, the term "content" is more traditionally used in
> the Postgres sources for a field *holding* content, rather than a
> field specifying the *kind* of content. On the other hand, you'll note
> that the term "kind" is by far the most commonly used term for naming
> this kind of enum, and any matching fields.

Ok.

>
> IMO, code consistency and readability takes precedence over keeping
> patch sizes down.

No doubt about that.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Minor style cleanup in tab-complete.c

2017-07-12 Thread Heikki Linnakangas

On 07/12/2017 07:39 AM, Michael Paquier wrote:

On Wed, Jul 12, 2017 at 11:19 AM, Thomas Munro
 wrote:

From the triviality department: I noticed some branches in
tab-complete.c's gargantuan if statement, mostly brand new, that break
from the established brace style.  Should we fix that like this?


For consistency's sake, I think we should.


Agreed. Committed, thanks.

- Heikki





Re: [HACKERS] New partitioning - some feedback

2017-07-12 Thread Ashutosh Bapat
On Wed, Jul 12, 2017 at 9:39 AM, Amit Langote
 wrote:
> On 2017/07/12 12:47, Ashutosh Bapat wrote:
>> On Wed, Jul 12, 2017 at 8:23 AM, Amit Langote
>>  wrote:
>>> On 2017/07/11 18:57, Ashutosh Bapat wrote:
 On Tue, Jul 11, 2017 at 4:16 AM, David Fetter  wrote:
> So whatever we land on needs to mention partition_of and
> has_partitions.  Is that latter just its immediate partitions?
> Recursion all the way down?  Somewhere in between?
>

 We have patches proposed to address some of those concerns at [1]

 [1] 
 https://www.postgresql.org/message-id/CAFjFpRcs5fOSfaAGAjT5C6=yvdd7mrx3knf_spb5dqzojgj...@mail.gmail.com
>>>
>>> ISTM, David is talking about the "list tables" (bare \d without any
>>> pattern) case.  That is, listing partitioned tables as of type
>>> "partitioned table" instead of "table" as we currently do.  The linked
>>> patch, OTOH, is for "describe table" (\d ) case.
>>
>> Right, the patches don't exactly do what David is suggesting, but I
>> believe they have code to annotate the tables with "has partitions" and
>> also the number of partitions (I guess). That thread died some time ago,
>> though, so my memory may be vague.
>>
>> Do you see that those patches can be used in current discussion in any way?
>
> It wouldn't really be a bad idea to put that patch here, because there's
> no special reason for it to be in the CF for PG 11, if we are talking here
> about changing \d command outputs anyway.

Thanks.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] why not parallel seq scan for slow functions

2017-07-12 Thread Dilip Kumar
On Wed, Jul 12, 2017 at 10:55 AM, Amit Kapila  wrote:
> On Wed, Jul 12, 2017 at 1:50 AM, Jeff Janes  wrote:
>> On Mon, Jul 10, 2017 at 9:51 PM, Dilip Kumar  wrote:
>>>
>>> So because of this high projection cost the seqpath and the parallel path
>>> both have fuzzily the same cost, but the seqpath is winning because it's
>>> parallel-safe.
>>
>>
>> I think you are correct.  However, unless parallel_tuple_cost is set very
>> low, apply_projection_to_path never gets called with the Gather path as an
>> argument.  It gets ruled out at some earlier stage, presumably because it
>> assumes the projection step cannot make it win if it is already behind by
>> enough.
>>
>
> I think that is genuine because the tuple communication cost is very high.
> If your table is reasonably large then you might want to try increasing the
> number of parallel workers (ALTER TABLE ... SET (parallel_workers = ...)).
>
>> So the attached patch improves things, but doesn't go far enough.
>>
>
> It seems to me that we need to adjust the cost based on whether the node
> below is projection-capable.  See attached.

Patch looks good to me.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] New partitioning - some feedback

2017-07-12 Thread Amit Langote
On 2017/07/12 13:09, Amit Langote wrote:
> On 2017/07/12 12:47, Ashutosh Bapat wrote:
>> Do you see that those patches can be used in current discussion in any way?
> 
> It wouldn't really be a bad idea to put that patch here, because there's
> no special reason for it to be in the CF for PG 11, if we are talking here
> about changing \d command outputs anyway.

So, here are 4 patches (including the 2 patches that Ashutosh linked to
upthread):

0001: Show relispartition=true relations as "(foreign) partition" and
  RELKIND_PARTITIONED_TABLE relations that are not themselves
  partitions as "partitioned table"

0002: Hide relispartition=true relations (partitions) by default in the
  \d listing (that is, \d without a name pattern); to enable
  displaying partitions, add a modifier '++'

0003: In \d+ partitioned_table output (describe partitioned table showing
      individual partitions), show whether each individual partition is
      itself partitioned, if it currently has partitions of its own

0004: In \d+ partitioned_table output, do not skip the portion of the
  output showing information about partitions if there are currently
  no partitions defined; instead show "Number of partitions: 0"


Regarding 0001, while it shows "partition" and "partitioned table" in the
Type column of \d listing, \d name_pattern will still show Table
"schemaname.tablename".  For example:

\d
 List of relations
 Schema | Name  |       Type        | Owner
--------+-------+-------------------+-------
 public | xyz   | partitioned table | amit
 public | xyz1  | partition | amit
 public | xyz2  | partition | amit
 public | xyz3  | partition | amit
 public | xyz31 | partition | amit
(5 rows)

\d xyz*
Table "public.xyz"
 Column |  Type   | Collation | Nullable | Default
+-+---+--+-
 a  | integer |   |  |
Partition key: LIST (a)
Number of partitions: 3 (Use \d+ to list them.)

Table "public.xyz1"

Table "public.xyz2"

Table "public.xyz3"

Table "public.xyz31"


...which might seem kind of odd.  Do we want to show xyz1 as "Partition
public.xyz1", for example?

Thanks,
Amit
From ceacc566ab7ac2ffe56a47435a53a12ebafdffe5 Mon Sep 17 00:00:00 2001
From: amit 
Date: Mon, 10 Jul 2017 13:25:20 +0900
Subject: [PATCH 1/4] Show partitions and partitioned tables as such in \d
 listing

---
 src/bin/psql/describe.c | 51 -
 1 file changed, 42 insertions(+), 9 deletions(-)

diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index e6833eced5..4613490f56 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -3321,27 +3321,60 @@ listTables(const char *tabtypes, const char *pattern, bool verbose, bool showSys
 	printfPQExpBuffer(&buf,
 					  "SELECT n.nspname as \"%s\",\n"
 					  "  c.relname as \"%s\",\n"
-					  "  CASE c.relkind"
-					  " WHEN " CppAsString2(RELKIND_RELATION) " THEN '%s'"
+					  "  CASE c.relkind",
+					  gettext_noop("Schema"),
+					  gettext_noop("Name"));
+
+	/*
+	 * Starting in PG 10, certain kinds of relations could be partitions, which
+	 * if so, we show Type accordingly.
+	 */
+	if (pset.sversion >= 10)
+		appendPQExpBuffer(&buf,
+						  " WHEN " CppAsString2(RELKIND_RELATION) " THEN"
+						  "   CASE c.relispartition"
+						  " WHEN 'true' THEN '%s' ELSE '%s'"
+						  "   END"
+
+						  " WHEN " CppAsString2(RELKIND_PARTITIONED_TABLE) " THEN"
+						  "   CASE c.relispartition"
+						  " WHEN 'true' THEN '%s' ELSE '%s'"
+						  "   END"
+
+						  " WHEN " CppAsString2(RELKIND_FOREIGN_TABLE) " THEN"
+						  "   CASE c.relispartition"
+						  " WHEN 'true' THEN '%s' ELSE '%s'"
+						  "   END",
+						  gettext_noop("partition"),
+						  gettext_noop("table"),
+
+						  gettext_noop("partition"),
+						  gettext_noop("partitioned table"),
+
+

Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server

2017-07-12 Thread Michael Paquier
On Tue, Jul 11, 2017 at 9:28 AM, Masahiko Sawada  wrote:
> Attached updated version patch. Please review it.

Cool, thanks.

+useless. If the second parameter wait_for_archive is true and
+the backup is taken on a standby, pg_stop_backup waits for WAL
+to archived when archive_mode is always.  Enforcing
+manually a WAL segment swtich to happen with for example
1) "waits for WAL to BE archived".
2) s/swtich/switch

+to false will control the wait time, thought all the WAL segments
s/thought/though/

        /*
         * During recovery, since we don't use the end-of-backup WAL
         * record and don't write the backup history file, the
         * starting WAL location doesn't need to be unique. This means
         * that two base backups started at the same time might use
         * the same checkpoint as starting locations.
         */
This comment in xlog.c needs an update. Two base backups started at the same
time can generate a backup history file with the same offset, aka the same
file name. I don't think that it matters much for this file, honestly. I
think that this would become meaningful once such files play a more important
role, in which case having a unique identifier would be way more interesting,
but that day has IMO not come yet. Other thoughts are welcome.

if (waitforarchive && XLogArchivingActive())
{
+   /* Don't wait if WAL archiving not enabled always in recovery */
+   if (backup_started_in_recovery && !XLogArchivingAlways())
+   return stoppoint;
+
This has the smell of breakage waiting to happen. I think that we should
have just one single return point, which also updates the stop TLI at the
end of the routine. This can just be a single condition.
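
Something along these lines, for instance -- just a sketch of the shape being
suggested, not the final code:

	/*
	 * One condition decides whether to wait for archiving, and the function
	 * has a single exit point that also fills in the stop TLI.
	 */
	if (waitforarchive &&
		((!backup_started_in_recovery && XLogArchivingActive()) ||
		 (backup_started_in_recovery && XLogArchivingAlways())))
	{
		/* ... wait for the needed WAL segments to be archived ... */
	}

	if (stoptli_p)
		*stoptli_p = stoptli;

	return stoppoint;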

+   if (stoptli_p)
+   *stoptli_p = stoptli;
+
Not sure there is any point in moving that around, on top of the previous comment.

+errhint("Backup has been taken from a
standby, check if the WAL segment "
+"needed for this backup have been
completed, in which case a "
+"foreced segment switch may can
be needed on the primary. "
+"If a segment swtich has already
happend check that your "
+"archive_command is executing properly."
+"pg_stop_backup can be canceled
safely, but the database backup "
+"will not be usable without all
the WAL segments.")));
Some comments here:
1) s/foreced/forced/
2) s/may can/may/
3) s/swtich/switch/
4) s/happend/happened/
5) "Backup has been taken from a standby" should be a single sentence.

This is beginning to take shape.

Thanks,
-- 
Michael

