Re: BUG #16079: Question Regarding the BUG #16064
On Sun, Dec 20, 2020 at 7:58 PM Stephen Frost wrote:
> * Magnus Hagander (mag...@hagander.net) wrote:
> > Changed from bugs to hackers.
> > For the old plaintext "password" method, we log a warning when we parse the
> > configuration file.

Like Stephen, I don't see such a warning getting logged.

> > Maybe we should do the same for LDAP (and RADIUS)? This seems like a better
> > place to put it than to log it at every time it's received?
>
> A dollar short and a year late, but ... +1.

I would suggest going further. I would make the change on the client side, and have libpq refuse to send unhashed passwords without having an environment variable set which allows it. (Also, change the client behavior so it defaults to verify-full when PGSSLMODE is not set.)

What is the value of logging on the server side? I can change the setting from password to md5 or from ldap to gss when I notice the log message. But once compromised, or during a MITM attack, the bad guy will just set it back to the unsafe form and the client will silently go along with it.

Cheers,

Jeff
DETAIL for wrong scram password
When md5 password authentication fails, the server log file has a helpful DETAIL line to say why, usually one of:

DETAIL:  Role "none" does not exist.
DETAIL:  User "jjanes" has no password assigned.
DETAIL:  User "jjanes" has an expired password.
DETAIL:  Password does not match for user "jjanes".

But for scram authentication, only the first three of these will be reported when applicable. If the password is simply incorrect, you do get a DETAIL line reporting which line of pg_hba was used, but you don't get a DETAIL line reporting the reason for the failure. It is pretty unfriendly to make the admin guess what the absence of a DETAIL is supposed to mean, and as far as I can tell, this is not intentional.

Note that in one case you do get the "does not match" line: if the user has a scram password assigned and the hba line specifies plain-text 'password' as the method. So if the absence of the DETAIL is intentional, it is not internally consistent.

The attached patch fixes the issue. I don't know if this is the correct location to be installing the message; maybe verify_client_proof should be doing it instead. I am also not happy to be testing 'doomed' here, but it was already checked a few lines up, so I didn't want to go to lengths to avoid doing it here too.

Cheers,

Jeff

scram_password_mismatch.patch
Description: Binary data
Re: Supporting = operator in gin/gist_trgm_ops
On Sat, Nov 14, 2020 at 12:31 AM Alexander Korotkov wrote:
>
> I went through and revised this patch. I made the documentation
> statement less categorical. pg_trgm gist/gin indexes might have lower
> performance of equality operator search than B-tree. So, we can't
> claim the B-tree index is always not needed. Also, simple comparison
> operators are <, <=, >, >=, and they are not supported.
>

Is "simple comparison" here a well-known term of art? If I read the doc as committed (which doesn't include the sentence above), and if I didn't already know what it was saying, I would be left wondering which comparisons those are. Could we just say "inequality operators"?

Cheers,

Jeff
memory leak in auto_explain
I accidentally tried to populate a test case while auto_explain.log_min_duration was set to zero. auto_explain.log_nested_statements was also on.

create or replace function gibberish(int) returns text language SQL as $_$
  select left(string_agg(md5(random()::text),''),$1) from generate_series(0,$1/32)
$_$;

create table j1 as
  select x, md5(random()::text) as t11, gibberish(1500) as t12
  from generate_series(1,20e6) f(x);

I got logorrhea of course, but I also got a memory leak into the SQL function context:

TopPortalContext: 8192 total in 1 blocks; 7656 free (0 chunks); 536 used
  PortalContext: 16384 total in 5 blocks; 5328 free (1 chunks); 11056 used:
    ExecutorState: 4810120 total in 13 blocks; 4167160 free (74922 chunks); 642960 used
      SQL function: 411058232 total in 60 blocks; 4916568 free (4 chunks); 406141664 used: gibberish

The memory usage grew until the OOM killer stepped in.

Cheers,

Jeff
Re: memory leak in auto_explain
On Mon, Feb 1, 2021 at 6:09 PM Jeff Janes wrote:
>
> create or replace function gibberish(int) returns text language SQL as $_$
>   select left(string_agg(md5(random()::text),''),$1) from generate_series(0,$1/32)
> $_$;
>
> create table j1 as select x, md5(random()::text) as t11, gibberish(1500)
> as t12 from generate_series(1,20e6) f(x);
>

I should have added, I found it on HEAD and verified it also in 12.5.

Cheers,

Jeff
track_io_timing default setting
Can we change the default setting of track_io_timing to on? I see a lot of questions, such as over at stackoverflow or dba.stackexchange.com, where people ask for help with plans that would be much more useful were this on. Maybe they just don't know better, maybe they can't turn it on because they are not a superuser. I can't imagine a lot of people who care much about its performance impact will be running the latest version of PostgreSQL on ancient/weird systems that have slow clock access. (And the few who do can just turn it off for their system). For systems with fast user-space clock access, I've never seen this setting being turned on make a noticeable dent in performance. Maybe I just never tested enough in the most adverse scenario (which I guess would be a huge FS cache, a small shared buffers, and a high CPU count with constant churning of pages that hit the FS cache but miss shared buffers--not a system I have handy to do a lot of tests with.) Cheers, Jeff
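For anyone wanting to see the effect being discussed, a minimal sketch (the table name and the numbers are just placeholders; the "I/O Timings" line only appears when the setting is on and the node actually did reads):

-- requires superuser, which is exactly the problem for the askers mentioned above
SET track_io_timing = on;
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM pgbench_accounts;
-- with the setting on, each node's output gains lines like:
--   Buffers: shared hit=2176 read=162724
--   I/O Timings: read=1234.567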
Re: Loaded footgun open_datasync on Windows
On Fri, Sep 14, 2018 at 3:32 AM Michael Paquier wrote: > On Fri, Sep 14, 2018 at 08:43:18AM +0200, Laurenz Albe wrote: > > > If it turns out not to break anything, would you consider backpatching? > > On the one hand it fixes a bug, on the other hand it affects all > > frontend executables... > > Yeah, for this reason I would not do a backpatch. I have a very hard > time to believe that any frontend tools on Windows developed by anybody > rely on files to be opened only by a single process, still if they do > they would be surprised to see a change of behavior after a minor > update in case they rely on the concurrency limitations. > Reviving an old thread here. Could it be back-patched in some pg_test_fsync specific variant? I don't think we should just ignore the fact that pg_test_fsync on Windows is unfit for its intended purpose on 4 still-supported versions. > > I wonder why nobody noticed the problem in pg_test_fsync earlier. > > Is it that people running Windows care less if their storage is > > reliable? > > likely so. > I have noticed this before, but since it wasn't a production machine I just shrugged it off as being a hazard of using consumer-grade stuff; it didn't seem to be worth investigating further. Cheers, Jeff
Re: estimation problems for DISTINCT ON with FDW
On Fri, Jul 3, 2020 at 5:50 PM Tom Lane wrote:
>
> OK, I'll go ahead and push the patch I proposed yesterday.
>

Thank you. I tested 12_STABLE with my real queries on the real data set, and the "hard coded" estimate of 200 distinct rows (when use_remote_estimate is turned back on) is enough to get rid of the worst plans I was seeing in 12.3.

Cheers,

Jeff
tab completion of IMPORT FOREIGN SCHEMA
I use IMPORT FOREIGN SCHEMA a bit to set up systems for testing. But not enough that I can ever remember whether INTO or FROM SERVER comes first in the syntax. Here is an improvement to the tab completion, so I don't have to keep looking it up in the docs. It accidentally (even before this patch) completes "IMPORT FOREIGN SCHEMA" with a list of local schemas. This is probably wrong, but I find this convenient as I often do this in a loop-back setup where the list of foreign schema would be the same as the local ones. So I don't countermand that behavior here. Cheers, Jeff foreign_schema_tab_complete.patch Description: Binary data
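For reference, this is the word order the completion is meant to help with, in a loop-back setup like the one described (server, schema, and table names here are illustrative):

IMPORT FOREIGN SCHEMA public
  LIMIT TO (pgbench_accounts, pgbench_history)  -- optional; EXCEPT (...) is also allowed here
  FROM SERVER loopback
  INTO fgn;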
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Tue, Aug 25, 2020 at 8:58 AM Amit Kapila wrote:
> I am planning
> to push the first patch (v53-0001-Extend-the-BufFile-interface) in
> this series tomorrow unless you have any comments on the same.
>

I'm getting compiler warnings now. src/backend/storage/file/sharedfileset.c line 288 needs to be:

    bool        found PG_USED_FOR_ASSERTS_ONLY = false;

Cheers,

Jeff
Autovac cancellation is broken in v14
If I create a large table with "CREATE TABLE ... AS SELECT ... from generate_series(1,3e7)" with no explicit transactions, then once it is done I wait for autovac to kick in, then when I try to build an index on that table (or drop the table) the autovac doesn't go away on its own.

Bisects down to:

commit 5788e258bb26495fab65ff3aa486268d1c50b123
Author: Andres Freund
Date:   Wed Jul 15 15:35:07 2020 -0700

    snapshot scalability: Move PGXACT->vacuumFlags to ProcGlobal->vacuumFlags.

Which makes sense given the parts of the code this touches, although I don't understand exactly what the problem is.

The problem persists in HEAD (77c7267c37).

Cheers,

Jeff
Re: Autovac cancellation is broken in v14
On Thu, Aug 27, 2020 at 3:10 PM Jeff Janes wrote:
> If I create a large table with "CREATE TABLE ... AS SELECT ... from
> generate_series(1,3e7)" with no explicit transactions, then once it is done
> I wait for autovac to kick in, then when I try to build an index on that
> table (or drop the table) the autovac doesn't go away on its own.
>

After a bit more poking at this, I think we are checking whether we ourselves are an autovac process, rather than doing the intended check of whether the other guy is one.

Where would be a good spot to add a regression test for this? "isolation_regression"?

Cheers,

Jeff

fix_autovac_cancel.patch
Description: Binary data
moving aggregate bad error message
I was wondering if I could just add minvfunc, and have the rest of the m* functions be assumed to be the same as their non-moving counterparts. Apparently the answer is 'no'. But in the process, I found a bad error message.

When omitting mfinalfunc when there is a finalfunc, I get the error:

"ERROR:  moving-aggregate implementation returns type jj_state, but plain implementation returns type jj_state."

A rather peculiar complaint, analogous to the infamous "something failed: Success". Looking at the code, it seems we are testing rettype != finaltype, but reporting aggmTransType and aggTransType. Why aren't we reporting what we are testing?

With the attached patch, it gives the more sensible:

"ERROR:  moving-aggregate implementation returns type jj_state, but plain implementation returns type numeric."

Cheers,

Jeff

drop type jj_state cascade;

CREATE TYPE jj_state AS (
    x numeric,
    y numeric);

CREATE OR REPLACE FUNCTION jj_transition( state jj_state, val numeric )
RETURNS jj_state AS $$
BEGIN
    RETURN NULL::jj_state;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION jj_final( state jj_state )
RETURNS numeric AS $$
BEGIN
    RETURN 5.5;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION jj_inverse(state jj_state, val numeric)
RETURNS jj_state AS $$
BEGIN
    RETURN NULL::jj_state;
END;
$$ LANGUAGE plpgsql;

CREATE AGGREGATE jj(numeric) (
    sfunc = jj_transition,
    stype = jj_state,
    finalfunc = jj_final,
    initcond = '(0,0)',
    msfunc = jj_transition,
    minvfunc = jj_inverse,
    mstype = jj_state
    -- mfinalfunc = average_final
);

moving_agg_error.patch
Description: Binary data
Re: WIP: Covering + unique indexes.
On Sat, Apr 7, 2018 at 4:02 PM, Teodor Sigaev wrote:
> Thanks to everyone, pushed.
>

Indeed thanks, this will be a nice feature.

It is giving me a compiler warning on non-cassert builds using gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609:

indextuple.c: In function 'index_truncate_tuple':
indextuple.c:462:6: warning: unused variable 'indnatts' [-Wunused-variable]
  int indnatts = tupleDescriptor->natts;

Cheers,

Jeff
Default JIT setting in V12
Since JIT is on by default in v12, I wanted to revisit the issue raised in https://www.postgresql.org/message-id/CAMkU=1zVhQ5k5d=YyHNyrigLUNTkOj4=YB17s9--3ts8H-SO=q...@mail.gmail.com When the total estimated cost is between jit_above_cost and jit_optimize_above_cost, I get a substantial regression in the attached. Note that I did not devise this case specifically to cause this problem, I just stumbled upon it. JIT, no optimization: 10.5s JIT, optimization: 3.8s no JIT: 4.1s It seems like the unoptimized JIT code is much worse than the general purpose code. This is on AWS c4.large, Ubuntu 18.04, installed from PGDG apt repository. No config changes were made, other than the local ones included in the script. (Previously there were questions about how LLVM was configured, that I couldn't really answer well, but here there should be question as I didn't compile or configure it at all.) There were some proposed mitigations in sister threads, but none have been adopted in v12. I think it is intuitive, and with empirical evidence, that we do not want to JIT compile at all unless we are going to optimize the compiled code. Is there a rationale for, or other examples to show, that it makes sense for the default value of jit_optimize_above_cost to be 5 fold higher than the default setting of jit_above_cost? I think these defaults are setting a trap for our users who aren't really interested in JIT, and are just upgrading to stay on the most-current version. I would propose lowering the default jit_optimize_above_cost to be the same as jit_above_cost, or set it to 0 so that jit_above_cost is always in control and always optimizes. Cheers, Jeff drop table if exists i200c200; create table i200c200 ( pk bigint primary key, int1 bigint, int2 bigint, int3 bigint, int4 bigint, int5 bigint, int6 bigint, int7 bigint, int8 bigint, int9 bigint, int10 bigint, int11 bigint, int12 bigint, int13 bigint, int14 bigint, int15 bigint, int16 bigint, int17 bigint, int18 bigint, int19 bigint, int20 bigint, int21 bigint, int22 bigint, int23 bigint, int24 bigint, int25 bigint, int26 bigint, int27 bigint, int28 bigint, int29 bigint, int30 bigint, int31 bigint, int32 bigint, int33 bigint, int34 bigint, int35 bigint, int36 bigint, int37 bigint, int38 bigint, int39 bigint, int40 bigint, int41 bigint, int42 bigint, int43 bigint, int44 bigint, int45 bigint, int46 bigint, int47 bigint, int48 bigint, int49 bigint, int50 bigint, int51 bigint, int52 bigint, int53 bigint, int54 bigint, int55 bigint, int56 bigint, int57 bigint, int58 bigint, int59 bigint, int60 bigint, int61 bigint, int62 bigint, int63 bigint, int64 bigint, int65 bigint, int66 bigint, int67 bigint, int68 bigint, int69 bigint, int70 bigint, int71 bigint, int72 bigint, int73 bigint, int74 bigint, int75 bigint, int76 bigint, int77 bigint, int78 bigint, int79 bigint, int80 bigint, int81 bigint, int82 bigint, int83 bigint, int84 bigint, int85 bigint, int86 bigint, int87 bigint, int88 bigint, int89 bigint, int90 bigint, int91 bigint, int92 bigint, int93 bigint, int94 bigint, int95 bigint, int96 bigint, int97 bigint, int98 bigint, int99 bigint, int100 bigint, int101 bigint, int102 bigint, int103 bigint, int104 bigint, int105 bigint, int106 bigint, int107 bigint, int108 bigint, int109 bigint, int110 bigint, int111 bigint, int112 bigint, int113 bigint, int114 bigint, int115 bigint, int116 bigint, int117 bigint, int118 bigint, int119 bigint, int120 bigint, int121 bigint, int122 bigint, int123 bigint, int124 bigint, int125 bigint, int126 bigint, int127 bigint, int128 bigint, 
int129 bigint, int130 bigint, int131 bigint, int132 bigint, int133 bigint, int134 bigint, int135 bigint, int136 bigint, int137 bigint, int138 bigint, int139 bigint, int140 bigint, int141 bigint, int142 bigint, int143 bigint, int144 bigint, int145 bigint, int146 bigint, int147 bigint, int148 bigint, int149 bigint, int150 bigint, int151 bigint, int152 bigint, int153 bigint, int154 bigint, int155 bigint, int156 bigint, int157 bigint, int158 bigint, int159 bigint, int160 bigint, int161 bigint, int162 bigint, int163 bigint, int164 bigint, int165 bigint, int166 bigint, int167 bigint, int168 bigint, int169 bigint, int170 bigint, int171 bigint, int172 bigint, int173 bigint, int174 bigint, int175 bigint, int176 bigint, int177 bigint, int178 bigint, int179 bigint, int180 bigint, int181 bigint, int182 bigint, int183 bigint, int184 bigint, int185 bigint, int186 bigint, int187 bigint, int188 bigint, int189 bigint, int190 bigint, int191 bigint, int192 bigint, int193 bigint, int194 bigint, int195 bigint, int196 bigint, int197 bigint, int198 bigint, int199 bigint, int200 bigint, char1 varchar(255), char2 varchar(255), char3 varchar(255), char4 varchar(255), char5 varchar(255), char6 varchar(255), char7 varchar(255), char8 varchar(255), char9 varchar(255), char10 varchar(255), char11 varchar(255), char12 varchar(255), char13 varchar(255), char14 varchar(255), char15 varchar(255), char16 varchar(255), char17 varchar(255), char1
log spam with postgres_fdw
I'm sending this to hackers because it is not exactly a bug, and it can't be addressed from userland. I think it is a coding issue, although I haven't identified the exact code.

When closing the local session which had used postgres_fdw over an ssl connection, I get log spam on the foreign server saying:

LOG:  could not receive data from client: Connection reset by peer

It is easy to reproduce, but you must be using ssl to do so. On searching, I see that a lot of people have run into this issue, with considerable confusion, but as far as I can see it has never been diagnosed.

Is there anything that can be done about this, other than just learning to ignore it?

Cheers,

Jeff

-- set up ssl first; I won't walk through those steps.
\! pgbench -i
create extension postgres_fdw;
create server ssl foreign data wrapper postgres_fdw OPTIONS ( host '127.0.0.1');
create schema fgn;
create user MAPPING FOR CURRENT_USER SERVER ssl;
import foreign schema public from server ssl into fgn;
select count(*) from fgn.pgbench_history;
\q
Re: Primary keepalive message not appearing in Logical Streaming Replication
On Sun, Sep 15, 2019 at 11:44 AM Michael Loftis wrote: > > > On Sun, Sep 15, 2019 at 08:36 Virendra Negi wrote: > >> Oh I miss the documentation link there you go >> https://www.postgresql.org/docs/9.5/protocol-replication.html >> >> On Sun, Sep 15, 2019 at 8:05 PM Virendra Negi >> wrote: >> >>> Agreed but why is there a message specification for it describe in the >>> documentation and it ask to client reply back if a particular *bit* is >>> set.(1 means that the client should reply to this message as soon as >>> possible, to avoid a timeout disconnect. 0 otherwise) >>> >> > This is unrelated to TCP keepalive. I honestly don't know where the knob > is to turn these on but the configuration variables you quoted earlier I am > familiar with and they are not it. Perhaps someone else can chime in with > how to enable the protocol level keepalive in replication. > Protocol-level keepalives are governed by "wal_sender_timeout" Cheers, Jeff
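For completeness, a quick way to check or change that setting (a sketch; 60s is the shipped default, and the parameter can be changed with a reload, no restart needed):

SHOW wal_sender_timeout;          -- 60s by default
ALTER SYSTEM SET wal_sender_timeout = '30s';
SELECT pg_reload_conf();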
Re: log spam with postgres_fdw
On Sun, Sep 15, 2019 at 11:14 AM Tom Lane wrote: > Jeff Janes writes: > > When closing the local session which had used postgres_fdw over an ssl > > connection, I get log spam on the foreign server saying: > > LOG: could not receive data from client: Connection reset by peer > > It is easy to reproduce, but you must be using ssl to do so. > > On searching, I see that a lot of people have run into this issue, with > > considerable confusion, but as far as I can see it has never been > diagnosed. > > In > > https://www.postgresql.org/message-id/flat/3DPLMQIC.YU6IFMLY.3PLOWL6W%40FQT5M7HS.IFBAANAE.A7GUPCPM > > Thanks, I had not spotted that one, I guess because the log message itself was not in the subject so it ranked lower. > we'd concluded that the issue is probably that postgres_fdw has no > logic to shut down its external connections when the session closes. > It's not very clear why the SSL dependency, but we speculated that > adding an on_proc_exit callback to close the connection(s) would help. > > It is easy to reproduce the ssl dependency without any FDW, just by doing a kill -9 on psql. Apparently the backend process for unencrypted connections are happy to be ghosted, while ssl ones are not; which seems like an odd distinction to make. So should this be addressed on both sides (the server not whining, and the client doing the on_proc_exit anyway?). I can take a stab at the client side one, but I'm over my head on the ssl connection handling logic on the server side. Cheers, Jeff
Re: Default JIT setting in V12
On Wed, Sep 4, 2019 at 11:24 AM Andres Freund wrote:
> Hi,
>
> On 2019-09-04 07:51:16 -0700, Andres Freund wrote:
>
> Or better, something slightly more complete, like the attached (which
> affects both code-gen time optimizations (which are more like peephole
> ones), and both function/global ones that are cheap).
>

Yes, that does completely solve the issue I raised. It makes JIT either better or at least harmless, even when falling into the gap between jit_above_cost and jit_optimize_above_cost.

What TPC-H implementation do you use/recommend? This one: https://wiki.postgresql.org/wiki/DBT-3?

Cheers,

Jeff
WAL recycled despite logical replication slot
While testing something else (whether "terminating walsender process due to replication timeout" was happening spuriously), I had logical replication set up streaming a default pgbench transaction load, with the publisher being 13devel-e1c8743 and the subscriber being 12BETA4. Eventually I started getting errors about requested WAL segments having already been removed:

10863 sub idle 0 2019-09-19 17:14:58.140 EDT LOG:  starting logical decoding for slot "sub"
10863 sub idle 0 2019-09-19 17:14:58.140 EDT DETAIL:  Streaming transactions committing after 79/EB0B17A0, reading WAL from 79/E70736A0.
10863 sub idle 58P01 2019-09-19 17:14:58.140 EDT ERROR:  requested WAL segment 0001007900E7 has already been removed
10863 sub idle 0 2019-09-19 17:14:58.144 EDT LOG:  disconnection: session time: 0:00:00.030 user=jjanes database=jjanes host=10.0.2.2 port=40830

It had been streaming for about 50 minutes before the error showed up, and it showed up right when streaming was restarting after one of the replication timeouts.

Is there an innocent explanation for this? I thought logical replication slots provided an iron-clad guarantee that WAL would be retained until it was no longer needed. I am just using pub/sub, none of the lower-level stuff.

Cheers,

Jeff
Re: WAL recycled despite logical replication slot
On Fri, Sep 20, 2019 at 11:27 AM Tomas Vondra wrote:
>
> >Is there an innocent explanation for this? I thought logical replication
> >slots provided an iron-clad guarantee that WAL would be retained until it
> >was no longer needed. I am just using pub/sub, none of the lower level
> >stuff.
> >
>
> I think you're right - this should not happen with replication slots.
> Can you provide more detailed setup instructions, so that I can try to
> reproduce and investigate the issue?
>

It is a bit messy, because this isn't what I was trying to test. The basic setup is pretty simple.

On the master:

pgbench -i -s 100
create publication pgbench for table pgbench_accounts, pgbench_branches, pgbench_history, pgbench_tellers;
pgbench -R200 -c4 -j4 -P60 -T36 -n

On the replica:

pgbench -i -s 1
truncate pgbench_history, pgbench_accounts, pgbench_branches, pgbench_tellers;
create subscription sub CONNECTION 'host=192.168.0.15' publication pgbench;

The messy part: it looked like the sync was never going to finish, so first I cut the rate down to -R20. Then what I thought I did was drop the primary key on pgbench_accounts (manually doing a kill -15 on the sync worker to release the lock), wait for the copy to start again and then finish, and then start getting "ERROR: logical replication target relation "public.pgbench_accounts" has neither REPLICA IDENTITY index nor PRIMARY KEY and published relation does not have REPLICA IDENTITY FULL" log messages, at which point I re-added the primary key. Then I increased the -R back to 200, and about 50 minutes later got the WAL-already-removed error.

But now I can't seem to reproduce this. The next time I tried to do the sync with no primary key, there doesn't seem to be a commit after the COPY finishes, so once it tries to replay the first update it hits the above "no primary key" error and then rolls back **the entire COPY** as well as the single-row update, and starts the entire COPY over again before you have a chance to intervene and build the index. So I'm guessing now that either the lack of a commit (which itself seems like a spectacularly bad idea) is situation dependent, or the very slow COPY had finished between the time I had decided to drop the primary key and the time I actually implemented the drop.

Perhaps important here is that the replica is rather underpowered. Write IO and fsyncs periodically become painfully slow, which is probably why there are replication timeouts, and since the problem happened when trying to reestablish after a timeout, I would guess that that is critical to the issue. I was running the master with fsync=off, but since the OS never crashed, that should not be the source of corruption.

I'll try some more to reproduce this, but I wanted to make sure there was actually something here to reproduce, and not just my misunderstanding of how things are supposed to work.

Cheers,

Jeff
Re: WAL recycled despite logical replication slot
On Fri, Sep 20, 2019 at 6:25 PM Andres Freund wrote: > Hi, > > On September 20, 2019 5:45:34 AM PDT, Jeff Janes > wrote: > >While testing something else (whether "terminating walsender process > >due to > >replication timeout" was happening spuriously), I had logical > >replication > >set up streaming a default pgbench transaction load, with the publisher > >being 13devel-e1c8743 and subscriber being 12BETA4. Eventually I > >started > >getting errors about requested wal segments being already removed: > > > >10863 sub idle 0 2019-09-19 17:14:58.140 EDT LOG: starting logical > >decoding for slot "sub" > >10863 sub idle 0 2019-09-19 17:14:58.140 EDT DETAIL: Streaming > >transactions committing after 79/EB0B17A0, reading WAL from > >79/E70736A0. > >10863 sub idle 58P01 2019-09-19 17:14:58.140 EDT ERROR: requested WAL > >segment 0001007900E7 has already been removed > >10863 sub idle 0 2019-09-19 17:14:58.144 EDT LOG: disconnection: > >session time: 0:00:00.030 user=jjanes database=jjanes host=10.0.2.2 > >port=40830 > > > >It had been streaming for about 50 minutes before the error showed up, > >and > >it showed right when streaming was restarting after one of the > >replication > >timeouts. > > > >Is there an innocent explanation for this? I thought logical > >replication > >slots provided an iron-clad guarantee that WAL would be retained until > >it > >was no longer needed. I am just using pub/sub, none of the lower level > >stuff. > > It indeed should. What's the content of > pg_replication_slot for that slot? > Unfortunately I don't think I have that preserved. If I can reproduce the issue, would preserving data/pg_replslot/sub/state help as well? Cheers, Jeff
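For next time, a sketch of what I would capture from the publisher when the error appears, alongside the pg_replslot state directory (the slot name matches the one above; columns are from the pg_replication_slots view):

SELECT slot_name, plugin, active, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_name = 'sub';

SELECT pg_current_wal_lsn();   -- to compare against restart_lsn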
JSONPATH documentation
I find the documentation in https://www.postgresql.org/docs/12/functions-json.html very confusing.

In table 9.44, take the first entry:

Example JSON: {"x": [2.85, -14.7, -9.4]}
Example Query: $.x.floor()
Result: 2, -15, -10

There are no end-to-end examples here. How do I apply the example query to the example JSON to obtain the given result? Table 9.47 only gives two operators which apply a jsonpath to a json(b) object, @? and @@, and neither one of those yields the indicated result from the first line in 9.44. What does?

Also, I can't really figure out what the descriptions of @? and @@ mean. Does @? return true if an item exists, even if the value of that item is false, while @@ returns the truth value of the existing item?

https://www.postgresql.org/docs/12/datatype-json.html#DATATYPE-JSONPATH says "The SQL/JSON path language is fully integrated into the SQL engine". What does that mean? If it were only partially integrated, what would that mean? Is this providing me with any useful information? Is it just saying that this is not a contrib extension module?

What is the difference between "SQL/JSON Path Operators And Methods" and "jsonpath Accessors", and why are they not described in the same place, or at least near each other?

Cheers,

Jeff
Re: JSONPATH documentation
On Sun, Sep 22, 2019 at 2:18 PM Jeff Janes wrote:
> I find the documentation in
> https://www.postgresql.org/docs/12/functions-json.html very confusing.
>
> In table 9.44, take the first entry:
>
> Example JSON: {"x": [2.85, -14.7, -9.4]}
> Example Query: $.x.floor()
> Result: 2, -15, -10
>
> There are no end-to-end examples here. How do I apply the example query to
> the example JSON to obtain the given result?
>

OK, never mind here. After digging in the regression tests, I did find jsonb_path_query and friends, and they are in the docs with examples in table 9.49. I don't know how I overlooked that in the first place; I guess I was fixated on operators. Or maybe by the time I was down in those functions, I thought I had cycled back up and was looking at 9.44 again.

But I think it would make sense to move the description of jsonpath to its own page. It is confusing to have operators within the jsonpath language, and operators which apply to jsonpath "from the outside", together on the same page.

Cheers,

Jeff
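For the archives, here is the end-to-end example I was originally looking for, using the function rather than the operators (the output shown is simply the result the docs table states):

SELECT jsonb_path_query('{"x": [2.85, -14.7, -9.4]}', '$.x.floor()');
 jsonb_path_query
------------------
 2
 -15
 -10
(3 rows)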
DROP SUBSCRIPTION with no slot
I recently had to cut loose (pg_drop_replication_slot) a logical replica that couldn't keep up and so was threatening to bring down the master. In mopping up on the replica side, I couldn't just drop the subscription, because it couldn't drop the nonexistent slot on the master and so refused to work. So I had to do a silly little dance where I first disable the subscription, then ALTER SUBSCRIPTION ... SET (slot_name = NONE), then drop it. Wanting to clean up after itself is admirable, but if there is nothing to clean up, why should that be an error condition? Should this be an item on https://wiki.postgresql.org/wiki/Todo (to whatever extent that is still used). Cheers, Jeff
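For anyone hitting the same thing, the dance described above looks like this (a sketch, assuming a subscription named sub whose slot on the master is already gone):

ALTER SUBSCRIPTION sub DISABLE;
ALTER SUBSCRIPTION sub SET (slot_name = NONE);   -- only allowed while the subscription is disabled
DROP SUBSCRIPTION sub;                           -- now no longer tries to drop the remote slot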
Re: DROP SUBSCRIPTION with no slot
On Tue, Sep 24, 2019 at 5:25 PM Peter Eisentraut < peter.eisentr...@2ndquadrant.com> wrote: > On 2019-09-24 16:31, Jeff Janes wrote: > > I recently had to cut loose (pg_drop_replication_slot) a logical replica > > that couldn't keep up and so was threatening to bring down the master. > > > > In mopping up on the replica side, I couldn't just drop the > > subscription, because it couldn't drop the nonexistent slot on the > > master and so refused to work. So I had to do a silly little dance > > where I first disable the subscription, then ALTER SUBSCRIPTION ... SET > > (slot_name = NONE), then drop it. > > > > Wanting to clean up after itself is admirable, but if there is nothing > > to clean up, why should that be an error condition? > > The alternatives seem quite error prone to me. Better be explicit. > If you can connect to the master, and see that the slot already fails to exist, what is error prone about that? If someone goes to the effort of setting up a different master, configures it to receive replica connections, and alters the subscription CONNECTION parameter on the replica to point to that poisoned master, will an error on the DROP SUBSCRIPTION really force them to see the error of their ways, or will they just succeed at explicitly doing the silly dance and finalize the process of shooting themselves in the foot via a roundabout mechanism? (And at the point the CONNECTION is changed, he is in the same boat even if he doesn't try to drop the sub--either way the master has a dangling slot). I'm in favor of protecting a fool from his foolishness, except when it is annoying to the rest of us (Well, annoying to me, I guess. I don't know yet about the rest of us). Cheers, Jeff
Re: DROP SUBSCRIPTION with no slot
On Tue, Sep 24, 2019 at 6:42 PM Ziga wrote: > This also seems to be a problem for somewhat fringe case of subscriptions > created with connect=false option. > They cannot be dropped in an obvious way, without knowing the ALTER > SUBSCRIPTION trick. > > For example: > > contrib_regression=# create subscription test_sub connection > 'dbname=contrib_regression' publication test_pub with ( connect=false ); > > > WARNING: tables were not subscribed, you will have to run ALTER > SUBSCRIPTION ... REFRESH PUBLICATION to subscribe the tables > CREATE SUBSCRIPTION > > contrib_regression=# drop subscription test_sub; -- fails > ERROR: could not drop the replication slot "test_sub" on publisher > DETAIL: The error was: ERROR: replication slot "test_sub" does not exist > > contrib_regression=# alter subscription test_sub set ( slot_name=none ); > ALTER SUBSCRIPTION > > contrib_regression=# drop subscription test_sub; -- succeeds > DROP SUBSCRIPTION > > > Note that the publication was never refreshed. > It seems that the first DROP should succeed in the above case. > Or at least some hint should be given how to fix this. > There is no HINT in the error message itself, but there is in the documentation, see note at end of https://www.postgresql.org/docs/current/sql-dropsubscription.html. I agree with you that the DROP should just work in this case, even more so than in my case. But if we go with the argument that doing that is too error prone, then do we want to include a HINT on how to be error prone more conveniently? Cheers, Jeff
Re: [PATCH] Race condition in logical walsender causes long postgresql shutdown delay
On Wed, Sep 11, 2019 at 3:52 PM Alvaro Herrera wrote: > > Reading over this code, I noticed that the detection of the catch-up > state ends up being duplicate code, so I would rework that function as > in the attached patch. > > The naming of those flags (got_SIGUSR2, got_STOPPING) is terrible, but > I'm not going to change that in a backpatchable bug fix. > Hi Alvaro, does this count as a review? And Craig, do you agree with Alvaro's patch as a replacement for your own? Thanks, Jeff
logical replication empty transactions
After setting up logical replication of a slowly changing table using the built-in pub/sub facility, I noticed way more network traffic than made sense. Looking into it, I see that every transaction in that database on the master gets sent to the replica. 99.999+% of them are empty transactions ('B' message and 'C' message with nothing in between) because the transactions don't touch any tables in the publication, only non-replicated tables.

Is doing it this way necessary for some reason? Couldn't we hold the transmission of 'B' until something else comes along, and then if that next thing is 'C', drop both of them?

There is a comment for WalSndPrepareWrite which seems to foreshadow such a thing, but I don't really see how to use it in this case; I want to drop two messages, not one.

 * Don't do anything lasting in here, it's quite possible that nothing will be done
 * with the data.

This applies to all versions which have support for pub/sub, including the recent commits to 13dev. I've searched through the voluminous mailing list threads from when this feature was being presented to see if it was already discussed, but since every word I can think to search on occurs in virtually every message in the threads in some context or another, I didn't have much luck.

Cheers,

Jeff
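A rough way to watch this from SQL, without setting up a second server (a sketch only: it assumes wal_level=logical and an existing publication named pub, and pulls pgoutput's protocol messages back as bytea rows):

SELECT pg_create_logical_replication_slot('spy', 'pgoutput');

CREATE TABLE not_published (x int);   -- deliberately not added to the publication
INSERT INTO not_published VALUES (1);

-- Even though these transactions touched no published table, Begin ('B')
-- and Commit ('C') message pairs still come out of the slot:
SELECT substr(data, 1, 1) AS msg_type
FROM pg_logical_slot_peek_binary_changes('spy', NULL, NULL,
        'proto_version', '1', 'publication_names', 'pub');

SELECT pg_drop_replication_slot('spy');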
Re: Bloom index cost model seems to be wrong
On Sun, Feb 24, 2019 at 11:09 AM Jeff Janes wrote: > I've moved this to the hackers list, and added Teodor and Alexander of the > bloom extension, as I would like to hear their opinions on the costing. > My previous patch had accidentally included a couple lines of a different thing I was working on (memory leak, now-committed), so this patch removes that diff. I'm adding it to the commitfest targeting v13. I'm more interested in feedback on the conceptual issues rather than stylistic ones, as I would probably merge the two functions together before proposing something to actually be committed. Should we be trying to estimate the false positive rate and charging cpu_tuple_cost and cpu_operator_cost the IO costs for visiting the table to recheck and reject those? I don't think other index types do that, and I'm inclined to think the burden should be on the user not to create silly indexes that produce an overwhelming number of false positives. Cheers, Jeff bloom_cost_v3.patch Description: Binary data
Re: Index Skip Scan
On Wed, Feb 20, 2019 at 11:33 AM Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> On Fri, Feb 1, 2019 at 8:24 PM Jesper Pedersen <jesper.peder...@redhat.com> wrote:
> >
> > Dmitry and I will look at this and take it into account for the next
> > version.
>
> In the meantime, just to not forget, I'm going to post another version with a
> fix for cursor fetch backwards, which was crashing before.

This version of the patch can return the wrong answer.

create index on pgbench_accounts (bid, aid);
begin;
declare c cursor for select distinct on (bid) bid, aid from pgbench_accounts order by bid, aid;

fetch 2 from c;
 bid |   aid
-----+---------
   1 |       1
   2 | 100,001

fetch backward 1 from c;
 bid |   aid
-----+---------
   1 | 100,000

Without the patch, instead of getting a wrong answer, I get an error:

ERROR:  cursor can only scan forward
HINT:  Declare it with SCROLL option to enable backward scan.

If I add "SCROLL", then I do get the right answer with the patch.

Cheers,

Jeff
Re: Index Skip Scan
On Thu, Jan 31, 2019 at 1:32 AM Kyotaro HORIGUCHI < horiguchi.kyot...@lab.ntt.co.jp> wrote: > Hello. > > At Wed, 30 Jan 2019 18:19:05 +0100, Dmitry Dolgov <9erthali...@gmail.com> > wrote in aa+fz3guncutf52q1sufb7ise37tjpsd...@mail.gmail.com> > > A bit of adjustment after nodes/relation -> nodes/pathnodes. > > I had a look on this. > > The name "index skip scan" is a different feature from the > feature with the name on other prodcuts, which means "index scan > with postfix key (of mainly of multi column key) that scans > ignoring the prefixing part" As Thomas suggested I'd suggest that > we call it "index hop scan". (I can accept Hopscotch, either:p) > I think that what we have proposed here is just an incomplete implementation of what other products call a skip scan, not a fundamentally different thing. They don't ignore the prefix part, they use that part in a way to cancel itself out to give the same answer, but faster. I think they would also use this skip method to get distinct values if that is what is requested. But they would go beyond that to also use it to do something similar to the plan we get with this: Set up: pgbench -i -s50 create index on pgbench_accounts (bid, aid); alter table pgbench_accounts drop constraint pgbench_accounts_pkey ; Query: explain analyze with t as (select distinct bid from pgbench_accounts ) select pgbench_accounts.* from pgbench_accounts join t using (bid) where aid=5; If we accept this patch, I hope it would be expanded in the future to give similar performance as the above query does even when the query is written in its more natural way of: explain analyze select * from pgbench_accounts where aid=5; (which currently takes 200ms, rather than the 0.9ms taken for the one benefiting from skip scan) I don't think we should give it a different name, just because our initial implementation is incomplete. Or do you think our implementation of one feature does not really get us closer to implementing the other? Cheers, Jeff
Re: Should we increase the default vacuum_cost_limit?
On Wed, Mar 6, 2019 at 2:54 PM Andrew Dunstan <andrew.duns...@2ndquadrant.com> wrote:
>
> On 3/6/19 1:38 PM, Jeremy Schneider wrote:
> > On 3/5/19 14:14, Andrew Dunstan wrote:
> >> This patch is tiny, seems perfectly reasonable, and has plenty of
> >> support. I'm going to commit it shortly unless there are last minute
> >> objections.
> > +1
>
> done.

Now that this is done, the default value is only 5x below the hard-coded maximum of 10,000. This seems a bit odd, and not very future-proof, especially since the hard-coded maximum appears to have no logic to it anyway, at least none that is documented. Is it just mindless nannyism?

Any reason not to increase it by at least a factor of 10, but preferably to the largest value that does not cause computational problems (which I think would be INT_MAX)?

Cheers,

Jeff
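To see the two numbers being discussed side by side (a sketch; per the above, setting/boot_val should now show 2000 and max_val 10000):

SELECT name, boot_val, setting, max_val
FROM pg_settings
WHERE name = 'vacuum_cost_limit';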
Hash index initial size is too large given NULLs or partial indexes
Referring to this thread: https://dba.stackexchange.com/questions/231647/why-are-partial-postgresql-hash-indices-not-smaller-than-full-indices When a hash index is created on a populated table, it estimates the number of buckets to start out with based on the number of tuples returned by estimate_rel_size. But this number ignores both the fact that NULLs are not stored in hash indexes, and that partial indexes exist. This can lead to much too large hash indexes. Doing a re-index just repeats the logic, so doesn't fix anything. Fill factor also can't fix it, as you are not allowed to increase that beyond 100. This goes back to when the pre-sizing was implemented in 2008 (c9a1cc694abef737548a2a). It seems to be an oversight, rather than something that was considered. Is this a bug that should be fixed? Or if getting a more accurate estimate is not possible or not worthwhile, add a code comment about that? Cheers, Jeff
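A minimal way to see the over-sizing from that report (names and data volume are illustrative; per the above, the partial index comes out about as large as the full one because the initial bucket count is pre-sized from the whole table's estimated row count):

CREATE TABLE hash_demo AS
    SELECT g AS id, g % 100 AS grp
    FROM generate_series(1, 1000000) g;

CREATE INDEX hash_full ON hash_demo USING hash (id);
CREATE INDEX hash_part ON hash_demo USING hash (id) WHERE grp = 0;  -- matches ~1% of rows

SELECT relname, pg_size_pretty(pg_relation_size(oid))
FROM pg_class
WHERE relname IN ('hash_full', 'hash_part');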
Re: GiST VACUUM
On Tue, Mar 5, 2019 at 8:21 AM Heikki Linnakangas wrote: > On 05/03/2019 02:26, Andrey Borodin wrote: > >> I also tried your amcheck tool with this. It did not report any > >> errors. > >> > >> Attached is also latest version of the patch itself. It is the > >> same as your latest patch v19, except for some tiny comment > >> kibitzing. I'll mark this as Ready for Committer in the commitfest > >> app, and will try to commit it in the next couple of days. > > > > That's cool! I'll work on 2nd step of these patchset to make > > blockset data structure prettier and less hacky. > > Committed the first patch. Thanks for the patch! > Thank you. This is a transformational change; it will allow GiST indexes larger than RAM to be used in some cases where they were simply not feasible to use before. On a HDD, it resulted in a 50 fold improvement in vacuum time, and the machine went from unusably unresponsive to merely sluggish during the vacuum. On a SSD (albeit a very cheap laptop one, and exposed from Windows host to Ubuntu over VM Virtual Box) it is still a 30 fold improvement, from a far faster baseline. Even on an AWS instance with a "GP2" SSD volume, which normally shows little benefit from sequential reads, I get a 3 fold speed up. I also ran this through a lot of crash-recovery testing using simulated torn-page writes using my traditional testing harness with high concurrency (AWS c4.4xlarge and a1.4xlarge using 32 concurrent update processes) and did not encounter any problems. I tested both with btree_gist on a scalar int, and on tsvector with each tsvector having 101 tokens. I did notice that the space freed up in the index by vacuum doesn't seem to get re-used very efficiently, but that is an ancestral problem independent of this change. Cheers, Jeff
Re: jsonpath
On Sat, Mar 16, 2019 at 5:36 AM Alexander Korotkov <a.korot...@postgrespro.ru> wrote:
>
> So, pushed. Many thanks to reviewers and authors!
>

I think these files have to be cleaned up by "make maintainer-clean":

./src/backend/utils/adt/jsonpath_gram.c
./src/backend/utils/adt/jsonpath_scan.c

Cheers,

Jeff
compiler warning in pgcrypto imath.c
When compiling on an AWS 64-bit ARM machine, I get this compiler warning:

imath.c: In function 's_ksqr':
imath.c:2590:6: warning: variable 'carry' set but not used [-Wunused-but-set-variable]
   carry;
      ^

With this version():

PostgreSQL 12devel on aarch64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 7.3.0-27ubuntu1~18.04) 7.3.0, 64-bit

The attached patch adds PG_USED_FOR_ASSERTS_ONLY to silence it. Perhaps there is a better way, given that we want to change imath.c as little as possible from its upstream?

Cheers,

Jeff

pgcrypto_warning.patch
Description: Binary data
Re: pg_upgrade: Pass -j down to vacuumdb
On Tue, Mar 26, 2019 at 7:28 AM Peter Eisentraut < peter.eisentr...@2ndquadrant.com> wrote: > On 2019-03-25 22:57, Tom Lane wrote: > > + fprintf(script, "echo %sYou may wish to add --jobs=N for parallel > analyzing.%s\n", > > + ECHO_QUOTE, ECHO_QUOTE); > > But then you get that information after you have already started the > script. > True, but the same goes for all the other information there, and it sleeps to let you break out of it. And I make it a habit to glance through any scripts someone suggests that I run, so would notice the embedded advice without running it at all. I don't find any information about this analyze business on the > pg_upgrade reference page. Maybe a discussion there could explain the > different paths better than making the output script extra complicated. > > Essentially: If you want a slow and gentle analyze, use the supplied > script. If you want a fast analyze, use vacuumdb, perhaps with an > appropriate --jobs option. Note that pg_upgrade --jobs and vacuumdb > --jobs are resource-bound in different ways, so the same value might not > be appropriate for both. > > To me, analyze-in-stages is not about gentleness at all. For example, it does nothing to move vacuum_cost_delay away from its default of 0. Rather, it is about slamming the bare minimum statistics in there as fast as possible, so your database doesn't keel over from horrible query plans on even simple queries as soon as you reopen. I want the database to survive long enough for the more complete statistics to be gathered. If you have quickly accumulated max_connection processes all running horrible query plans that never finish, your database might as well still be closed for all the good it does the users. And all the load generated by those is going to make the critical ANALYZE all that much slower. At first blush I thought it was obvious that you would not want to run analyze-in-stages in parallel. But after thinking about it some more and reflecting on experience doing some troublesome upgrades, I would reverse that and say it is now obvious you do want at least the first stage of analyze-in-stages, and probably the first two, to run in parallel. That is not currently an option it supports, so we can't really recommend it in the script or the docs. But we could at least adopt the more straightforward patch, suggest that if they don't want analyze-in-stages they should consider doing the big-bang analyze in parallel. Cheers, Jeff
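For context, the stages are roughly just this, run over every database; stage one is what gets minimally usable statistics in place as quickly as possible (a from-memory sketch of what vacuumdb --analyze-in-stages issues, not a verbatim copy of it):

-- stage 1: cheapest possible statistics, so the planner has *something*
SET default_statistics_target = 1;
ANALYZE;

-- stage 2: better, still fast
SET default_statistics_target = 10;
ANALYZE;

-- stage 3: full statistics at the normal target
RESET default_statistics_target;
ANALYZE;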
Re: [HACKERS] generated columns
On Sat, Mar 30, 2019 at 4:03 AM Peter Eisentraut < peter.eisentr...@2ndquadrant.com> wrote: > On 2019-03-26 20:50, Pavel Stehule wrote: > > It is great feature and I'll mark this feature as ready for commit > > Committed, thanks. > I can't do a same-major-version pg_upgrade across this commit, as the new pg_dump is trying to dump a nonexistent attgenerated column from the old database. Is that acceptable collateral damage? I thought we usually avoided that breakage within the dev branch, but I don't know how we do it. Cheers, Jeff
pg_basebackup delays closing of stdout
Ever since pg_basebackup was created, it has had a comment like this:

 * End of chunk. If requested, and this is the base tablespace
 * write configuration file into the tarfile. When done, close the
 * file (but not stdout).

But why make the exception for output going to stdout? If we are done with it, why not close it?

After a massive maintenance operation, I want to re-seed a streaming standby, which I start to do by:

pg_basebackup -D - -Ft -P -X none | pxz > base.tar.xz

But the archiver is way behind, so when it finishes the basebackup part, I get:

NOTICE:  pg_stop_backup cleanup done, waiting for required WAL segments to be archived
WARNING:  pg_stop_backup still waiting for all required WAL segments to be archived (60 seconds elapsed)
...

The base backup file is not finalized, because pg_basebackup has not closed its stdout while waiting for the WAL segment to be archived. The file is incomplete due to data stuck in buffers, so I can't copy it to where I want it and bring up a new streaming replica (which bypasses the WAL archive, so would otherwise work).

Also, if pg_basebackup gets interrupted somehow while it is waiting for WAL archiving, the backup will be invalid, as it won't flush the last bit of data. Of course if it gets interrupted, I would have to test the backup to make sure it is valid. But testing it and finding that it is valid is better than testing it and finding that it is not.

I think it makes sense for pg_basebackup to wait for the WAL to be archived, but there is no reason for it to hold the base.tar.xz file hostage while it does so. If I simply remove the test for strcmp(basedir, "-"), as in the attached, I get the behavior I desire, and nothing bad seems to happen. Meaning, "make check -C src/bin/pg_basebackup/" still passes (but only tested on Linux).

Is there a reason not to do this?

Cheers,

Jeff

pg_basebackup_close_stdout.patch
Description: Binary data
Re: Can't we give better table bloat stats easily?
On Fri, Aug 16, 2019 at 8:39 PM Greg Stark wrote:
> Everywhere I've worked I've seen people struggle with table bloat. It's
> hard to even measure how much of it you have or where, let alone actually
> fix it.
>
> If you search online you'll find dozens of different queries estimating
> how much empty space are in your tables and indexes based on pg_stats
> statistics and suppositions about header lengths and padding and plugging
> them into formulas of varying credibility.
>

There is not much we can do to suppress bad advice that people post on their own blogs. If wiki.postgresql.org is hosting bad advice, by all means we should fix that.

> But isn't this all just silliiness these days? We could actually sum up
> the space recorded in the fsm and get a much more trustworthy number in
> milliseconds.
>

If you have bloat problems, then you probably have vacuuming problems. If you have vacuuming problems, how much can you trust fsm anyway?

Cheers,

Jeff
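For what it's worth, the FSM-summing approach Greg describes is already easy to sketch with contrib/pg_freespacemap (table name is illustrative, and the number is only as fresh as the last vacuum of that table, which is the caveat above):

CREATE EXTENSION IF NOT EXISTS pg_freespacemap;

SELECT pg_size_pretty(sum(avail)) AS free_space_per_fsm
FROM pg_freespace('pgbench_accounts');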
Re: TODO list (was Re: Contributing with code)
On Tue, Jan 2, 2018 at 2:48 PM, Robert Haas wrote: > On Sun, Dec 31, 2017 at 2:02 PM, David G. Johnston > wrote: > > It probably needs three sub-sections. Fist the raw ideas put forth by > > people not capable of implementation but needing capabilities; these get > > moved to one of two sections: ideas that have gotten some attention by > core > > that have merit but don't have development interest presently; and one > like > > this that have gotten the some attention and that core doesn't feel > would be > > worth maintaining even if someone was willing to develop it. We already > > have this in practice but maybe a bit more formality would help. > > > > I'm not seeing that having it, even if incorrect, does harm. > > It causes people to waste time developing features we don't want. > We don't want them at all, or we just don't want a naive implementation of them? If we don't want them at all, then surely we should remove those items. Or move them to the bottom of the page, where there is a section just for such things. That way people can at least see that it has been considered and rejected. And for things that we do want, it is nice to have links to the emails of where it was discussed/attempted before. This is especially useful because searching the archives is very inefficient due to nearly every word you want to search on being a stop word or a very common word with many meanings. It is much easier to find it on the todo list if it is there than to search the archives. > > It also has a note at the top saying we think it's complete, but we > don't think that, or I don't think it anyway. > Yeah, I don't care for that, either. Cheers, Jeff
Re: TODO list (was Re: Contributing with code)
On Tue, Jan 2, 2018 at 6:42 PM, Joshua D. Drake wrote: > On 01/02/2018 11:17 AM, Robert Haas wrote: > >> On Sun, Dec 31, 2017 at 2:31 PM, Peter Geoghegan wrote: >> >>> On Sun, Dec 31, 2017 at 10:42 AM, Tom Lane wrote: >>> If we're not going to maintain/curate it properly, I agree it's not worth keeping it around. But I'd rather see somebody put some effort into it ... >>> If somebody was going to resolve to put some effort into maintaining >>> it to a high standard then it probably would have happened already. >>> The fact that it hasn't happened tells us plenty. >>> >> +1, and well said. >> > > O.k. what does it tell us though? Is it a resource issue? Is it a barrier > of entry issue? Lack of ownership/ruthlessness. While I can edit it to remove items that don't seem desirable (or comprehensible, or whatever) I'm not likely to do so, unless I'm the one who added it in the first place. Maybe it made more sense or was more important to someone else, like the person who added it. At one time many of the items didn't have links to the relevant email discussions (or more detailed wiki pages of their own), so those would have been good targets for purging but I think Bruce hunted down and added links for most of them. Another problem is that wikimedia doesn't have a "git blame" like feature. I've been frustrated before trying to figure out who added an item and when, so I could research it a bit more. > What does deleting it solve? What problems (and there is a very large > obvious one) are caused by deleting it? > > Right now, the TODO list is the "only" portal to "potential" things we > "might" want. If we delete it we are just creating yet another barrier of > entry to potential contribution. I think we need to consider an alternative > solution because of that. > There are various "roadmaps" floating around (wiki and elsewhere), but they aren't very prominent or easy to find. They seem to mostly be minutes from meetings, but you wouldn't know to look for them if you weren't at the meeting. Cheers, Jeff
Re: Speeding up pg_upgrade
On Thu, Dec 7, 2017 at 11:28 AM, Justin Pryzby wrote: > On Tue, Dec 05, 2017 at 09:01:35AM -0500, Bruce Momjian wrote: > > As part of PGConf.Asia 2017 in Tokyo, we had an unconference topic about > > zero-downtime upgrades. ... we discussed speeding up pg_upgrade. > > > > There are clusters that take a long time to dump the schema from the old > > cluster > > Maybe it isn't representative of a typical case, but I can offer a data > point: > > For us, we have ~40 customers with DBs ranging in size from <100GB to ~25TB > (for which ~90% is on a ZFS tablespace with compression). We have what's > traditionally considered to be an excessive number of child tables, which > works > okay since planning time is unimportant to us for the report queries which > hit > them. Some of the tables are wide (historically up to 1600 columns). > Some of > those have default values on nearly every column, and pg_attrdef was large > (>500MB), causing pg_dump --section pre-data to be slow (10+ minutes). > Since > something similar is run by pg_upgrade, I worked around the issue for now > by > dropping defaults on the historic children in advance of upgrades (at some > point I'll figure out what I have to do to allow DROPing DEFAULTs). It's > not > the first time we've seen an issue with larger number of children*columns. > This is probably worth fixing independent of other ways of speeding up pg_upgrade. It spends most of its time making the column names unique while de-parsing the DEFAULT clause for each column. But I don't think it ever outputs the column name which results from that deparsing, and since there is only one table involved, the names should already be unique anyway, unless I am missing something. The time seems to be quadratic in number of columns if all columns have defaults, or proportional to the product of number of columns in table and the number of columns with defaults. The CREATE TABLE has a similar problem upon restoring the dump. Cheers, Jeff pg_dump_default.sh Description: Bourne shell script
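A minimal way to build the shape Justin describes, from psql (the column count is illustrative and kept under the 1600-column limit; the original report involved many such child tables, so a single table may need to be multiplied to show dramatic pg_dump --section pre-data timings):

CREATE TABLE wide_defaults ();

SELECT format('ALTER TABLE wide_defaults ADD COLUMN c%s int DEFAULT 0', g)
FROM generate_series(1, 1000) AS g \gexec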
Re: [HACKERS] Removing useless DISTINCT clauses
On Mon, Nov 6, 2017 at 1:16 AM, David Rowley wrote: > In [1] we made a change to process the GROUP BY clause to remove any > group by items that are functionally dependent on some other GROUP BY > items. > > This really just checks if a table's PK columns are entirely present > in the GROUP BY clause and removes anything else belonging to that > table. > > All this seems to work well, but I totally failed to consider that the > exact same thing applies to DISTINCT too. > > Over in [2], Rui Liu mentions that the planner could do a better job > for his case. > > Using Rui Liu's example: > > CREATE TABLE test_tbl ( k INT PRIMARY KEY, col text); > INSERT into test_tbl select generate_series(1,1000), 'test'; > > Master: > > postgres=# explain analyze verbose select distinct col, k from > test_tbl order by k limit 1000; >QUERY > PLAN > > > - > Limit (cost=1658556.19..1658563.69 rows=1000 width=9) (actual > time=8934.962..8935.495 rows=1000 loops=1) >Output: col, k >-> Unique (cost=1658556.19..1733557.50 rows=1175 width=9) > (actual time=8934.961..8935.460 rows=1000 loops=1) > Output: col, k > -> Sort (cost=1658556.19..1683556.63 rows=1175 width=9) > (actual time=8934.959..8935.149 rows=1000 loops=1) >Output: col, k >Sort Key: test_tbl.k, test_tbl.col >Sort Method: external merge Disk: 215128kB >-> Seq Scan on public.test_tbl (cost=0.00..154056.75 > rows=1175 width=9) (actual time=0.062..1901.728 rows=1000 > loops=1) > Output: col, k > Planning time: 0.092 ms > Execution time: 8958.687 ms > (12 rows) > > Patched: > > postgres=# explain analyze verbose select distinct col, k from > test_tbl order by k limit 1000; > > QUERY PLAN > > > -- > Limit (cost=0.44..34.31 rows=1000 width=9) (actual time=0.030..0.895 > rows=1000 loops=1) >Output: col, k >-> Unique (cost=0.44..338745.50 rows=1175 width=9) (actual > time=0.029..0.814 rows=1000 loops=1) > Output: col, k > -> Index Scan using test_tbl_pkey on public.test_tbl > (cost=0.44..313745.06 rows=1175 width=9) (actual time=0.026..0.452 > rows=1000 loops=1) >Output: col, k > Planning time: 0.152 ms > Execution time: 0.985 ms > (8 rows) > > A patch to implement this is attached. > > Couldn't the Unique node be removed entirely? If k is a primary key, you can't have duplicates in need of removal. Or would that be a subject for a different patch? I think remove_functionally_dependant_groupclauses should have a more generic name, like remove_functionally_dependant_clauses. Cheers, Jeff
Re: Why standby restores some WALs many times from archive?
On Sat, Dec 30, 2017 at 4:20 AM, Michael Paquier wrote: > On Sat, Dec 30, 2017 at 04:30:07AM +0300, Sergey Burladyan wrote: > > We use this scripts: > > https://github.com/avito-tech/dba-utils/tree/master/pg_archive > > > > But I can reproduce problem with simple cp & mv: > > archive_command: > > test ! -f /var/lib/postgresql/wals/%f && \ > > test ! -f /var/lib/postgresql/wals/%f.tmp && \ > > cp %p /var/lib/postgresql/wals/%f.tmp && \ > > mv /var/lib/postgresql/wals/%f.tmp /var/lib/postgresql/wals/%f > > This is unsafe. PostgreSQL expects the WAL segment archived to be > flushed to disk once the archive command has returned its result to the > backend. Don't be surprised if you get corrupted instances or that you > are not able to recover up to a consistent point if you need to roll in > a backup. Note that the documentation of PostgreSQL provides a simple > example of archive command, which is itself bad enough not to use. > True, but that doesn't explain the current situation, as it reproduces without an OS-level crash, so a missing sync would not be relevant. (And on some systems, mv'ing a file will force it to be synced under some conditions, so it might be safe anyway.) I thought I'd seen something recently in the mailing lists or commit log about an off-by-one error which causes it to re-fetch the previous file rather than the current file, if the previous file ends with just the right type of record and amount of padding. But now I can't find it. Cheers, Jeff
Re: Possible performance regression with pg_dump of a large number of relations
On Thu, Jan 11, 2018 at 5:26 PM, Luke Cowell wrote: > I've been troubleshooting an issue with slow pg_dump times on postgres > 9.6.6. I believe something changed between 9.5.10 and 9.6.6 that has made > dumps significantly slower for databases with a large number of relations. > I posted this in irc and someone suggested that I should post this here. > I'm sorry if this isn't the right place. > > To simulate the issue I generated 150,000 relations spread across 1000 > schemas (this roughly reflects my production setup). > > ```ruby > File.write "many_relations.sql", (15 / 150).times.flat_map {|n| > [ >"create schema s_#{n};", >150.times.map do |t| > "create table s_#{n}.test_#{t} (id int);" >end >] > }.join("\n") > ``` > > I have 2 identical pieces of hardware. I've installed 9.5 on one and 9.6 > on the other. I've run the same generated piece of sql in a fresh database > on both systems. > > On my 9.5.10 system: > > time pg_dump -n s_10 testing > /dev/null > real0m5.492s > user0m1.424s > sys 0m0.184s > > On my 9.6.6 system: > > time pg_dump -n s_10 testing > /dev/null > real0m27.342s > user0m1.748s > sys 0m0.248s > I don't get quite as large a regression as you do, from 6s to 19s. It looks like there are multiple of them, but the biggest is caused by: commit 5d589993cad212f7d556d52cc1e42fe18f65b057 Author: Stephen Frost Date: Fri May 6 14:06:50 2016 -0400 pg_dump performance and other fixes That commit covered a few different things, and I don't what improvement it mentions is the one that motivated this, but the key change was to add this query: EXISTS (SELECT 1 FROM pg_attribute at LEFT JOIN pg_init_privs pip ON(c.oid = pip.objoid AND pip.classoid = (SELECT oid FROM pg_class WHERE relname = 'pg_class') AND pip.objsubid = at.attnum)WHERE at.attrelid = c.oid AND at.attnum>0 and ((SELECT count(acl) FROM (SELECT unnest(coalesce(at.attacl,acldefault('c',c.relowner))) AS acl EXCEPT SELECT unnest(coalesce(pip.initprivs,acldefault('c',c.relowner as foo) >1 OR (SELECT count(acl) FROM (SELECT unnest(coalesce(pip.initprivs,acldefault('c',c.relowner))) AS acl EXCEPT SELECT unnest(coalesce(at.attacl,acldefault('c',c.relowner as foo) >0))AS changed_acl Considering it runs 2 subqueries for every column (including the 6 hidden system columns) of every table, even ones that don't end up getting dumped out, it is no wonder it is slow. If you were just dumping the database with 150,000 objects, I wouldn't worry about a 20 second regression. But I assume you intend to loop over every schema and dump each individually? Cheers, Jeff
Re: Possible performance regression with pg_dump of a large number of relations
On Fri, Jan 12, 2018 at 8:01 AM, Jeff Janes wrote: > That commit covered a few different things, and I don't what improvement > it mentions is the one that motivated this, but the key change was to add > this query: > > EXISTS (SELECT 1 FROM pg_attribute at LEFT JOIN pg_init_privs pip ON(c.oid > = pip.objoid AND pip.classoid = (SELECT oid FROM pg_class WHERE relname = > 'pg_class') AND pip.objsubid = at.attnum)WHERE at.attrelid = c.oid AND > at.attnum>0 and ((SELECT count(acl) FROM (SELECT > unnest(coalesce(at.attacl,acldefault('c',c.relowner))) > AS acl EXCEPT SELECT > unnest(coalesce(pip.initprivs,acldefault('c',c.relowner > as foo) >1 OR (SELECT count(acl) FROM (SELECT > unnest(coalesce(pip.initprivs,acldefault('c',c.relowner))) > AS acl EXCEPT SELECT unnest(coalesce(at.attacl,acldefault('c',c.relowner > as foo) >0))AS changed_acl > Sorry, that query reflects some munging I did to it. The real part added to the query is: EXISTS (SELECT 1 FROM pg_attribute at LEFT JOIN pg_init_privs pip ON(c.oid = pip.objoid AND pip.classoid = (SELECT oid FROM pg_class WHERE relname = 'pg_class') AND pip.objsubid = at.attnum)WHERE at.attrelid = c.oid AND ((SELECT array_agg(acl) FROM (SELECT unnest(coalesce(at.attacl,acldefault('c',c.relowner))) AS acl EXCEPT SELECT unnest(coalesce(pip.initprivs,acldefault('c',c.relowner as foo) IS NOT NULL OR (SELECT array_agg(acl) FROM (SELECT unnest(coalesce(pip.initprivs,acldefault('c',c.relowner))) AS acl EXCEPT SELECT unnest(coalesce(at.attacl,acldefault('c',c.relowner as foo) IS NOT NULL OR NULL IS NOT NULL OR NULL IS NOT NULL))AS changed_ac Cheers, Jeff
Logical replication wal sender timestamp bug
While monitoring pg_stat_subscription, I noticed that last_msg_send_time was usually NULL, which doesn't make sense. Why would we have a message, but not know when it was sent? Looking in src/backend/replication/walsender.c, there is this: /* output previously gathered data in a CopyData packet */ pq_putmessage_noblock('d', ctx->out->data, ctx->out->len); /* * Fill the send timestamp last, so that it is taken as late as possible. * This is somewhat ugly, but the protocol is set as it's already used for * several releases by streaming physical replication. */ resetStringInfo(&tmpbuf); now = GetCurrentTimestamp(); pq_sendint64(&tmpbuf, now); memcpy(&ctx->out->data[1 + sizeof(int64) + sizeof(int64)], tmpbuf.data, sizeof(int64)); Filling out the timestamp after the message has already been sent is taking "as late as possible" a little too far. It results in every message having a zero timestamp, other than keep-alives which go through a different path. Re-ordering the code blocks as in the attached seems to fix it. But I have to wonder, if this has been broken from the start and no one ever noticed, is this even valuable information to relay in the first place? We could just take the column out of the view, and not bother calling GetCurrentTimestamp() for each message. Cheers, Jeff walsender_timestamp_fix.patch Description: Binary data
Re: cost based vacuum (parallel)
On Mon, Nov 4, 2019 at 1:54 AM Amit Kapila wrote: > For parallel vacuum [1], we were discussing what is the best way to > divide the cost among parallel workers but we didn't get many inputs > apart from people who are very actively involved in patch development. > I feel that we need some more inputs before we finalize anything, so > starting a new thread. > Maybe I just don't have experience in the type of system that parallel vacuum is needed for, but if there is any meaningful IO throttling in effect, then what is the point of doing the vacuum in parallel in the first place? Cheers, Jeff
Re: Logical replication wal sender timestamp bug
On Wed, Nov 6, 2019 at 2:15 AM Michael Paquier wrote: > On Tue, Nov 05, 2019 at 01:19:37PM +0900, Michael Paquier wrote: > > On Sat, Nov 02, 2019 at 09:54:54PM -0400, Jeff Janes wrote: > >> Filling out the timestamp after the message has already been sent is > taking > >> "as late as possible" a little too far. It results in every message > having > >> a zero timestamp, other than keep-alives which go through a different > path. > > > > It seems to me that you are right: the timestamp is computed too > > late. > > It is easy enough to reproduce the problem by setting for example > logical replication between two nodes and pgbench to produce some > load and then monitor pg_stat_subscription periodically. However, it > is a problem since logical decoding has been introduced (5a991ef) so > committed your fix down to 9.4. > Thanks. This column looks much more reasonable now. Cheers, Jeff
Re: logical replication empty transactions
On Fri, Nov 8, 2019 at 8:59 PM Euler Taveira wrote: > On Mon, Oct 21, 2019 at 21:20, Jeff Janes wrote: > > > > After setting up logical replication of a slowly changing table using > the built-in pub/sub facility, I noticed way more network traffic than made > sense. Looking into it, I see that every transaction in that database on the > master gets sent to the replica. 99.999+% of them are empty transactions > ('B' message and 'C' message with nothing in between) because the > transactions don't touch any tables in the publication, only non-replicated > tables. Is doing it this way necessary for some reason? Couldn't we hold > the transmission of 'B' until something else comes along, and then if that > next thing is 'C' drop both of them? > > > That is not optimal. Those empty transactions is a waste of bandwidth. > We can suppress them if no changes will be sent. test_decoding > implements "skip empty transaction" as you described above and I did > something similar to it. Patch is attached. > Thanks. I didn't think it would be that simple, because I thought we would need some way to fake an acknowledgement for any dropped empty transactions, to keep the LSN advancing and allow WAL to get recycled on the master. But it turns out to be the opposite. While your patch drops the network traffic by a lot, there is still a lot of traffic. Now it is keep-alives, rather than 'B' and 'C'. I don't know why I am getting a few hundred keep-alives every second when the timeouts are at their defaults, but it is better than several thousand 'B' and 'C'. My setup here was just to create, publish, and subscribe to an inactive dummy table, while having pgbench running on the master (with unpublished tables). I have not created an intentionally slow network, but I am testing it over wifi, which is inherently kind of slow. Cheers, Jeff
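A minimal sketch of that setup, in case anyone wants to reproduce it (host names and object names here are placeholders, not the exact ones I used):

```sql
-- On the publisher, which also holds the ordinary (unpublished) pgbench tables:
CREATE TABLE dummy (id int PRIMARY KEY);
CREATE PUBLICATION dummy_pub FOR TABLE dummy;

-- On the subscriber:
CREATE TABLE dummy (id int PRIMARY KEY);
CREATE SUBSCRIPTION dummy_sub
    CONNECTION 'host=master-host port=5432 dbname=postgres'
    PUBLICATION dummy_pub;

-- Then run pgbench against the publisher (its tables are not in the
-- publication) and watch the walsender traffic for the subscription:
--   pgbench -T 600 postgres
```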
Coding in WalSndWaitForWal
in src/backend/replication/walsender.c, there is the section quoted below. It looks like nothing interesting happens between the GetFlushRecPtr just before the loop starts, and the one inside the loop the first time through the loop. If we want to avoid doing CHECK_FOR_INTERRUPTS(); etc. needlessly, then we should check the result of GetFlushRecPtr and return early if it is sufficiently advanced--before entering the loop. If we don't care, then what is the point of updating it twice with no meaningful action in between? We could just get rid of the section just before the loop starts. The current coding seems confusing, and increases traffic on a potentially busy spin lock. /* Get a more recent flush pointer. */ if (!RecoveryInProgress()) RecentFlushPtr = GetFlushRecPtr(); else RecentFlushPtr = GetXLogReplayRecPtr(NULL); for (;;) { long sleeptime; /* Clear any already-pending wakeups */ ResetLatch(MyLatch); CHECK_FOR_INTERRUPTS(); /* Process any requests or signals received recently */ if (ConfigReloadPending) { ConfigReloadPending = false; ProcessConfigFile(PGC_SIGHUP); SyncRepInitConfig(); } /* Check for input from the client */ ProcessRepliesIfAny(); /* * If we're shutting down, trigger pending WAL to be written out, * otherwise we'd possibly end up waiting for WAL that never gets * written, because walwriter has shut down already. */ if (got_STOPPING) XLogBackgroundFlush(); /* Update our idea of the currently flushed position. */ if (!RecoveryInProgress()) RecentFlushPtr = GetFlushRecPtr(); else RecentFlushPtr = GetXLogReplayRecPtr(NULL); Cheers, Jeff
Re: WAL archive is lost
On Fri, Nov 22, 2019 at 8:04 AM matsumura@fujitsu.com < matsumura@fujitsu.com> wrote: > Hi all > > I find a situation that WAL archive file is lost but any WAL segment file > is not lost. > It causes for archive recovery to fail. Is this behavior a bug? > > example: > > WAL segment files > 00010001 > 00010002 > 00010003 > > Archive files > 00010001 > 00010003 > > Archive file 00010002 is lost but WAL segment files > is continuous. Recovery with archive (i.e. PITR) stops at the end of > 00010001. > Will it not archive 00010002 eventually, like at the conclusion of the next restartpoint? or does it get recycled/removed without ever being archived? Or does it just hang out forever in pg_wal? > How to reproduce: > - Set up replication (primary and standby). > - Set [archive_mode = always] in standby. > - WAL receiver exits (i.e. because primary goes down) > after receiver inserts the last record in some WAL segment file > before receiver notifies the segement file to archiver(create .ready > file). > Do you have a trick for reliably achieving this last step? Cheers, Jeff
disable only nonparallel seq scan.
Is there a way to force a meaningful parallel seq scan, or at least the planning of one, when the planner wants a non-parallel one? Usually I can do things like this with the enable_* settings, but if I `set enable_seqscan to off`, it penalizes the parallel seq scan 8 times harder than it penalizes the non-parallel one, so the plan does not switch. If I set `force_parallel_mode TO on` then I do get a parallel plan, but it is a degenerate one which tells me nothing I want to know. If I `set parallel_tuple_cost = 0` (or in some cases to a negative number), I can force it to switch, but that destroys the purpose, which is to see what the "would have been" plan estimates are for the parallel seq scan under the default setting of the cost parameters. I can creep parallel_tuple_cost downward until it switches, and then try to extrapolate back up, but this is tedious and not very reliable. Cheers, Jeff
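To spell out the attempts described above (the table and column here are hypothetical stand-ins for whatever query is being investigated):

```sql
-- Penalizes both flavors of seq scan, the parallel one harder, so the plan never flips:
SET enable_seqscan TO off;
EXPLAIN SELECT * FROM big_table WHERE some_column < 100;

-- Gives a "parallel" plan, but only a degenerate single-worker Gather over the same serial plan:
RESET enable_seqscan;
SET force_parallel_mode TO on;
EXPLAIN SELECT * FROM big_table WHERE some_column < 100;

-- Forces the switch, but changes the very cost estimates I'm trying to inspect:
RESET force_parallel_mode;
SET parallel_tuple_cost TO 0;
EXPLAIN SELECT * FROM big_table WHERE some_column < 100;
```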
Re: Index corruption / planner issue with one table in my pg 11.6 instance
On Mon, Dec 9, 2019 at 1:00 PM Jeremy Finzel wrote: > I have a table with about 7 million records. I had a query in which I > needed 2 indexes added, one for a created timestamp field another for an id > field; both very high cardinality. > > First I noticed the query would not use the timestamp index no matter what > session config settings I used. I finally created a temp table copy of the > table and verified index is used. Then I rebuilt the main table with > VACUUM FULL and this caused the index to be used. > Were they built with CONCURRENTLY? Do you have any long-open snapshots? Cheers, Jeff >
Re: Contention on LWLock buffer_content, due to SHARED lock(?)
On Mon, Dec 9, 2019 at 5:10 PM Jens-Wolfhard Schicke-Uffmann < drahf...@gmx.de> wrote: > Hi, > > today I observed (on a r5.24xlarge AWS RDS instance, i.e. 96 logical > cores) lock contention on a buffer content lock due to taking of a > SHARED lock (I think): > What version of PostgreSQL are you using? Cheers, Jeff
Re: disable only nonparallel seq scan.
On Tue, Dec 10, 2019 at 1:32 PM Robert Haas wrote: > On Sun, Dec 8, 2019 at 1:24 PM Jeff Janes wrote: > > Is there a way to force a meaningful parallel seq scan, or at least the > planning of one, when the planner wants a non-parallel one? > > > > Usually I can do things like with with enable_* setting, but if I `set > enable_seqscan to off`, it penalizes the parallel seq scan 8 times harder > than it penalizes the non-parallel one, so the plan does not switch. > > > > If I set `force_parallel_mode TO on` then I do get a parallel plan, but > it is a degenerate one which tells me nothing I want to know. > > > > If I `set parallel_tuple_cost = 0` (or in some cases to a negative > number), I can force it switch, but that destroys the purpose, which is to > see what the "would have been" plan estimates are for the parallel seq scan > under the default setting of the cost parameters. > > > > I can creep parallel_tuple_cost downward until it switches, and then try > to extrapolate back up, but this tedious and not very reliable. > > I don't think there's a way to force this, but setting both > parallel_setup_cost and parallel_tuple_cost to 0 seems to often be > enough. > Yes, that is fine if I want the actual execution results. And I patch guc.c to allow negative settings, for when some extra persuasion is needed. But here I want to see what the planner is thinking, and changing the *cost settings changes that thinking. So I want to force the planner to choose the "next-best" plan under the original cost settings so I can see how far away they are from each other. I made a crude patch to add enable_singleseqscan, which has been letting me get at this information now. I'm not proposing to apply this particular patch to the code base, but I do wonder if we can do something about this "dark spot" which no combination of current enable_* setting seems to be able to get at. Cheers, Jeff enable_singleseqscan.patch Description: Binary data
psql's \watch is broken
If I do something like this: explain (analyze) select * from pgbench_accounts \watch 1 It behaves as expected. But once I break out of the loop with ctrl-C, then if I execute the same thing again it executes the command once, but shows no output and doesn't loop. It seems like some flag is getting set with ctrl-C, but then never gets reset. It was broken in this commit: commit a4fd3aa719e8f97299dfcf1a8f79b3017e2b8d8b Author: Michael Paquier Date: Mon Dec 2 11:18:56 2019 +0900 Refactor query cancellation code into src/fe_utils/ I've not dug into the code itself; I just bisected it. Cheers, Jeff
Re: psql's \watch is broken
On Fri, Dec 13, 2019 at 9:45 PM Michael Paquier wrote: > On Sat, Dec 14, 2019 at 12:09:51AM +0100, Fabien COELHO wrote: > > > >> explain (analyze) select * from pgbench_accounts \watch 1 > >> > >> It behaves as expected. But once I break out of the loop with ctrl-C, > then > >> if I execute the same thing again it executes the command once, but > shows > >> no output and doesn't loop. It seems like some flag is getting set with > >> ctrl-C, but then never gets reset. > >> > >> > >> I've not dug into code itself, I just bisected it. > > > > Thanks for the report. I'll look into it. > > Looked at it already. And yes, I can see the difference. This comes > from the switch from cancel_pressed to CancelRequested in psql, > especially PSQLexecWatch() in this case. And actually, now that I > look at it, I think that we should simply get rid of cancel_pressed in > psql completely and replace it with CancelRequested. This also > removes the need of having cancel_pressed defined in print.c, which > was not really wanted originally. Attached is a patch which addresses > the issue for me, and cleans up the code while on it. Fabien, Jeff, > can you confirm please? > -- > Michael > This works for me. Thanks, Jeff
Re: [HACKERS] Proposal to add work_mem option to postgres_fdw module
On Fri, Sep 7, 2018 at 9:17 AM Peter Eisentraut < peter.eisentr...@2ndquadrant.com> wrote: > On 05/09/2018 18:46, Peter Eisentraut wrote: > > On 01/09/2018 06:33, Shinoda, Noriyoshi (PN Japan GCS Delivery) wrote: > >> Certainly the PQconndefaults function specifies Debug flag for the > "options" option. > >> I think that eliminating the Debug flag is the simplest solution. > >> For attached patches, GUC can be specified with the following syntax. > >> > >> CREATE SERVER remsvr1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host > 'host 1', ..., options '-c work_mem=64MB -c geqo=off'); > >> ALTER SERVER remsv11 OPTIONS (SET options '-c work_mem=64MB -c > geqo=off'); > >> > >> However, I am afraid of the effect that this patch will change the > behavior of official API PQconndefaults. > > > > It doesn't change the behavior of the API, it just changes the result of > > the API function, which is legitimate and the reason we have the API > > function in the first place. > > > > I think this patch is fine. I'll work on committing it. > > I have committed just the libpq change. The documentation change was > redundant, because the documentation already stated that all libpq > options are accepted. (Arguably, the documentation was wrong before.) > Since the effect of this commit was to make postgres_fdw actually comply with its documentation, should this have been back-patched? Is there a danger in backpatching this change to libpq to older versions? This seems like more of a bug fix than an added feature. Cheers, Jeff
Re: [PATCH] Increase the maximum value track_activity_query_size
On Tue, Dec 24, 2019 at 12:11 AM Robert Haas wrote: > On Sat, Dec 21, 2019 at 1:25 PM Tom Lane wrote: > > > What is the overhead here except the memory consumption? > > > > The time to copy those strings out of shared storage, any time > > you query pg_stat_activity. > > It seems like you're masterminding this, and I don't know why. It > seems unlikely that anyone will raise the value unless they have very > long queries, and if those people would rather pay the overhead of > copying more data than have their queries truncated, who are we to > argue? > +1 Cheers, Jeff
Re: vacuum verbose detail logs are unclear (show debug lines at *start* of each stage?)
On Fri, Dec 20, 2019 at 12:11 PM Justin Pryzby wrote: > This is a usability complaint. If one knows enough about vacuum and/or > logging, I'm sure there's no issue. > > | 11 DEBUG: "t": found 999 removable, 999 nonremovable row versions in 9 > out of 9 pages > I agree the mixture of pre-action and after-action reporting is rather confusing sometimes. I'm more concerned about what the user sees in their terminal, though, rather than the server's log file. Also, the above quoted line is confusing. It makes it sound like it found removable items, but didn't actually remove them. I think that is taking grammatical parallelism too far. How about something like: DEBUG: "t": removed 999 row versions, found 999 nonremovable row versions in 9 out of 9 pages Also, I'd appreciate a report on how many hint bits were set, and how many pages were marked all-visible and/or frozen. When I do a manual vacuum, it is more often for those purposes than it is for removing removable rows (which autovac generally does a good enough job of). Also, it is not so clear that "nonremovable rows" includes both live and recently dead rows. Although hopefully reading the next line will clarify that, for the person who has enough background knowledge. > | 12 DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: > 130886944 > | 13 There were 0 unused item identifiers. > | 14 Skipped 0 pages due to buffer pins, 0 frozen pages. > It is a bit weird that we don't report skipped all-visible pages here. It was implicitly reported in the "in 9 out of 9 pages" message, but I think it should be reported explicitly as well. Cheers, Jeff
Re: [PATCH] Increase the maximum value track_activity_query_size
On Mon, Dec 30, 2019 at 6:46 PM Andrew Dunstan < andrew.duns...@2ndquadrant.com> wrote: > On Tue, Dec 31, 2019 at 9:38 AM Tom Lane wrote: > > > > > This doesn't seem like a reason not to allow a higher limit, like a > > megabyte or so, but I'm not sure that pushing it to the moon would be > > wise. > > > > > Just to get a mental handle on the size of queries we might be > allowing before truncation, I did some very rough arithmetic on what > well known texts might fit in a megabyte. By my calculations you could > fit about four Animal Farms or one Madame Bovary in about a megabyte. > So I think that seems like more than enough :-). (My mind kinda > explores at the thought of debugging a query as long as Animal Farm.) > > I've seen some pretty big IN-lists and VALUES lists. They aren't so hard to debug once you tune out iterations 3 through N-3 of the list members. Unless they are hard to debug for other reasons. In these cases, it would be helpful, if not just allowing bigger texts in general, to instead "truncate" from the middle, preserving both the beginning and the end of the query text. Cheers, Jeff
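Roughly the effect I mean, expressed against pg_stat_activity just for illustration (the length threshold is arbitrary):

```sql
-- Keep the head and the tail of the query text, rather than only the head:
SELECT pid,
       CASE WHEN length(query) <= 1000 THEN query
            ELSE left(query, 497) || ' ... ' || right(query, 497)
       END AS query_abridged
FROM pg_stat_activity;
```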
Re: color by default
On Tue, Dec 31, 2019 at 8:35 AM Tom Lane wrote: > Peter Eisentraut writes: > > With the attached patch, I propose to enable the colored output by > > default in PG13. > > FWIW, I shall be setting NO_COLOR permanently if this gets committed. > I wonder how many people there are who actually *like* colored output? > I find it to be invariably less readable than plain B&W text. > > I find color massively useful for grep and its variants, where the hit can show up anywhere on the line. It was also kind of useful for git, especially "git grep", but on my current system git's colorizing seems hopelessly borked, so I had to turn it off. But I turned PG_COLOR on and played with many commands, and must say I don't really see much of a point. When most of these commands fail, they only generate a few lines of output, and it isn't hard to spot the error message. When pg_restore goes wrong, you get a lot of messages but colorizing them isn't really helpful. I don't need 'error' to show up in red in order to know that I have a lot of errors, especially since the lines which do report errors always have 'error' as the 2nd word on the line, where it isn't hard to spot. If it could distinguish the important errors from unimportant errors, that would be more helpful. But if it could reliably do that, why print the unimportant ones at all? It doesn't seem like this is useful enough to have it on by default, and without it being on by default there is no point in having NO_COLOR to turn it off. There is something to be said for going with the flow, but the "emerging standard" seems like it has quite a bit further to emerge before I think that would be an important reason. Cheers, Jeff
Why is pq_begintypsend so slow?
I was using COPY recently and was wondering why BINARY format is not much (if any) faster than the default format. Once I switched from mostly exporting ints to mostly exporting double precisions (7e6 rows of 100 columns, randomly generated), it was faster, but not by as much as I intuitively thought it should be. Running 'perf top' to profile a "COPY BINARY .. TO '/dev/null'" on a AWS m5.large machine running Ubuntu 18.04, with self compiled PostgreSQL: PostgreSQL 13devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, 64-bit I saw that the hotspot was pq_begintypsend at 20%, which was twice the percentage as the next place winner (AllocSetAlloc). If I drill down into teh function, I see something like the below. I don't really speak assembly, but usually when I see an assembly instruction being especially hot and not being the inner most instruction in a loop, I blame it on CPU cache misses. But everything being touched here should already be well cached, since initStringInfo has just got done setting it up. And if not for that, then the by the 2nd invocation of appendStringInfoCharMacro it certainly should be in the cache, yet that one is even slower than the 1st appendStringInfoCharMacro. Why is this such a bottleneck? pq_begintypsend /usr/local/pgsql/bin/postgres 0.15 |push %rbx 0.09 |mov %rdi,%rbx | initStringInfo(buf); 3.03 |callq initStringInfo | /* Reserve four bytes for the bytea length word */ | appendStringInfoCharMacro(buf, '\0'); |movslq 0x8(%rbx),%rax 1.05 |lea 0x1(%rax),%edx 0.72 |cmp 0xc(%rbx),%edx |jge b0 2.92 |mov (%rbx),%rdx |movb $0x0,(%rdx,%rax,1) 13.76 |mov 0x8(%rbx),%eax 0.81 |mov (%rbx),%rdx 0.52 |add $0x1,%eax 0.12 |mov %eax,0x8(%rbx) 2.85 |cltq 0.01 |movb $0x0,(%rdx,%rax,1) | appendStringInfoCharMacro(buf, '\0'); 10.65 |movslq 0x8(%rbx),%rax |lea 0x1(%rax),%edx 0.90 |cmp 0xc(%rbx),%edx |jge ca 0.54 | 42: mov (%rbx),%rdx 1.84 |movb $0x0,(%rdx,%rax,1) 13.88 |mov 0x8(%rbx),%eax 0.03 |mov (%rbx),%rdx |add $0x1,%eax 0.33 |mov %eax,0x8(%rbx) 2.60 |cltq 0.06 |movb $0x0,(%rdx,%rax,1) | appendStringInfoCharMacro(buf, '\0'); 3.21 |movslq 0x8(%rbx),%rax 0.23 |lea 0x1(%rax),%edx 1.74 |cmp 0xc(%rbx),%edx |jge e0 0.21 | 67: mov (%rbx),%rdx 1.18 |movb $0x0,(%rdx,%rax,1) 9.29 |mov 0x8(%rbx),%eax 0.18 |mov (%rbx),%rdx |add $0x1,%eax 0.19 |mov %eax,0x8(%rbx) 3.14 |cltq 0.12 |movb $0x0,(%rdx,%rax,1) | appendStringInfoCharMacro(buf, '\0'); 5.29 |movslq 0x8(%rbx),%rax 0.03 |lea 0x1(%rax),%edx 1.45 |cmp 0xc(%rbx),%edx |jge f6 0.41 | 8c: mov (%rbx),%rdx Cheers, Jeff
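Stepping back to the workload: something along these lines reproduces an equivalent test table and the COPY in question (a sketch only, run from psql; the exact sizes are not important, the output just needs to be dominated by float8 columns):

```sql
-- ~7e6 rows of 100 random double precision columns (adjust sizes to taste):
SELECT format('CREATE TABLE wide_doubles AS SELECT %s FROM generate_series(1, 7000000)',
              string_agg('random() AS c' || i, ', '))
FROM generate_series(1, 100) AS i \gexec

-- Profile the backend (e.g. with perf top) while running:
COPY wide_doubles TO '/dev/null' (FORMAT binary);
```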
Re: pg_upgrade: Pass -j down to vacuumdb
On Fri, Mar 29, 2019 at 5:58 AM Peter Eisentraut < peter.eisentr...@2ndquadrant.com> wrote: > On 2019-03-28 02:43, Jeff Janes wrote: > > At first blush I thought it was obvious that you would not want to run > > analyze-in-stages in parallel. But after thinking about it some more > > and reflecting on experience doing some troublesome upgrades, I would > > reverse that and say it is now obvious you do want at least the first > > stage of analyze-in-stages, and probably the first two, to run in > > parallel. That is not currently an option it supports, so we can't > > really recommend it in the script or the docs. > > So do you think we should copy down the -j option from pg_upgrade, or > make some other arrangement? > For 12, I think we should not pass it down so that it runs automatically, and just go with a doc-only patch instead. Where the text written into the analyze_new_cluster.sh recommending to hit Ctrl-C and do something else is counted as documentation. I agree with you that the value of -j that was used for pg_ugrade might not be suitable for vacuumdb, but anyone who tests things thoroughly enough to know exactly what value to use for each is not going to be the target audience of analyze_new_cluster.sh anyway. So I think the -j used in pg_upgrade can just be written directly into the suggestion, and that would be good enough for the intended use. Ideally I think the analyze-in-stages should have the ability to parallelize the first stage and not the last one, but that is not v12 material. Even more ideally it should only have two stages, not three. One stage runs to generate minimally-tolerable stats before the database is opened to other users, and one runs after it is open (but hopefully before the big end-of-month reports get run, or whatever the big statistics-sensitive queries are on your system). But we don't really have a concept of "open to other users" in the first place, and doing it yourself by juggling pg_hba files is annoying and error prone. So maybe the first stage could be run by pg_upgrade itself, while the new server is still running on a linux socket in a private directory. The defense of the current three-stage method for analyze-in-stages is that very few people are likely to know just what the level of "minimally-tolerable stats" for their system actually are, because upgrades are rare enough not to learn by experience, and realistic load-generators are rare enough not to learn from test/QA environments. If the default_statistics_target=1 stats are enough, then you got them quickly. If they aren't, at least you didn't waste too much time collecting them before moving on to the second-stage default_statistics_target=10 and then the final stage. So the three stages are amortizing over our ignorance. Cheers, Jeff
Re: Should the docs have a warning about pg_stat_reset()?
On Wed, Apr 10, 2019 at 2:52 PM Bruce Momjian wrote: > > OK, let me step back. Why are people resetting the statistics > regularly? Based on that purpose, does it make sense to clear the > stats that effect autovacuum? > When I've done it (not regularly, thankfully), it was usually because I failed to type "pg_stat_reset_shared('bgwriter')" or "pg_stat_statements_reset()" correctly. I've also been tempted to do it because storing snapshots with a cron job or something requires effort and planning ahead to set up the tables and cron and some way to limit the retention, and then running LAG window functions over the snapshots requires a re-study of the documentation, because they are a bit esoteric and I don't use them enough to commit the syntax to memory. I don't want to see pg_statio_user_indexes often enough to make elaborate arrangements ahead of time (especially since the track_io_timing columns are missing from it); but when I do want them, I want them. And when I do, I don't want them to be "since the beginning of time". When I'm thinking about pg_statio_user_indexes, I am probably not thinking about autovac, since they have about nothing in common with each other. (Other than pg_stat_reset operating on both.) Cheers, Jeff
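For what it's worth, the sort of scaffolding I mean is not much code, it is just annoying to set up and re-learn each time (a sketch; the snapshot table name and the index picked are made up):

```sql
-- Run periodically, e.g. from cron via psql:
CREATE TABLE IF NOT EXISTS statio_snap AS
    SELECT now() AS snap_time, * FROM pg_statio_user_indexes WITH NO DATA;
INSERT INTO statio_snap SELECT now(), * FROM pg_statio_user_indexes;

-- Later, deltas between consecutive snapshots for one index:
SELECT snap_time,
       idx_blks_read - lag(idx_blks_read) OVER w AS blks_read_delta,
       idx_blks_hit  - lag(idx_blks_hit)  OVER w AS blks_hit_delta
FROM statio_snap
WHERE indexrelname = 'my_index'
WINDOW w AS (ORDER BY snap_time);
```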
Re: TRACE_SORT defined by default
On Wed, Apr 24, 2019 at 6:04 PM Tom Lane wrote: > Peter Geoghegan writes: > > > In > > any case the current status quo is that it's built by default. I have > > used it in production, though not very often. It's easy to turn it on > > and off. > > Would any non-wizard really have a use for it? > I've had people use it to get some insight into the operation and memory usage of Aggregate nodes, since those nodes offer nothing useful via EXPLAIN ANALYZE. It would be a shame to lose that ability on package-installed PostgreSQL unless we fix Aggregate node reporting first. Cheers, Jeff
pg_upgrade --clone error checking
With the new pg_upgrade --clone, if we are going to end up throwing the error "file cloning not supported on this platform" (which seems to depend only on ifdefs) I think we should throw it first thing, before any other checks are done and certainly before pg_dump gets run. This might result in some small amount of code duplication, but I think it would be worth the cost. For cases where we might throw "could not clone file between old and new data directories", I wonder if we shouldn't do some kind of dummy copy to catch that error earlier, as well. Maybe that one is not worth it. Cheers, Jeff
Re: pg_upgrade --clone error checking
On Thu, May 2, 2019 at 11:57 AM Peter Eisentraut < peter.eisentr...@2ndquadrant.com> wrote: > On 2019-05-01 22:10, Jeff Janes wrote: > > With the new pg_upgrade --clone, if we are going to end up throwing the > > error "file cloning not supported on this platform" (which seems to > > depend only on ifdefs) I think we should throw it first thing, before > > any other checks are done and certainly before pg_dump gets run. > > Could you explain in more detail what command you are running, what > messages you are getting, and what you would like to see instead? > I'm running: pg_upgrade --clone -b /home/jjanes/pgsql/REL9_6_12/bin/ -B /home/jjanes/pgsql/origin_jit/bin/ -d /home/jjanes/pgsql/data_96/ -D /home/jjanes/pgsql/data_clone/ And I get: Performing Consistency Checks - Checking cluster versions ok Checking database user is the install user ok Checking database connection settings ok Checking for prepared transactions ok Checking for reg* data types in user tables ok Checking for contrib/isn with bigint-passing mismatch ok Checking for tables WITH OIDS ok Checking for invalid "unknown" user columns ok Creating dump of global objects ok Creating dump of database schemas ok Checking for presence of required libraries ok file cloning not supported on this platform Failure, exiting I think the error message wording is OK, I think it should be thrown earlier, before the "Creating dump of database schemas" (which can take a long time), and preferably before either database is even started. So ideally it would be something like: Performing Consistency Checks - Checking cluster versions Checking file cloning support File cloning not supported on this platform Failure, exiting When something is doomed to fail, we should report the failure as early as feasibly detectable. Cheers, Jeff
Re: pg_upgrade --clone error checking
On Thu, May 2, 2019 at 12:28 PM Alvaro Herrera wrote: > On 2019-May-02, Jeff Janes wrote: > > > > > When something is doomed to fail, we should report the failure as early > as > > feasibly detectable. > > I agree -- this check should be done before checking the database > contents. Maybe even before "Checking cluster versions". > It looks like it was designed for early checking, it just wasn't placed early enough. So changing it is pretty easy, as check_file_clone does not need to be invented, and there is no additional code duplication over what was already there. This patch moves the checking to near the beginning. It carries the --link mode checking along with it. That should be done as well, and doing it as a separate patch would make both patches uglier. Cheers, Jeff pg_upgrade_filesystem.patch Description: Binary data
Re: pg_upgrade --clone error checking
On Fri, May 3, 2019 at 3:53 AM Peter Eisentraut < peter.eisentr...@2ndquadrant.com> wrote: > On 2019-05-02 20:03, Jeff Janes wrote: > > It looks like it was designed for early checking, it just wasn't placed > > early enough. So changing it is pretty easy, as check_file_clone does > > not need to be invented, and there is no additional code duplication > > over what was already there. > > > > This patch moves the checking to near the beginning. > > I think the reason it was ordered that way is that it wants to do all > the checks of the old cluster before doing any checks touching the new > cluster. > But is there a reason to want to do that? I understand we don't want to keep starting and stopping the clusters needlessly, so we should do everything we can in one before moving to the other. But for checks that don't need a running cluster, why would it matter? The existence and contents of PG_VERSION in the new cluster directory are already checked at the very beginning (and pg_upgrade even tries to start it up and shut it down again if a pid file also exists), so there is precedent for touching the new cluster directory at the filesystem level early (albeit in a read-only manner), and if a pid file exists, for doing even more than that. I didn't move check_file_clone to before the liveness check is done, out of an abundance of caution. But creating a transient file with a name of no significance ("PG_VERSION.clonetest") in a cluster that is not even running seems like a very low risk thing to do. The payoff is that we get an inevitable error message much sooner. Cheers, Jeff
make maintainer-clean and config.cache
In a side note in another thread, Tom pointed out the speed improvements of using an autoconf cache when re-building, which sounded nice to me, as configure takes an annoyingly long time and is not parallelized. But the config.cache file gets deleted by make maintainer-clean. Doesn't that mostly defeat the purpose of having a cache? Am I doing something wrong here, or just thinking about it wrong? time ./configure --config-cache > /dev/null real 0m21.538s time ./configure --config-cache > /dev/null real 0m3.425s make maintainer-clean > /dev/null ; ## presumably git checkout a new commit here time ./configure --config-cache > /dev/null real 0m21.260s Cheers, Jeff
Re: compiler warning in pgcrypto imath.c
On Sat, May 4, 2019 at 3:15 AM Noah Misch wrote: > > I pushed Jeff's patch. > Thank you. I've re-tested it and I get warning-free compilation now. Cheers, Jeff
Re: Usage of epoch in txid_current
On Thu, Mar 28, 2019 at 1:30 AM Thomas Munro wrote: > On Thu, Mar 28, 2019 at 1:48 AM Heikki Linnakangas > wrote: > > Once we have the FullTransactionId type and basic macros in place, I'm > > sure we could tidy up a bunch of code by using them. > Thanks for the reviews! Pushed. I think that this might be broken. We have this change: @@ -73,7 +75,8 @@ GetNewTransactionId(bool isSubXact) LWLockAcquire(XidGenLock, LW_EXCLUSIVE); - xid = XidFromFullTransactionId(ShmemVariableCache->nextFullXid); + full_xid = ShmemVariableCache->nextFullXid; + xid = XidFromFullTransactionId(full_xid); But then later on in a little-used code path around line 164: /* Re-acquire lock and start over */ LWLockAcquire(XidGenLock, LW_EXCLUSIVE); xid = XidFromFullTransactionId(ShmemVariableCache->nextFullXid); full_xid does not get updated, but then later on full_xid gets returned in lieu of xid. Is there a reason that this is OK? Cheers, Jeff
Re: Usage of epoch in txid_current
On Sat, May 4, 2019 at 1:34 PM Jeff Janes wrote: > On Thu, Mar 28, 2019 at 1:30 AM Thomas Munro > wrote: > >> On Thu, Mar 28, 2019 at 1:48 AM Heikki Linnakangas >> wrote: >> > Once we have the FullTransactionId type and basic macros in place, I'm >> > sure we could tidy up a bunch of code by using them. > > > Thanks for the reviews! Pushed. >> > > I think that this might be broken. > > We have this change: > > @@ -73,7 +75,8 @@ GetNewTransactionId(bool isSubXact) > > LWLockAcquire(XidGenLock, LW_EXCLUSIVE); > > - xid = XidFromFullTransactionId(ShmemVariableCache->nextFullXid); > + full_xid = ShmemVariableCache->nextFullXid; > + xid = XidFromFullTransactionId(full_xid); > > But then later on in an little-used code path around line 164: > > /* Re-acquire lock and start over */ > LWLockAcquire(XidGenLock, LW_EXCLUSIVE); > xid = XidFromFullTransactionId(ShmemVariableCache->nextFullXid); > > full_xid does not get updated, but then later on full_xid gets returned in > lieu of xid. > > Is there a reason that this is OK? > > Ah, sorry for the noise. I see that this has been fixed already. I wasn't looking at HEAD, or at the other email thread, when I "discovered" this. Sorry for the noise. Cheers, Jeff
improve transparency of bitmap-only heap scans
When bitmap-only heap scans were introduced in v11 (7c70996ebf0949b142a99) no changes were made to "EXPLAIN". This makes the feature rather opaque. You can sometimes figure out what is going by the output of EXPLAIN (ANALYZE, BUFFERS), but that is unintuitive and fragile. Looking at the discussion where the feature was added, I think changing the EXPLAIN just wasn't considered. The attached patch adds "avoided" to "exact" and "lossy" as a category under "Heap Blocks". Also attached is the example output, as the below will probably wrap to the point of illegibility: explain analyze select count(*) from foo where a=35 and d between 67 and 70; QUERY PLAN - Aggregate (cost=21451.36..21451.37 rows=1 width=8) (actual time=103.955..103.955 rows=1 loops=1) -> Bitmap Heap Scan on foo (cost=9920.73..21442.44 rows=3570 width=0) (actual time=100.239..103.204 rows=3950 loops=1) Recheck Cond: ((a = 35) AND (d >= 67) AND (d <= 70)) Heap Blocks: avoided=3718 exact=73 -> BitmapAnd (cost=9920.73..9920.73 rows=3570 width=0) (actual time=98.666..98.666 rows=0 loops=1) -> Bitmap Index Scan on foo_a_c_idx (cost=0.00..1682.93 rows=91000 width=0) (actual time=28.541..28.541 rows=99776 loops=1) Index Cond: (a = 35) -> Bitmap Index Scan on foo_d_idx (cost=0.00..8235.76 rows=392333 width=0) (actual time=66.946..66.946 rows=399003 loops=1) Index Cond: ((d >= 67) AND (d <= 70)) Planning Time: 0.458 ms Execution Time: 104.487 ms I think the name of the node should also be changed to "Bitmap Only Heap Scan", but I didn't implement that as adding another NodeTag looks like a lot of tedious error prone work to do before getting feedback on whether the change is desirable in the first place, or the correct approach. Cheers, Jeff create table foo as select floor(random()*100)::int as a, floor(random()*100)::int as b, floor(random()*100)::int as c, floor(random()*100)::int as d, floor(random()*100)::int as e from generate_series(1,1000); vacuum ANALYZE ; create index on foo (a,c); create index on foo (d); update foo set d=d+1 where a=35 and c = 70; explain analyze select count(*) from foo where a=35 and d between 67 and 70; QUERY PLAN - Aggregate (cost=21451.36..21451.37 rows=1 width=8) (actual time=103.955..103.955 rows=1 loops=1) -> Bitmap Heap Scan on foo (cost=9920.73..21442.44 rows=3570 width=0) (actual time=100.239..103.204 rows=3950 loops=1) Recheck Cond: ((a = 35) AND (d >= 67) AND (d <= 70)) Heap Blocks: avoided=3718 exact=73 -> BitmapAnd (cost=9920.73..9920.73 rows=3570 width=0) (actual time=98.666..98.666 rows=0 loops=1) -> Bitmap Index Scan on foo_a_c_idx (cost=0.00..1682.93 rows=91000 width=0) (actual time=28.541..28.541 rows=99776 loops=1) Index Cond: (a = 35) -> Bitmap Index Scan on foo_d_idx (cost=0.00..8235.76 rows=392333 width=0) (actual time=66.946..66.946 rows=399003 loops=1) Index Cond: ((d >= 67) AND (d <= 70)) Planning Time: 0.458 ms Execution Time: 104.487 ms bitmap_only_avoided_v1.patch Description: Binary data
crash testing suggestions for 12 beta 1
Now that beta is out, I wanted to do some crash-recovery testing where I inject PANIC-inducing faults and see if it recovers correctly. A long-lived Perl process keeps track of what it should find after the crash, and verifies that it finds it. You will probably be familiar with the general theme from examples like the threads below. Would anyone like to nominate some areas to focus on? I think the pluggable storage refactoring work will get inherently tested, so I'm not planning on designing tests specifically for that (unless there is a non-core plugin I should test with). Making the ctid a tie-breaker in btree indexes is also tested inherently (plus I think Peter tested that pretty thoroughly himself with similar methods). I've already tested declarative partitioning where the tuples do a lot of migrating, and tested prepared transactions. Any other suggestions for changes that might be risky and should be specifically targeted for testing? https://www.postgresql.org/message-id/CAMkU=1xeuubphdwdmb1wjn4+td4kpnenifatbxnk1xzhcw8...@mail.gmail.com https://www.postgresql.org/message-id/CAMkU=1xBP8cqdS5eK8APHL=x6rhmmm2vg5g+qamduutsycw...@mail.gmail.com Cheers, Jeff
Re: WAL archive (archive_mode = always) ?
On Fri, Oct 19, 2018 at 10:00 AM Adelino Silva < adelino.j.si...@googlemail.com> wrote: > Hi, > > What is the advantage to use archive_mode = always in a slave server > compared to archive_mode = on (shared WAL archive) ? > I only see duplication of Wal files, what is the purpose of this feature ? > You might not have a shared wal archive in the first place. For example, your only access to the master is through the streaming replication protocol, but you want to maintain a local WAL archive anyway so you can PITR locally for testing or debugging purposes. Cheers, Jeff
Re: WAL archive (archive_mode = always) ?
On Mon, Oct 22, 2018 at 5:06 AM Adelino Silva < adelino.j.si...@googlemail.com> wrote: > Hello Takayuki, > > Sorry can you explain how we can same network bandwidth by not sending the > WAL archive from the primary to the standby(s). > I possible scenario is have to multiple standby servers in same host for > same master. or other scenarios exists ? > Before archive_mode = always became available, we've had to stream the WAL twice, once to the hot standby for immediate application, and once to pg_receivexlog for archival purposes. So it doubled the bandwidth usage. Cheers, Jeff
Re: Estimating number of distinct values.
On Wed, Oct 24, 2018 at 10:07 AM Konstantin Knizhnik < k.knizh...@postgrespro.ru> wrote: > > Real number of distinct value for this dataset is about 10 millions. For > some reasons, sampling using random blocks and Vitter algorithm produces > worser results than just examining first 3 rows of the table: It is a known problem with our sampling method that once a block is chosen, it then oversamples rows from that block[1]. So you get too many blocks with 0 rows sampled or 2 or more rows sampled, and too few with exactly one row sampled. If rows with the same value are clustered together into the same blocks, this will find too many duplicates and really skew the Duj1 estimate, because we are feeding it a biased sample. Tomas was working on a patch to make the sampling truly random[2], but I think he abandoned it to work on the multivariate statistics instead. It is hard to tell if the IO implications of no longer reading sampled blocks in physical order would be acceptable, because everyone has different hardware, data, and ideas of what is acceptable. [1] https://www.postgresql.org/message-id/CAMkU=1wrh_jopycayukbdqy4dwhsx1-1e2s0vvgfrryfxde...@mail.gmail.com [2] https://www.postgresql.org/message-id/5588D644.1000909%402ndquadrant.com Cheers, Jeff
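For context, the estimator being fed here is (if I recall the comments in analyze.c correctly) the Duj1 estimator of Haas and Stokes, roughly:

```latex
\hat{D} \;=\; \frac{n \, d}{\, n - f_1 + f_1 \, n / N \,}
```

where n is the number of sampled rows, N the estimated total number of rows, d the number of distinct values in the sample, and f1 the number of values seen exactly once in the sample. A block-clustered sample depresses f1, which pushes the denominator toward n and the estimate down toward d itself, far below the true number of distinct values.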
Re: Buildfarm failures for hash indexes: buffer leaks
On Tue, Oct 23, 2018 at 10:51 AM Andres Freund wrote: > On 2018-10-23 13:54:31 +0200, Fabien COELHO wrote: > > > > Hello Tom & Amit, > > > > > > > Both animals use gcc experimental versions, which may rather > underline a > > > > > new bug in gcc head rather than an existing issue in pg. Or not. > > > > > > > It is possible, but what could be the possible theory? > > > > > > It seems like the two feasible theories are (1) gcc bug, or (2) buffer > > > leak that only occurs in very narrow circumstances, perhaps from a race > > > condition. Given that the hash index code hasn't changed meaningfully > > > in several months, I thought (1) seemed more probable. > > > > Yep, that is my thought as well. > > > FWIW, my animal 'serinus', which runs debian's gcc-snapshot shows the same > problem: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&dt=2018-10-22%2006%3A34%3A02 > > So it seems much more likely to be 1). > > > > The problem is that this kind of issue is not simple to wrap-up as a gcc > bug > > report, unlike other earlier instances that I forwarded to clang & gcc > dev > > teams. > > > > I'm in favor in waiting before trying to report it, to check whether the > > probable underlying gcc problem is detected, reported by someone else, > and > > fixed in gcc head. If it persists, then we'll see. > > I suspect the easiest thing to narrow it down would be to bisect the > problem in gcc :( > Their commit r265241 is what broke the PostgreSQL build. It also broke the compiler itself--at that commit it was no longer possible to build itself. I had to --disable-bootstrap in order to get a r265241 compiler to test PostgreSQL on. Their commit r265375 fixed the ability to compile itself, but built PostgreSQL binaries remain broken there and thereafter. I ran this on a AWS c5d.4xlarge ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20180912 (ami-0f65671a86f061fcd) Configuring PostgreSQL with ./configure --enable-cassert is necessary in order for the subsequent "make check" to experience the failure. I'm using PostgreSQL v12dev commit 5953c99697621174f, but the problem is not sensitive to that and reproduces back to v10.0. I haven't the slightest idea what to do with this information, and GCC's cryptic SVN commit messages don't offer much insight to me. Cheers, Jeff
Don't wake up to check trigger file if none is configured
A streaming replica waiting on WAL from the master will wake up every 5 seconds to check for a trigger file. This is pointless if no trigger file has been configured. The attached patch suppresses the timeout when there is no trigger file configured. A minor thing to be sure, but there was a campaign a couple years ago to remove such spurious wake-ups, so maybe this change is worthwhile. I noticed that the existing codebase does not have a consensus on what to pass to WaitLatch for the timeout when the timeout isn't relevant. I picked 0, but -1L also has precedent. Cheers, Jeff WAL_sleep_not_triggered.patch Description: Binary data
Re: Don't wake up to check trigger file if none is configured
On Sat, Nov 24, 2018 at 11:29 AM Jeff Janes wrote: > A streaming replica waiting on WAL from the master will wake up every 5 > seconds to check for a trigger file. This is pointless if no trigger file > has been configured. > > The attached patch suppresses the timeout when there is no trigger file > configured. > Rebased over the removal of recovery.conf and renaming of TriggerFile. If the promote_trigger_file param is someday made responsive to SIGHUP, I think this will just continue to do the right thing without further modification. Cheers, Jeff WAL_sleep_not_triggered_v2.patch Description: Binary data
Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
On Wed, Jan 17, 2018 at 4:49 PM, David Gould wrote: > > Please add the attached patch and this discussion to the open commit fest. > The > original bugs thread is here: 2018011254.1408.8342@wrigleys.postgresql.org. > > Bug reference: 15005 > Logged by: David Gould > Email address: da...@sonic.net > PostgreSQL version: 10.1 and earlier > Operating system: Linux > Description: > > ANALYZE can make pg_class.reltuples wildly inaccurate compared to the > actual > row counts for tables that are larger than the default_statistics_target. > > Example from one of a clients production instances: > > # analyze verbose pg_attribute; > INFO: analyzing "pg_catalog.pg_attribute" > INFO: "pg_attribute": scanned 3 of 24519424 pages, containing 6475 > live rows and 83 dead rows; 6475 rows in sample, 800983035 estimated total > rows. > > This is a large complex database - pg_attribute actually has about five > million rows and needs about one hundred thousand pages. However it has > become extremely bloated and is taking 25 million pages (192GB!), about 250 > times too much. This happened despite aggressive autovacuum settings and a > periodic bloat monitoring script. Since pg_class.reltuples was 800 million, > the bloat monitoring script did not detect that this table was bloated and > autovacuum did not think it needed vacuuming. > I can see how this issue would prevent ANALYZE from fixing the problem, but I don't see how it could have caused the problem in the first place. In your demonstration case, you had to turn off autovac in order to get it to happen, and then when autovac is turned back on, it is all primed for an autovac to launch, go through, touch almost all of the pages, and fix it for you. How did your original table get into a state where this wouldn't happen? Maybe a well-timed crash caused n_dead_tup to get reset to zero and that is why autovac is not kicking in? What are the pg_stat_user_tables numbers and the state of the visibility map for your massively bloated table, if you still have them? In any event, I agree with your analysis that ANALYZE should set the number of tuples from scratch. After all, it sets the other estimates, such as MCV, from scratch, and those are much more fragile to sampling than just the raw number of tuples are. But if the default target is set to 1, that would scan only 300 pages. I think that is a little low of a sample size to base an estimate on, but it isn't clear to me that using 300 pages plus whacking them around with exponential averaging is really going to be much better. And if you set your default target to 1, that is more-or-less what you signed up for. It is a little weird to have VACUUM incrementally update and then ANALYZE compute from scratch and discard the previous value, but no weirder than what we currently do, which is to have ANALYZE incrementally update despite the fact that it is specifically designed to representatively sample the entire table. So I don't think we need to decide what to do about VACUUM before we can do something about ANALYZE. So I support your patch. There is probably more investigation and work that could be done in this area, but those could be different patches, not blocking this one. Cheers, Jeff
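To be concrete about the pg_stat / visibility map question above, something like the following is what I had in mind (assuming the pg_visibility extension is available, and substituting the actual bloated table):

```sql
-- The counters autovacuum is driven by:
SELECT n_live_tup, n_dead_tup, last_vacuum, last_autovacuum, last_autoanalyze
FROM pg_stat_all_tables
WHERE relname = 'pg_attribute';

-- What the system currently believes about the table's size:
SELECT relpages, reltuples
FROM pg_class
WHERE oid = 'pg_catalog.pg_attribute'::regclass;

-- Visibility map state:
CREATE EXTENSION IF NOT EXISTS pg_visibility;
SELECT * FROM pg_visibility_map_summary('pg_catalog.pg_attribute'::regclass);
```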
Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
On Sun, Mar 4, 2018 at 3:18 PM, David Gould wrote: > On Sun, 4 Mar 2018 07:49:46 -0800 > Jeff Janes wrote: > > > On Wed, Jan 17, 2018 at 4:49 PM, David Gould wrote: > ... > > > > Maybe a well-timed crash caused n_dead_tup to get reset to zero and that > is > > why autovac is not kicking in? What are the pg_stat_user_table number > and > > the state of the visibility map for your massively bloated table, if you > > still have them? > > ... > > The main pain points are that when reltuples gets inflated there is no way > to fix it, auto vacuum stops looking at the table and hand run ANALYZE > can't > reset the reltuples. The only cure is VACUUM FULL, but that is not really > practical without unacceptable amounts of downtime. > But why won't an ordinary manual VACUUM (not FULL) fix it? That seems like a critical thing to figure out. As for preventing it in the first place, based on your description of your hardware and operations, I was going to say you need to increase the max number of autovac workers, but then I remembered you from "Autovacuum slows down with large numbers of tables. More workers makes it slower" ( https://www.postgresql.org/message-id/20151030133252.3033.4249%40wrigleys.postgresql.org). So you are probably still suffering from that? Your patch from then seemed to be pretty invasive and so controversial. I had a trivial but fairly effective patch at the time, but it is now less trivial because of how shared catalogs are dealt with (commit 15739393e4c3b64b9038d75) and I haven't rebased it over that issue. Cheers, Jeff
WARNING in parallel index creation.
If I run: pgbench -i -s30 And then create the function: CREATE OR REPLACE FUNCTION foobar(text) RETURNS text LANGUAGE plperl IMMUTABLE PARALLEL SAFE STRICT COST 1 AS $function$ return scalar reverse($_[0]); $function$; Then when I create an index, I get a warning: jjanes=# create index on pgbench_accounts (foobar(filler)); WARNING: cannot set parameters during a parallel operation WARNING: cannot set parameters during a parallel operation If I create the index again within the same session, there is no WARNING. This only occurs if plperl.on_init is set in the postgresql.conf file. It doesn't seem to matter what it is set to, even the empty string triggers the warning. plperl.on_init='' As far as I can tell the index is created correctly despite the warning. Cheers, Jeff
neqjoinsel versus "refresh materialized view concurrently"
The following commit has caused a devastating performance regression in concurrent refresh of MV: commit 7ca25b7de6aefa5537e0dbe56541bc41c0464f97 Author: Tom Lane Date: Wed Nov 29 22:00:29 2017 -0500 Fix neqjoinsel's behavior for semi/anti join cases. The below reproduction goes from taking about 1 second to refresh, to taking an amount of time I don't have the patience to measure. drop table foobar2 cascade; create table foobar2 as select * from generate_series(1,20); create materialized view foobar3 as select * from foobar2; create unique index on foobar3 (generate_series ); analyze foobar3; refresh materialized view CONCURRENTLY foobar3 ; When I interrupt the refresh, I get a message including this line: CONTEXT: SQL statement "SELECT newdata FROM pg_temp_3.pg_temp_16420 newdata WHERE newdata IS NOT NULL AND EXISTS (SELECT * FROM pg_temp_3.pg_temp_16420 newdata2 WHERE newdata2 IS NOT NULL AND newdata2 OPERATOR(pg_catalog.*=) newdata AND newdata2.ctid OPERATOR(pg_catalog.<>) newdata.ctid) LIMIT 1" So it makes sense that the commit in question could have caused a change in the execution plan. Because these are temp tables, I can't easily get my hands on them to investigate further. Cheers, Jeff
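One way to poke at the plan without access to the transient pg_temp table is sketched below (not from the thread): substitute an ordinary table with the same contents ("mvdata" is just a stand-in name) and run the refresh's duplicate-check query against it with EXPLAIN. Whether it reproduces exactly the same plan as the real temp table is not guaranteed:

-- Stand-in for the pg_temp_NNN table built by the concurrent refresh.
create table mvdata as select * from foobar3;
analyze mvdata;

explain
SELECT newdata FROM mvdata newdata
WHERE newdata IS NOT NULL AND EXISTS
  (SELECT * FROM mvdata newdata2
   WHERE newdata2 IS NOT NULL
     AND newdata2 OPERATOR(pg_catalog.*=) newdata
     AND newdata2.ctid OPERATOR(pg_catalog.<>) newdata.ctid)
LIMIT 1;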
Re: neqjoinsel versus "refresh materialized view concurrently"
On Tue, Mar 13, 2018 at 4:57 PM, Thomas Munro wrote: > On Wed, Mar 14, 2018 at 12:29 PM, Tom Lane wrote: > > Thomas Munro writes: > >> There is a fundamental and complicated estimation problem lurking here > >> of course and I'm not sure what to think about that yet. Maybe there > >> is a very simple fix for this particular problem: > > > > Ah, I see you thought of the same hack I did. > > > > I think this may actually be a good fix, and here's the reason: this plan > > is in fact being driven entirely off planner default estimates, because > > we don't have any estimation code that knows what to do with > > "wholerowvar *= wholerowvar". I'm suspicious that we could drop the > > preceding ANALYZE as being a waste of cycles, except maybe it's finding > > out the number of rows for us. In any case, LIMIT 1 is only a good idea > > to the extent that the planner knows what it's doing, and this is an > > example where it demonstrably doesn't and won't any time soon. > > Hmm. I wonder if the ANALYZE might have been needed to avoid the > nested loop plan at some point in history. > > Here's a patch to remove LIMIT 1, which fixes the plan for Jeff's test > scenario and some smaller and larger examples I tried. The query is > already executed with SPI_execute(..., 1) so it'll give up after one > row anyway. The regression test includes a case that causes a row to > be produced here and that's passing ('ERROR: new data for > materialized view "mvtest_mv" contains duplicate rows without any null > columns'). > Is there any good way to make the regression tests fail if the plan reverts to the bad one? The only thing I can think of would be to make the table bigger so the regression test becomes "noticeably slower", but that is pretty vague, and it is not user friendly to have the test formally pass while just hoping it is slow enough for someone to investigate. Cheers, Jeff
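For what it's worth, the usual way regression tests pin a plan shape is EXPLAIN (COSTS OFF), whose output goes into the expected file so a planner change shows up as a diff rather than as a slowdown. This is only a generic sketch, not part of the committed fix, and since the query here is built internally by REFRESH it could only be asserted on a hand-written equivalent:

-- Generic sketch of a plan-shape assertion in a regression test.
EXPLAIN (COSTS OFF)
SELECT m.* FROM foobar3 m
WHERE EXISTS (SELECT 1 FROM foobar3 m2
              WHERE m2.generate_series = m.generate_series
                AND m2.ctid <> m.ctid);
-- The expected output records the node types, so reverting to the bad plan
-- would make the test fail visibly.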
Re: Ambigous Plan - Larger Table on Hash Side
On Tue, Mar 13, 2018 at 4:02 AM, Narendra Pradeep U U < narendra.prad...@zohocorp.com> wrote: > Hi, > Thanks everyone for your suggestions. I would like to add explain > analyze of both the plans so that we can have broader picture. > > I have a work_mem of 1000 MB. > Is it possible to repeat with 2000MB or 3000MB? It would be interesting to see what the estimated cost and the actual time would be if there were only 1 batch rather than 2. Also, can you repeat all of these with EXPLAIN (ANALYZE, TIMING OFF)? Sometimes the act of measuring the times can distort the times by quite a bit. (It will still give an overall execution time, it just won't try to attribute that time to the individual steps) Cheers, Jeff
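To illustrate the request concretely, here is a self-contained toy sketch; the table names, sizes, and work_mem values are just examples, and the real run would of course use the original join instead:

-- Toy stand-in tables, only to demonstrate the invocation.
CREATE TEMP TABLE small AS SELECT g AS id FROM generate_series(1, 100000) g;
CREATE TEMP TABLE big   AS SELECT g AS id FROM generate_series(1, 1000000) g;
ANALYZE small;
ANALYZE big;

SET work_mem = '2000MB';          -- then repeat with '3000MB'
EXPLAIN (ANALYZE, TIMING OFF)     -- per-node timing off; total runtime still reported
SELECT count(*) FROM small s JOIN big b USING (id);
RESET work_mem;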
Re: INOUT parameters in procedures
On Wed, Mar 14, 2018 at 10:46 AM, Peter Eisentraut < peter.eisentr...@2ndquadrant.com> wrote: > committed > > I'm getting compiler warnings: pl_exec.c: In function 'exec_stmt_call': pl_exec.c:2089:8: warning: variable 'numargs' set but not used [-Wunused-but-set-variable] int numargs; ^ select version(); PostgreSQL 11devel-6b960aa on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609, 64-bit Cheers, Jeff
Re: INOUT parameters in procedures
On Thu, Mar 15, 2018 at 6:58 AM, Tom Lane wrote: > Jeff Janes writes: > > I'm getting compiler warnings: > > pl_exec.c: In function 'exec_stmt_call': > > pl_exec.c:2089:8: warning: variable 'numargs' set but not used > > Not fixed by 8df5a1c868cc28f89ac6221cff8e2b5c952d0eb6? > I think you meant to type "now fixed by". (unless your compiler is pickier than mine) Cheers Jeff
Bug with ICU for merge join
Ever since 27b62377b47f9e7bf58613, I have been getting "ERROR: mergejoin input data is out of order" for the attached reproducer. I get this on Ubuntu 20.04 and 22.04, whether initdb was run under LC_ALL=C or under LANG=en_US.UTF-8. It is not my query, I don't really know what its point is. I just got this error while looking into the performance of it and accidentally running it against 16dev. Cheers, Jeff bad_merge.sql Description: Binary data
awkward cancellation of parallel queries on standby.
When a parallel query gets cancelled on a standby due to max_standby_streaming_delay, it happens rather awkwardly. I get two errors stacked up, a query cancellation followed by a connection termination. I use `pgbench -R 1 -T3600 -P5` on the master to generate a light but steady stream of HOT pruning records, and then run `select sum(a.abalance*b.abalance) from pgbench_accounts a join pgbench_accounts b using (bid);` on the standby not in a transaction block to be a long-running parallel query (scale factor of 20). I also set max_standby_streaming_delay = 0. That isn't necessary, but it saves wear and tear on my patience. ERROR: canceling statement due to conflict with recovery DETAIL: User query might have needed to see row versions that must be removed. FATAL: terminating connection due to conflict with recovery DETAIL: User query might have needed to see row versions that must be removed. This happens quite reliably. In psql, these sometimes both show up immediately, and sometimes only the first one shows up immediately and then the second one appears upon the next communication to the backend. I don't know if this is actually a problem. It isn't for me as I don't do this kind of thing outside of testing, but it seems untidy and I can see it being frustrating from a catch-and-retry perspective and from a log-spam perspective. It looks like the backend gets signalled by the startup process, and then it signals the postmaster to signal the parallel workers, and then they ignore it for quite a long time (tens to hundreds of ms). By the time they get around to responding, someone has decided to escalate things. Which doesn't seem to be useful, because no one can do anything until the workers respond anyway. This behavior seems to go back a long way, but the propensity for both messages to show up at the same time vs. in different round-trips changes from version to version. Is this something we should do something about? Cheers, Jeff
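A sketch of the standby-side piece of the reproduction described above; it assumes max_standby_streaming_delay = 0 is already set on the standby and that the pgbench run mentioned above is generating HOT-pruning records on the primary:

-- Confirm the conflict-cancellation setting on the standby.
SHOW max_standby_streaming_delay;

-- Long-running parallel query (pgbench scale factor 20), run outside a
-- transaction block so the conflict cancellation can hit it mid-flight.
SELECT sum(a.abalance * b.abalance)
FROM pgbench_accounts a
JOIN pgbench_accounts b USING (bid);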
connection timeout hint
When one tries to connect to a server and port which is protected by a firewall, one gets messages like this: Unix: psql: error: connection to server at "192.168.0.26", port 5432 failed: Connection timed out Is the server running on that host and accepting TCP/IP connections? Windows: psql: error: connection to server at "192.168.0.26", port 5432 failed: Connection timed out (0x274C/10060) Is the server running on that host and accepting TCP/IP connections? But the hint given is unhelpful, and even positively misleading. If the port is blocked by a firewall, that doesn't imply the database server is not listening (if only one could get to it), and it doesn't matter whether the database server is listening anyway. If for some reason it weren't listening as well as being blocked, making it listen wouldn't help as long as it remains blocked at the firewall. Is there some portable way to detect this cause of the connection problem (connection timeout) and issue a more suitable hint explicitly mentioning firewalls and routers, or perhaps just no hint at all? As far as I know, only a firewall causes this problem, at least on a persistent basis. Maybe you could see it sporadically on a vastly overloaded server or a server caught in the process of rebooting. It would be better to give a hint that is correct the vast majority of the time than one that is wrong the vast majority of the time. There are a lot of questions about this on, for example, stackoverflow. I think people might be better able to figure it out for themselves if the hint were not actively leading them astray. Cheers, Jeff
Re: SQL-standard function body
On Wed, Apr 7, 2021 at 3:55 PM Peter Eisentraut < peter.eisentr...@2ndquadrant.com> wrote: > > Committed. Thanks! > > This commit breaks line continuation prompts for unbalanced parentheses in the psql binary. Skimming through this thread, I don't see that this is intentional or has been noticed before. with psql -X Before: jjanes=# asdf ( jjanes(# Now: jjanes=# asdf ( jjanes-# I've looked through the parts of the commit that change psql, but didn't see an obvious culprit. Cheers, Jeff
Re: Query generates infinite loop
On Wed, Apr 20, 2022 at 5:43 PM Tom Lane wrote: > I wrote: > > it's true that infinities as generate_series endpoints are going > > to work pretty oddly, so I agree with the idea of forbidding 'em. > > > Numeric has infinity as of late, so the numeric variant would > > need to do this too. > > Oh --- looks like numeric generate_series() already throws error for > this, so we should just make the timestamp variants do the same. > The regression test you added for this change causes an infinite loop when run against an unpatched server with --install-check. That is a bit unpleasant. Is there something we can and should do about that? I was expecting regression test failures of course but not an infinite loop leading towards disk exhaustion. Cheers, Jeff
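For anyone else who trips over this, a sketch of the kind of query at issue (I have not checked that it matches the committed regression test exactly): an infinite endpoint makes the timestamp variants loop forever on an unpatched server, while the numeric variant already errors out, which is the behavior the patch extends to timestamps:

-- Loops forever on an unpatched server; errors out with the fix applied.
SELECT * FROM generate_series(timestamp '2022-01-01', timestamp 'infinity', interval '1 day');

-- The numeric variant already rejects infinite endpoints.
SELECT * FROM generate_series(1::numeric, 'infinity'::numeric, 1::numeric);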
Re: PostgreSQL 15 Beta 1 release announcement draft
On Sat, May 14, 2022 at 2:52 PM Jonathan S. Katz wrote: > Hi, > > Attached is a draft of the release announcement for the PostgreSQL 15 > Beta 1 release. The goal of this announcement is to raise awareness > around many of the new features appearing in PostgreSQL 15 and to > encourage people to test. The success of the PostgreSQL 15 GA depends > heavily on people testing during the Beta period! > I have some belated feedback. I was excited to try this on Windows (I don't have a build system for that) and so followed the first link in the message, to https://www.postgresql.org/download/. At first glance there is nothing about beta there, but there is a prominent Windows icon so I click on that. And then to EDB, but there is no apparent way to download beta, just the released versions. I poked around EDB a bit but didn't find anything promising, then backed out of all that poking around and eventually all the way back to /download, where I scrolled down and finally found the link to https://www.postgresql.org/download/snapshots/ which tells me what I need to know. But at this point I was more annoyed than excited. An invitation to download the beta should take me directly to the page relevant to doing that. I shouldn't have to read the page backwards, or do a breadth-first traversal, to get to the right place efficiently. People will click on the first link that seems relevant, and "Windows" on the generic download page certainly seems relevant to Beta for Windows, until after you have scrolled down to find the beta/RC specific link instead. (I now recall being annoyed by this in a prior year as well; I guess I have a bad memory for avoiding mistakes but a good memory for recalling them.) Also, the download page should probably say "binary packages and installers" where it currently says "There are source code and binary packages of beta and release candidates", although I guess that is not about the announcement itself. Cheers, Jeff
15beta1 tab completion of extension versions
Extension version strings need to be quoted. Either double or single quotes will work. In released psql clients, tab completion offers double quoted suggestions: alter extension pg_trgm update TO "1.3" "1.4" "1.5" "1.6" But commit 02b8048ba5 broke that: it now offers unquoted version strings which, if used as offered, lead to syntax errors. The code change seems to have been intentional, but I don't think the behavior change was intended. While the version string might not be an identifier, it still needs to be treated as if it were one. Putting pg_catalog.quote_ident back into Query_for_list_of_available_extension_versions* fixes it, but might not be the best way to fix it. commit 02b8048ba5dc36238f3e7c3c58c5946220298d71 (HEAD, refs/bisect/bad) Author: Tom Lane Date: Sun Jan 30 13:33:23 2022 -0500 psql: improve tab-complete's handling of variant SQL names. Cheers, Jeff
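For context, a quick sketch of what the completion needs to produce; per the above, the version string must be quoted (either quoting style is accepted), and the unquoted form now offered fails to parse:

-- Accepted (single or double quotes both work):
ALTER EXTENSION pg_trgm UPDATE TO '1.6';
ALTER EXTENSION pg_trgm UPDATE TO "1.6";

-- What the completion offers after 02b8048ba5; fails with a syntax error:
ALTER EXTENSION pg_trgm UPDATE TO 1.6;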