Re: Is it OK to ignore directory open failure in ResetUnloggedRelations?

2017-12-04 Thread Justin Pryzby
On Mon, Dec 04, 2017 at 03:15:08PM -0500, Tom Lane wrote:
> While working through Michael Paquier's patch to clean up inconsistent
> usage of AllocateDir(), I noticed that ResetUnloggedRelations and its
> subroutines are not consistent about whether a directory open failure
> results in erroring out or just emitting a LOG message and continuing.
> ResetUnloggedRelations itself throws a hard error if it fails to open
> pg_tblspc, but all the rest of reinit.c thinks a LOG message is
> sufficient.
...
> So now I'm thinking we should do the reverse and change these functions
> to give a hard error on AllocateDir failure.  That would result in
> startup-process failure if we are unable to scan the database, which is
> not great, but there's certainly something badly wrong if we can't.
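
(For context, the hard-error behavior proposed above is roughly the usual
AllocateDir idiom -- my sketch of the pattern, not Tom's actual patch:)

	DIR *dir = AllocateDir(path);

	if (dir == NULL)
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not open directory \"%s\": %m", path)));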

I can offer a data point unrelated to unlogged relations.

Sometimes, following a reboot, if there's a tablespace on ZFS and a ZPOOL
backing device is missing/renamed (especially under qemu), postgres (if it was
shut down cleanly) will happily start even though a tablespace is missing, due
to being unable to find the backing device - ZFS wants the pool to be exported
and re-imported before it scans all devices for a matching UUID.

That has surprised me in the past and led me to believe that "services are up"
following a reboot, only to notice a bunch of ERRORs in the logs a handful of
minutes later.

Maybe that counts for a tangential +1.

Justin



Re: [HACKERS] pg_upgrade failed with error - ERROR: column "a" in child table must be marked NOT NULL

2017-12-13 Thread Justin Pryzby
On Thu, Dec 14, 2017 at 08:51:06AM +0700, Ali Akbar wrote:
> 2017-12-13 15:37 GMT+07:00 Amit Langote :
> 
> > On 2017/12/13 15:59, Ali Akbar wrote:
> > >
> > > Thanks for the link to those thread.
> > >
> > > Judging from the discussion there, it will be a long way to prevent DROP
> > > NOT NULL.
> >
> > Yeah, I remembered that discussion when writing my email, but was for some
> > reason convinced that everything's fine even without the elaborate
> > book-keeping of inheritance information for NOT NULL constraints.  Thanks
> > Michael for reminding.
> >
> 
> Patch for adding a check in pg_upgrade. Going through git history, the check
> for inconsistency in NOT NULL constraints has been there since a long time
> ago. In this patch the check will be applied for all old cluster versions.
> I'm not sure in which version table inheritance was released.

Here are some spelling and grammar fixes to that patch:

but nullabe in its children: nullable
child column is not market: marked
with adding: by adding
and restart: and restarting
the problem columns: the problematic columns
9.5, 9.6, 10: 9.5, 9.6, and 10
restore, that will cause error.: restore phase of pg_upgrade, that would cause an error.

Justin



Re: Bitmap table scan cost per page formula

2017-12-19 Thread Justin Pryzby
On Tue, Dec 19, 2017 at 07:55:32PM +, Haisheng Yuan wrote:
> Hi hackers,
> 
> This is Haisheng Yuan from Greenplum Database.
> 
> We had some query in production showing that planner favors seqscan over
> bitmapscan, and the execution of seqscan is 5x slower than using
> bitmapscan, but the cost of bitmapscan is 2x the cost of seqscan. The
> statistics were updated and quite accurate.
> 
> Bitmap table scan uses a formula to interpolate between random_page_cost
> and seq_page_cost to determine the cost per page. In Greenplum Database,
> the default value of random_page_cost is 100, the default value of
> seq_page_cost is 1. With the original cost formula, random_page_cost
> dominates the final cost result, even though the formula is declared to be
> non-linear. However, it is still close to linear, which can't reflect the
> real cost per page when a majority of pages are fetched. Therefore, the
> cost formula is updated to a truly non-linear function to reflect both
> random_page_cost and seq_page_cost for different percentages of pages
> fetched.
> 
> Even though PostgreSQL has a much smaller default value of random_page_cost
> (4), the same problem exists there if we change the default value.
> 
> Old cost formula:
> cost_per_page = random_page_cost - (random_page_cost - seq_page_cost) *
> sqrt(pages_fetched / T);
> [image: Inline image 1]
> 
> New cost formula:
> cost_per_page = seq_page_cost * pow(random_page_cost / seq_page_cost, 1 -
> sqrt(pages_fetched / T));
> [image: Inline image 2]
> 
> Below is the graph (credit to Heikki) that plots the total estimated cost
> of a bitmap heap scan, where table size is 1 pages, and
> random_page_cost=10 and seq_page_cost=1. X axis is the number of pages
> fetched. I.e. on the left, no pages are fetched, and on the right end, at
> 1, all pages are fetched. The original formula is in black, the new
> formula in red, and the horizontal line, in blue, shows the cost of a Seq
> Scan.
> [image: Inline image 3]
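
For anyone following along, the quoted formulas transcribe to roughly the
following C; this is my sketch for illustration (T is the total number of
pages in the table), not the actual costsize.c code:

#include <math.h>

/* Old formula: near-linear interpolation; random_page_cost dominates
 * until nearly all pages are fetched. */
static double
old_cost_per_page(double random_page_cost, double seq_page_cost,
				  double pages_fetched, double T)
{
	return random_page_cost -
		(random_page_cost - seq_page_cost) * sqrt(pages_fetched / T);
}

/* New formula: geometric interpolation; decays from random_page_cost
 * (few pages fetched) toward seq_page_cost (all pages fetched) much faster. */
static double
new_cost_per_page(double random_page_cost, double seq_page_cost,
				  double pages_fetched, double T)
{
	return seq_page_cost *
		pow(random_page_cost / seq_page_cost, 1.0 - sqrt(pages_fetched / T));
}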

Thanks for caring about bitmap scans ;)

There are a couple of earlier/ongoing discussions on this:

In this old thread: 
https://www.postgresql.org/message-id/CAGTBQpZ%2BauG%2BKhcLghvTecm4-cGGgL8vZb5uA3%3D47K7kf9RgJw%40mail.gmail.com
..Claudio Freire  wrote:
> Correct me if I'm wrong, but this looks like the planner not
> accounting for correlation when using bitmap heap scans.
> 
> Checking the source, it really doesn't.

..which I think is basically right: the formula does distinguish between the
cases of small or large fraction of pages, but doesn't use correlation.  Our
issue in that case seems to be mostly a failure of cost_index to account for
fine-scale deviations from large-scale correlation; but, if cost_bitmap
accounted for our high correlation metric (>0.99), it might've helped our case.

Note costsize.c:
 * Save amcostestimate's results for possible use in bitmap scan planning.
 * We don't bother to save indexStartupCost or indexCorrelation, because a
 * bitmap scan doesn't care about either.

See more at this recent/ongoing discussion (involving several issues, only one
of which is bitmap cost vs index cost):
https://www.postgresql.org/message-id/flat/20171206214652.GA13889%40telsasoft.com#20171206214652.ga13...@telsasoft.com

Consider the bitmap scans in the two different cases:

1) In Vitaliy's case, bitmap was being chosen in preference to index scan (due
in part to random_page_cost>1), but performed poorly, partially because the
bitmap component must complete before the heap reads can begin, and also
because the heap reads for the test case involving modular division would've
read pages across the entire length of the table, incurring maximum lseek
costs.

2) In my case from ~16 months ago, index scan was being chosen in preference
to bitmap, but the index scan was incurring high seek costs.  We would've been
happy if the planner had done a bitmap scan on a weekly-partitioned child table
(with 7 days' data) while querying one day's data (1/7th of the table), 99% of
which would have been strictly sequential page reads, so incurring low lseek
costs (plus some component of random costs for the remaining 1% "long tail").

It seems clear to me that sequentially reading 1/7th of the tightly clustered
pages in a table ought to be costed differently than reading 1/7th of the pages
evenly distributed across its entire length.

I started playing with this weeks ago (probably during Vitaliy's problem
report).  Is there any reason cost_bitmap_heap_scan shouldn't interpolate based
on correlation from seq_page_cost to random_page_cost, the same as cost_index
does?
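
To make the question concrete, here's a hypothetical sketch (names and
placement are mine, not existing costsize.c code) of applying the same
correlation interpolation that cost_index uses -- csquared between a
perfectly-clustered cost and the current uncorrelated estimate:

static double
bitmap_heap_page_cost(double cost_per_page, double seq_page_cost,
					  double pages_fetched, double indexCorrelation)
{
	double		csquared = indexCorrelation * indexCorrelation;
	double		min_cost = seq_page_cost * pages_fetched;	/* fully clustered */
	double		max_cost = cost_per_page * pages_fetched;	/* current estimate */

	return max_cost + csquared * (min_cost - max_cost);
}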

Justin



PG10.1 autovac killed building extended stats

2017-11-17 Thread Justin Pryzby
After adding extended/MV stats to a few of our tables a few days ago, it looks
like I hadn't been paying attention, and this first crashed 2 nights ago.  Why
at 1am?  Not sure.  I have a "reindex" job which runs at 1am and a
vacuum/analyze job which runs at 2am, but I don't use cron to change
autovac/analyze thresholds..

Nov 16 01:15:42 database kernel: postmaster[16144]: segfault at 0 ip 006f3e3e sp 7ffe2c8fc650 error 4 in postgres[40+693000]
[...]
Nov 17 01:15:41 database kernel: postmaster[7145]: segfault at 0 ip 006f3e3e sp 7ffe2c8fc650 error 4 in postgres[40+693000]

[pryzbyj@database ~]$ ps -fu postgres
UID        PID  PPID  C STIME TTY          TIME CMD
postgres  1757     1  0 Nov09 ?        00:20:43 /usr/pgsql-10/bin/postmaster -D /var/lib/pgsql/10/data
=> no longer running Alvaro's original patch on this server, which was also
first to experience the BRIN crash..

< 2017-11-16 01:15:43.103 -04  >LOG:  server process (PID 16144) was terminated by signal 11: Segmentation fault
< 2017-11-16 01:15:43.103 -04  >DETAIL:  Failed process was running: autovacuum: ANALYZE public.daily_enodeb_baseband_view_201711

< 2017-11-17 01:15:41.673 -04  >LOG:  server process (PID 7145) was terminated by signal 11: Segmentation fault
< 2017-11-17 01:15:41.673 -04  >DETAIL:  Failed process was running: autovacuum: ANALYZE public.daily_enodeb_baseband_view_201711

Here's a log of my reindex job for the last two nights' crashes.  It's
suspicious that the baseband table was reindexed ~15min before autovac crashed
during its processing, but it's also unclear to me why it would matter.

[...]
|Thu Nov 16 01:02:54 -04 2017: daily_enodeb_baseband_view_201711: daily_enodeb_baseband_view_201711_unique_idx(repack current)...
|Thu Nov 16 01:02:59 -04 2017: daily_enodeb_baseband_view_201711: pg_toast.pg_toast_691157026_index(reindex system)...
...
|Thu Nov 16 01:15:22 -04 2017: eric_hss_l2if_metrics_cum: eric_hss_l2if_metrics_cum_start_time_idx(repack non-partitioned)...
|Thu Nov 16 01:15:27 -04 2017: eric_hss_platform_metrics: eric_hss_platform_metrics_start_time_idx(repack non-partitioned)...
|WARNING:  terminating connection because of crash of another server process
|DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
|HINT:  In a moment you should be able to reconnect to the database and repeat your command.
|WARNING: Error creating index "public"."index_40552734": server closed the connection unexpectedly
|This probably means the server terminated abnormally
|before or while processing the request.
|WARNING: Skipping index swapping for "eric_hss_platform_metrics", since no new indexes built
|WARNING: repack failed for "eric_hss_platform_metrics_start_time_idx"
|psql: FATAL:  the database system is in recovery mode
|Filesystem  Size  Used Avail Use% Mounted on
|/dev/vdb    2.6T  2.1T  459G  83% /var/lib/pgsql

[...]
|Fri Nov 17 01:01:44 -04 2017: daily_enodeb_baseband_view_201711: daily_enodeb_baseband_view_201711_unique_idx(repack current)...
|Fri Nov 17 01:01:54 -04 2017: daily_enodeb_baseband_view_201711: pg_toast.pg_toast_691157026_index(reindex system)...
...
|Fri Nov 17 01:14:58 -04 2017: link_busy_hr: lbh_start_time_idx(repack non-partitioned)...
|Fri Nov 17 01:14:58 -04 2017: loaded_cdr_files: loaded_cdr_files_filename(repack non-partitioned)...
|WARNING:  terminating connection because of crash of another server process
|DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
|HINT:  In a moment you should be able to reconnect to the database and repeat your command.
|WARNING: Error creating index "public"."index_30971": server closed the connection unexpectedly
|This probably means the server terminated abnormally
|before or while processing the request.
|WARNING: Skipping index swapping for "loaded_cdr_files", since no new indexes built
|WARNING: repack failed for "loaded_cdr_files_filename"
|psql: FATAL:  the database system is in recovery mode

Core was generated by `postgres: autovacuum worker process   gtt '.
Program terminated with signal 11, Segmentation fault.
#0  statext_ndistinct_build (totalrows=300, numrows=300, rows=0x21df3e8,
    attrs=<optimized out>, stats=0x0) at mvdistinct.c:103
103                             item->attrs = bms_add_member(item->attrs,

(gdb) p stats
$5 = (VacAttrStats **) 0x0
=> This is an error, no ??

(gdb) p item
$1 = (MVNDistinctItem *) 0x2090928
(gdb) p result
$6 = (MVNDistinct *) 0x2090918

(gdb) p item->attrs
$2 = (Bitmapset *) 0x0
(gdb) p rows
$3 = (HeapTuple *) 0x21df3e8
(gdb) p j
$8 = <optimized out>
(gdb) p attrs
$4 = <optimized out>

Let me know if there's anything more I should send.

Justin

(gdb) bt f
#0  

Re: PG10.1 autovac crashed building extended stats

2017-11-17 Thread Justin Pryzby
On Fri, Nov 17, 2017 at 01:27:49PM -0300, Alvaro Herrera wrote:
> Justin Pryzby wrote:
> > After adding extended/MV stats to a few of our tables a few days ago, it
> > looks like I hadn't been paying attention, and this first crashed 2 nights
> > ago.  Why at 1am?  Not sure.  I have a "reindex" job which runs at 1am and
> > a vacuum/analyze job which runs at 2am, but I don't use cron to change
> > autovac/analyze thresholds..
> 
> Can you please show the definition of the table and of the extended
> stats?

gtt=# SELECT stxrelid::regclass, stxname, stxkind FROM pg_statistic_ext ORDER BY 1;
 stxrelid | stxname | stxkind
----------+---------+---------
 daily_umts_eric_cell_traffic_hs_view_201603 | daily_umts_eric_cell_traffic_hs_view_201603_key_stats | {d,f}
 daily_umts_eric_cell_traffic_hs_eul_view_201603 | daily_umts_eric_cell_traffic_hs_eul_view_201603_key_stats | {d,f}
 daily_eric_umts_rnc_utrancell_view_201603 | daily_eric_umts_rnc_utrancell_view_201603_key_stats | {d,f}
 daily_umts_eric_cell_traffic_hs_view_201504 | daily_umts_eric_cell_traffic_hs_view_201504_key_stats | {d,f}
 daily_umts_eric_cell_traffic_hs_eul_view_201504 | daily_umts_eric_cell_traffic_hs_eul_view_201504_key_stats | {d,f}
 daily_eric_umts_rnc_utrancell_view_201504 | daily_eric_umts_rnc_utrancell_view_201504_key_stats | {d,f}
 daily_enodeb_ncell_view_201603 | daily_enodeb_ncell_view_201603_key_stats | {d,f}
 daily_enodeb_ncell_view_201503 | daily_enodeb_ncell_view_201503_key_stats | {d,f}
 daily_enodeb_ncell_view_201502 | daily_enodeb_ncell_view_201502_key_stats | {d,f}
 daily_enodeb_ncell_view_201501 | daily_enodeb_ncell_view_201501_key_stats | {d,f}
 daily_enodeb_baseband_view_201603 | daily_enodeb_baseband_view_201603_key_stats | {d,f}
 daily_enodeb_baseband_view_201503 | daily_enodeb_baseband_view_201503_key_stats | {d,f}
 daily_enodeb_baseband_view_201502 | daily_enodeb_baseband_view_201502_key_stats | {d,f}
 daily_enodeb_baseband_view_201501 | daily_enodeb_baseband_view_201501_key_stats | {d,f}
 daily_enodeb_cell_view_201603 | daily_enodeb_cell_view_201603_key_stats | {d,f}
 daily_enodeb_cell_view_201502 | daily_enodeb_cell_view_201502_key_stats | {d,f}
 daily_enodeb_201603 | daily_enodeb_201603_key_stats | {d,f}
 daily_enodeb_201503 | daily_enodeb_201503_key_stats | {d,f}
 daily_enodeb_201502 | daily_enodeb_201502_key_stats | {d,f}
 daily_enodeb_201501 | daily_enodeb_201501_key_stats | {d,f}
 daily_enodeb_cell_view_201710 | x | {d,f}
 daily_cdr_pstn_user_201711 | daily_cdr_pstn_user_201711_key_stats | {d,f}
 daily_umts_eric_cell_traffic_hs_eul_view_201711 | daily_umts_eric_cell_traffic_hs_eul_view_201711_key_stats | {d,f}
 daily_umts_eric_cell_traffic_hs_view_201711 | daily_umts_eric_cell_traffic_hs_view_201711_key_stats | {d,f}
 daily_eric_umts_rnc_utrancell_view_201711 | daily_eric_umts_rnc_utrancell_view_201711_key_stats | {d,f}
 daily_enodeb_baseband_view_201711 | daily_enodeb_baseband_view_201711_key_stats | {d,f}
 daily_enodeb_cell_view_201711 | daily_enodeb_cell_view_201711_key_stats | {d,f}
 daily_enodeb_ncell_view_201711 | daily_enodeb_ncell_view_201711_key_stats | {d,f}
 daily_enodeb_201711 | daily_enodeb_201711_key_stats | {d,f}
(29 rows)

Here's the table which was 1) reindexed (including its toast) and 2)
autovacuumed (crashed):

gtt=# \d+ daily_enodeb_baseband_view_201711
                Table "public.daily_enodeb_baseband_view_201711"
   Column  |   Type   | Collation | Nullable | Default | Storage | Stats target | Description
-----------+----------+-----------+----------+---------+---------+--------------+-------------
 device_id | smallint |           | not null |         | p

Re: PG10.1 autovac killed building extended stats

2017-11-17 Thread Justin Pryzby
On Fri, Nov 17, 2017 at 01:36:00PM -0300, Alvaro Herrera wrote:
> Justin Pryzby wrote:
> 
> > Core was generated by `postgres: autovacuum worker process   gtt'.
> > Program terminated with signal 11, Segmentation fault.
> > #0  statext_ndistinct_build (totalrows=300, numrows=300, rows=0x21df3e8,
> > attrs=<optimized out>, stats=0x0) at mvdistinct.c:103
> > 103                             item->attrs = bms_add_member(item->attrs,
> > 
> > (gdb) p stats
> > $5 = (VacAttrStats **) 0x0
> > => This is an error, no ??
> 
> Not necessarily, but then I think this previous code is busted:

> If I recall things correctly, the "continue" should be executed
> regardless of IsAutoVacuumWorkerProcess() (only the log should be
> conditional on that).  I'm not sure how I arrived at the current coding,
> which I added in bf2a691e02d7 -- probably just fuzzy thinking.
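
If I've understood, the suggested restructuring is roughly this (my sketch of
the described change, not a real patch):

	if (!stats)
	{
		if (!IsAutoVacuumWorkerProcess())
			elog(WARNING, "...");	/* only the log is conditional */
		continue;					/* skip regardless of worker type */
	}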

Is it useful to run with that change?

Justin



Re: Speeding up pg_upgrade

2017-12-07 Thread Justin Pryzby
On Tue, Dec 05, 2017 at 09:01:35AM -0500, Bruce Momjian wrote:
> As part of PGConf.Asia 2017 in Tokyo, we had an unconference topic about
> zero-downtime upgrades.  ... we discussed speeding up pg_upgrade.
> 
> There are clusters that take a long time to dump the schema from the old
> cluster

Maybe it isn't representative of a typical case, but I can offer a data point:

For us, we have ~40 customers with DBs ranging in size from <100GB to ~25TB
(for which ~90% is on a ZFS tablespace with compression).  We have what's
traditionally considered to be an excessive number of child tables, which works
okay since planning time is unimportant to us for the report queries which hit
them.  Some of the tables are wide (historically up to 1600 columns).  Some of
those have default values on nearly every column, and pg_attrdef was large
(>500MB), causing pg_dump --section=pre-data to be slow (10+ minutes).  Since
something similar is run by pg_upgrade, I worked around the issue for now by
dropping defaults on the historic children in advance of upgrades (at some
point I'll figure out what I have to do to allow DROPing DEFAULTs).  It's not
the first time we've seen an issue with a large number of children*columns.

Our slowest pg_upgrade was ~40min, caused by column defaults in a case where I
failed to re-DROP DEFAULTs after our first scheduled upgrade date was pushed
back by over a month.  Most of the rest were completed in less than 15min.

Justin



Re: Should we add GUCs to allow partition pruning to be disabled?

2018-05-06 Thread Justin Pryzby
On Thu, Apr 26, 2018 at 07:29:37PM +1200, David Rowley wrote:
> On 25 April 2018 at 09:59, Alvaro Herrera  wrote:
> > Amit Langote wrote:
> >> Although the config.sgml coverage of the new capabilities seems pretty
> >> good, some may find their being mentioned in 5.10 Table Partitioning
> >> helpful.  Or if we don't want to hijack 5.10.4, maybe create a 5.10.5.
> >
> > Can you (or someone) describe what would that section contain?
> 
> I've drafted and attached a patch of how I think this should look.
> Likely it will need some tweaking, but it is probably a good starting
> point for discussion.

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 34da0d8d57..89735b4804 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml

+   
+Unlike constraint exclusion, partition pruning can be performed much more
+quickly as it does not have to scan each individual partition's metadata
quickly COMMA

But actually I suggest:
Partition pruning is much more efficient than constraint exclusion, since
pruning avoids scanning each partition's metadata...


+Partition Pruning is also more powerful than constraint exclusion as
+partition pruning is not something that is performed only during the
remove "something that is" ?
Or just merge into the next sentence.
Note: Amit and David commented on this previously.

+planning of a given query.  In certain cases, partition pruning may also
+be performed during execution of the query as well.  This allows pruning
"also" is redundant with "as well"

+to be performed using values which are unknown during query planning, for
could say "are not yet known"

+The partition pruning which is performed during execution is done so at
+either one or both of the following times:
remove "either" ?

+   During initialization of the query plan.  Partition pruning can be
+   initialization phase of execution.  If partition pruning can be
+   performed here then there is the added benefit of not having to
here COMMA

+   initialize partitions which are pruned.  Partitions which are pruned
+   during this stage will not show up in the query's

+   During actual execution of the query plan.  Partition pruning may also
Remove "actual" ?

+   be performed here to remove partitions using values which are only known
+   during actual query execution.  This includes values from subqueries and

+   values from execution time parameters such as ones from parameterized
execution-time?
s/ones/those/

+   partition column or expression changes.  In order to determine if
+   partitions were pruned at this stage requires careful inspection of the
+   nloops property in the
+   EXPLAIN ANALYZE output.
s/In order to determine/Determining/

+
+ Currently, partition pruning of partitions during the planning of an
s/partition //1 (just "pruning of partitions")

+ UPDATE or DELETE command are
s/are/is/

+ internally implemented using the constraint exclusion method.  Only
remove "internally"?

+ SELECT uses the faster partition pruning method.  Also
Also COMMA

+ partition pruning performed during execution is only done so for the
Remove "so".

Justin



Re: [GENERAL] huge RAM use in multi-command ALTER of table heirarchy

2018-04-28 Thread Justin Pryzby
On Tue, Jul 18, 2017 at 07:26:30PM -0400, Tom Lane wrote:
> > It's probably a bit late in the v10 cycle to be taking any risks in
> > this area, but I'd vote for ripping out RememberToFreeTupleDescAtEOX
> > as soon as the v11 cycle opens, unless someone can show an example
> > of non-broken coding that requires it.  (And if so, there ought to
> > be a regression test incorporating that.)

I'm resending this in case it's been forgotten about, and in case there's still
time this cycle to follow through on removing RememberToFreeTupleDescAtEOX.

..And because I ran into it again earlier this month, ALTERing a 1600-column
table with 500 children (actually while rewriting to reduce to 12 children); on
a dedicated DB VM with 8GB RAM:

Mar  7 11:44:52 telsaDB kernel: Out of memory: Kill process 47490 (postmaster) score 644 or sacrifice child
Mar  7 11:44:52 telsaDB kernel: Killed process 47490, UID 26, (postmaster) total-vm:6813528kB, anon-rss:5212288kB, file-rss:2296kB

On Wed, Oct 18, 2017 at 02:54:54PM -0500, Justin Pryzby wrote:
> Hi,
> 
> I just ran into this again in another context (see original discussion, quoted
> below).
> 
> Some time ago, while initially introducing non-default stats targets, I set
> our non-filtering columns to "STATISTICS 10", lower than default, to minimize
> the size of pg_statistic, which (at least at one point) I perceived to have
> become bloated and to be causing issues (partially due to having an excessive
> number of "daily" granularity partitions, a problem I've since mitigated).
> 
> The large number of columns with non-default stats target was (I think) 
> causing
> pg_dump --section=pre-data to take 10+ minutes, which makes pg_upgrade more
> disruptive than necessary, so now I'm going back and fixing it.
> 
> [pryzbyj@database ~]$ time sed '/SET STATISTICS 10;$/!d; s//SET STATISTICS -1;/' /srv/cdrperfbackup/ts/2017-10-17/pg_dump-section\=pre-data |psql -1q ts
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> connection to server was lost
> 
> [pryzbyj@database ~]$ dmesg |tail -n2
> Out of memory: Kill process 6725 (postmaster) score 550 or sacrifice child
> Killed process 6725, UID 26, (postmaster) total-vm:13544792kB, anon-rss:8977764kB, file-rss:8kB
> 
> So I'm hoping to encourage someone to commit the change contemplated earlier.



allow psql to watch \dt

2018-05-11 Thread Justin Pryzby
I thought that would be desirable, although I don't see any better way of
getting there than this.

I don't see other commands for which \watch is wanted... but who am I to say
that watching extension creation isn't useful?  So I imagine this should be
generalized to save the query buffer for all \d commands.

BTW, for amusement value, I briefly looked at implementing it for \!

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 4c85f43..580d708 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -68,7 +68,7 @@ static backslashResult exec_command_copy(PsqlScanState scan_state, bool active_b
 static backslashResult exec_command_copyright(PsqlScanState scan_state, bool active_branch);
 static backslashResult exec_command_crosstabview(PsqlScanState scan_state, bool active_branch);
 static backslashResult exec_command_d(PsqlScanState scan_state, bool active_branch,
-				  const char *cmd);
+				  const char *cmd, PQExpBuffer query_buf);
 static backslashResult exec_command_edit(PsqlScanState scan_state, bool active_branch,
 				  PQExpBuffer query_buf, PQExpBuffer previous_buf);
 static backslashResult exec_command_ef_ev(PsqlScanState scan_state, bool active_branch,
@@ -310,7 +310,7 @@ exec_command(const char *cmd,
 	else if (strcmp(cmd, "crosstabview") == 0)
 		status = exec_command_crosstabview(scan_state, active_branch);
 	else if (cmd[0] == 'd')
-		status = exec_command_d(scan_state, active_branch, cmd);
+		status = exec_command_d(scan_state, active_branch, cmd, query_buf);
 	else if (strcmp(cmd, "e") == 0 || strcmp(cmd, "edit") == 0)
 		status = exec_command_edit(scan_state, active_branch,
 								   query_buf, previous_buf);
@@ -693,7 +693,7 @@ exec_command_crosstabview(PsqlScanState scan_state, bool active_branch)
  * \d* commands
  */
 static backslashResult
-exec_command_d(PsqlScanState scan_state, bool active_branch, const char *cmd)
+exec_command_d(PsqlScanState scan_state, bool active_branch, const char *cmd, PQExpBuffer query_buf)
 {
 	backslashResult status = PSQL_CMD_SKIP_LINE;
 	bool		success = true;
@@ -720,7 +720,7 @@ exec_command_d(PsqlScanState scan_state, bool active_branch, const char *cmd)
 				success = describeTableDetails(pattern, show_verbose, show_system);
 			else
 				/* standard listing of interesting things */
-				success = listTables("tvmsE", NULL, show_verbose, show_system);
+				success = listTables("tvmsE", NULL, show_verbose, show_system, query_buf);
 			break;
 		case 'A':
 			success = describeAccessMethods(pattern, show_verbose);
@@ -794,7 +794,7 @@ exec_command_d(PsqlScanState scan_state, bool active_branch, const char *cmd)
 		case 'i':
 		case 's':
 		case 'E':
-			success = listTables(&cmd[1], pattern, show_verbose, show_system);
+			success = listTables(&cmd[1], pattern, show_verbose, show_system, query_buf);
 			break;
 		case 'r':
 			if (cmd[2] == 'd' && cmd[3] == 's')
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 410131e..b637e1b 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -3404,7 +3404,7 @@ listDbRoleSettings(const char *pattern, const char *pattern2)
  * (any order of the above is fine)
  */
 bool
-listTables(const char *tabtypes, const char *pattern, bool verbose, bool showSystem)
+listTables(const char *tabtypes, const char *pattern, bool verbose, bool showSystem, PQExpBuffer buf)
 {
 	bool		showTables = strchr(tabtypes, 't') != NULL;
 	bool		showIndexes = strchr(tabtypes, 'i') != NULL;
@@ -3413,7 +3413,6 @@ listTables(const char *tabtypes, const char *pattern, bool verbose, bool showSys
 	bool		showSeq = strchr(tabtypes, 's') != NULL;
 	bool		showForeign = strchr(tabtypes, 'E') != NULL;
 
-	PQExpBufferData buf;
 	PGresult   *res;
 	printQueryOpt myopt = pset.popt;
 	static const bool translate_columns[] = {false, false, true, false, false, false, false};
@@ -3422,13 +3421,13 @@ listTables(const char *tabtypes, const char *pattern, bool verbose, bool showSys
 	if (!(showTables || showIndexes || showViews || showMatViews || showSeq || showForeign))
 		showTables = showViews = showMatViews = showSeq = showForeign = true;
 
-	initPQExpBuffer(&buf);
+	initPQExpBuffer(buf);
 
 	/*
 	 * Note: as 

Re: Postgres 11 release notes

2018-05-11 Thread Justin Pryzby
On Fri, May 11, 2018 at 12:22:08PM -0700, Andres Freund wrote:
> Btw, is it just me, or do the commit and docs confusingly say stalled when
> stale is intended?

Should be fixed since yesterday's 8e12f4a250d250a89153da2eb9b91c31bb80c483?

Justin



Re: doc fixes: vacuum_cleanup_index_scale_factor

2018-05-08 Thread Justin Pryzby
3rd iteration; thanks for bearing with me.

On Tue, May 08, 2018 at 12:35:00PM +0300, Alexander Korotkov wrote:
> Hi, Justin!
> 
> Thank you for revising documentation patch.
> 
> On Mon, May 7, 2018 at 7:55 PM, Justin Pryzby <pry...@telsasoft.com> wrote:


+In order to detect stale index statistics, the number of total heap
+tuples during previous statistics collection is stored in the index
+meta-page.

Consider removing: "during previous statistics collection"
Or: "during THE previous statistics collection"

+ Once the number of inserted tuples since previous

since THE previous

+statistics collection is more than
+vacuum_cleanup_index_scale_factor fraction of

Since the multiplier can be greater than 1, should we say "multiple" instead of
fraction?
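
For reference, my reading of the staleness test being described, as a sketch
(variable names are mine, not the actual nbtree code):

	/* stale if inserts since the last stats collection exceed the
	 * configured multiple of the heap tuple count stored in the meta-page */
	bool		stats_are_stale =
		inserted_tuples_since_last_stats >
		vacuum_cleanup_index_scale_factor * metapage_heap_tuples;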

+during VACUUM cycle.  Thus, skipping of the B-tree
+index scan during cleanup stage is only possible when second and
+subsequent VACUUM cycles detecting no dead tuples.

Change "detecting" to "detect".  Or maybe just "find"

Justin



Re: Should we add GUCs to allow partition pruning to be disabled?

2018-05-08 Thread Justin Pryzby
On Mon, May 07, 2018 at 06:00:59PM +1200, David Rowley wrote:
> Many thanks for reviewing this.

2nd round - from the minimalist department:

+partitions which cannot possibly contain any matching records.
maybe: partitions which cannot match any records.

+   
+Partition pruning done during execution can be performed at any of the
+following times:

remove "done"?

+   number of partitions which were removed during this phase of pruning by
remove "of prunning"

Justin



Re: doc fixes: vacuum_cleanup_index_scale_factor

2018-05-07 Thread Justin Pryzby
On Mon, May 07, 2018 at 07:26:25PM +0300, Alexander Korotkov wrote:
> Hi!
> 
> I've revised docs and comments, and also made some fixes in the code.
> See the attached patchset.
> 
> * 0004-btree-cleanup-docs-comments-fixes.patch
> Documentation and comment improvements from Justin Pryzby
> revised by me.

2nd iteration:

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index eabe2a9235..785ecf922a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1893,15 +1893,35 @@ include_dir 'conf.d'
   
   

-When no tuples were deleted from the heap, B-tree indexes might still
-be scanned during VACUUM cleanup stage by two
-reasons.  The first reason is that B-tree index contains deleted pages
-which can be recycled during cleanup.  The second reason is that B-tree
-index statistics is stalled.  The criterion of stalled index statistics
-is number of inserted tuples since previous statistics collection
-is greater than vacuum_cleanup_index_scale_factor
-fraction of total number of heap tuples.
+When no tuples were deleted from the heap, B-tree indexes are still
+scanned during VACUUM cleanup stage unless two
+conditions are met: the index contains no deleted pages which can be
+recycled during cleanup; and, the index statistics are not stale.
+In order to detect stale index statistics, number of total heap tuples
should say: "THE number"

+during previous statistics collection is memorized in the index
s/memorized/stored/

+meta-page.  Once number number of inserted tuples since previous
Should say "Once the number of inserted tuples..."

+statistics collection is more than
+vacuum_cleanup_index_scale_factor fraction of
+number of heap tuples memorized in the meta-page, index statistics is
s/memorized/stored/

+considered to be stalled.  Note, that number of heap tuples is written
"THE number"
s/stalled/stale/

+to the meta-page at the first time when no dead tuples are found
remove "at"

+during VACUUM cycle.  Thus, skip of B-tree index
I think should say: "Thus, skipping of the B-tree index scan"

+scan during cleanup stage is only possible in second and subsequent
s/in/when/

+   
+Zero value of vacuum_cleanup_index_scale_factor
I would say "A zero value of ..."

Thanks,
Justin



documentation fixes for partition pruning, round two

2018-05-23 Thread Justin Pryzby
On Fri, May 18, 2018 at 08:56:53PM -0500, Justin Pryzby wrote:
> I reread this and have some more comments.
> https://www.postgresql.org/docs/devel/static/ddl-partitioning.html

> Let me know if it's useful to provide a patch.

I propose this.

There's two other, wider changes to consider:

 - should "5.10.4. Partition Pruning" be moved after "5.10.2. Declarative
   Partitioning", rather than after "5.10.3. Implementation Using Inheritance" ?
 - should we find a unified term for "inheritence-based partitioning" and avoid
   using the word "partitioning" in that context?  For example: "Partitioning
   can be implemented using table inheritance[...]".  One possible phrase
   currently begin used is: "legacy inheritance method".

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 2cd0b8a..6e1ade9 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -2967,70 +2967,70 @@ VALUES ('Albany', NULL, NULL, 'NY');

 

 It is not possible to turn a regular table into a partitioned table or
 vice versa.  However, it is possible to add a regular or partitioned table
 containing data as a partition of a partitioned table, or remove a
 partition from a partitioned table turning it into a standalone table;
 see  to learn more about the
 ATTACH PARTITION and DETACH PARTITION
 sub-commands.

 

 Individual partitions are linked to the partitioned table with inheritance
-behind-the-scenes; however, it is not possible to use some of the
-inheritance features discussed in the previous section with partitioned
-tables and partitions.  For example, a partition cannot have any parents
+behind-the-scenes; however, it is not possible to use some of the generic
+features of inheritance (discussed below) with declaratively partitioned
+tables or their partitions.  For example, a partition cannot have any parents
 other than the partitioned table it is a partition of, nor can a regular
-table inherit from a partitioned table making the latter its parent.
-That means partitioned tables and partitions do not participate in
-inheritance with regular tables.  Since a partition hierarchy consisting
-of the partitioned table and its partitions is still an inheritance
-hierarchy, all the normal rules of inheritance apply as described in
+table inherit from a partitioned table making the latter its parent.  That
+means partitioned tables and partitions do not participate in inheritance
+with regular tables.  Since a partition hierarchy consisting of the
+partitioned table and its partitions is still an inheritance hierarchy, all
+the normal rules of inheritance apply as described in
  with some exceptions, most notably:
 
 
  
   
Both CHECK and NOT NULL
constraints of a partitioned table are always inherited by all its
partitions.  CHECK constraints that are marked
NO INHERIT are not allowed to be created on
partitioned tables.
   
  
 
  
   
Using ONLY to add or drop a constraint on only the
partitioned table is supported when there are no partitions.  Once
partitions exist, using ONLY will result in an error
as adding or dropping constraints on only the partitioned table, when
-   partitions exist, is not supported.  Instead, constraints can be added
-   or dropped, when they are not present in the parent table, directly on
-   the partitions.  As a partitioned table does not have any data
-   directly, attempts to use TRUNCATE
+   partitions exist, is not supported.  Instead, constraints on the
+   partitions themselves can be added and (if they are not present in the
+   parent table) dropped.  As a partitioned table does not
+   have any data directly, attempts to use TRUNCATE
ONLY on a partitioned table will always return an
error.
   
  
 
  
   
Partitions cannot have columns that are not present in the parent.  It
-   is neither possible to specify columns when creating partitions with
-   CREATE TABLE nor is it possible to add columns to
+   is not possible to specify columns when creating partitions with
+   CREATE TABLE, nor is it possible to add columns to
partitions after-the-fact using ALTER TABLE.  Tables may be
added as a partition with ALTER TABLE ... ATTACH PARTITION
only if their columns exactly match the parent, including any
oid column.
   
  
 
  
   
You cannot drop the NOT NULL constraint on a
partition's column if the constraint is present in the parent table.
   
  
 
@@ -3347,37 +3347,37 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
on individual partitions, not the partitioned table.
   
  
 

Re: documentation fixes for partition pruning, round two

2018-05-23 Thread Justin Pryzby
On Thu, May 24, 2018 at 11:30:40AM +0900, Amit Langote wrote:
> Hi Justin.
> 
> Thanks for writing the patch.  I have a couple of comments.

Thanks for your review, find attached updated patch.

> +possible to show the difference between a plan whose partitions have been
> +pruned and one whose partitions haven't.  A typical unoptimized plan for
> +this type of table setup is:
> 
> "a plan whose partitions have been pruned" sounds a bit off; maybe, "a
> plan in which partitions have been pruned".

I wrote:

"[...] a plan for which partitions have been pruned and for which they have
not."

Cheers,
Justin
diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 2cd0b8a..bae2589 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -2967,70 +2967,70 @@ VALUES ('Albany', NULL, NULL, 'NY');

 

 It is not possible to turn a regular table into a partitioned table or
 vice versa.  However, it is possible to add a regular or partitioned table
 containing data as a partition of a partitioned table, or remove a
 partition from a partitioned table turning it into a standalone table;
 see  to learn more about the
 ATTACH PARTITION and DETACH PARTITION
 sub-commands.

 

 Individual partitions are linked to the partitioned table with inheritance
-behind-the-scenes; however, it is not possible to use some of the
-inheritance features discussed in the previous section with partitioned
-tables and partitions.  For example, a partition cannot have any parents
+behind-the-scenes; however, it is not possible to use some of the generic
+features of inheritance (discussed below) with declaratively partitioned
+tables or their partitions.  For example, a partition cannot have any parents
 other than the partitioned table it is a partition of, nor can a regular
-table inherit from a partitioned table making the latter its parent.
-That means partitioned tables and partitions do not participate in
-inheritance with regular tables.  Since a partition hierarchy consisting
-of the partitioned table and its partitions is still an inheritance
-hierarchy, all the normal rules of inheritance apply as described in
+table inherit from a partitioned table making the latter its parent.  That
+means partitioned tables and partitions do not participate in inheritance
+with regular tables.  Since a partition hierarchy consisting of the
+partitioned table and its partitions is still an inheritance hierarchy, all
+the normal rules of inheritance apply as described in
  with some exceptions, most notably:
 
 
  
   
Both CHECK and NOT NULL
constraints of a partitioned table are always inherited by all its
partitions.  CHECK constraints that are marked
NO INHERIT are not allowed to be created on
partitioned tables.
   
  
 
  
   
Using ONLY to add or drop a constraint on only the
partitioned table is supported when there are no partitions.  Once
partitions exist, using ONLY will result in an error
as adding or dropping constraints on only the partitioned table, when
-   partitions exist, is not supported.  Instead, constraints can be added
-   or dropped, when they are not present in the parent table, directly on
-   the partitions.  As a partitioned table does not have any data
-   directly, attempts to use TRUNCATE
+   partitions exist, is not supported.  Instead, constraints on the
+   partitions themselves can be added and (if they are not present in the
+   parent table) dropped.  As a partitioned table does not
+   have any data directly, attempts to use TRUNCATE
ONLY on a partitioned table will always return an
error.
   
  
 
  
   
Partitions cannot have columns that are not present in the parent.  It
-   is neither possible to specify columns when creating partitions with
-   CREATE TABLE nor is it possible to add columns to
+   is not possible to specify columns when creating partitions with
+   CREATE TABLE, nor is it possible to add columns to
partitions after-the-fact using ALTER TABLE.  Tables may be
added as a partition with ALTER TABLE ... ATTACH PARTITION
only if their columns exactly match the parent, including any
oid column.
   
  
 
  
   
You cannot drop the NOT NULL constraint on a
partition's column if the constraint is present in the parent table.
   
  
 
@@ -3347,37 +3347,37 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
on individual partitions, not the partitioned table.
   
  
 
 
 

 

 Implementation Using Inheritance
 
  While the built-in declarative partitioning is suitable for most
  common use cases, there are some circumstances 

Re: documentation fixes for partition pruning, round two

2018-05-23 Thread Justin Pryzby
On Thu, May 24, 2018 at 10:46:38AM +1200, David Rowley wrote:
> On 24 May 2018 at 09:35, Justin Pryzby <pry...@telsasoft.com> wrote:
> > On Fri, May 18, 2018 at 08:56:53PM -0500, Justin Pryzby wrote:
> >> I reread this and have some more comments.
> >> https://www.postgresql.org/docs/devel/static/ddl-partitioning.html
> >
> >> Let me know if it's useful to provide a patch.
> >
> > I propose this.
> 
> Thanks for working on this.
> 
> Can you just attach the patch?

Attached.

Justin
diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 2cd0b8a..31f3438 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -2967,70 +2967,70 @@ VALUES ('Albany', NULL, NULL, 'NY');

 

 It is not possible to turn a regular table into a partitioned table or
 vice versa.  However, it is possible to add a regular or partitioned table
 containing data as a partition of a partitioned table, or remove a
 partition from a partitioned table turning it into a standalone table;
 see  to learn more about the
 ATTACH PARTITION and DETACH PARTITION
 sub-commands.

 

 Individual partitions are linked to the partitioned table with inheritance
-behind-the-scenes; however, it is not possible to use some of the
-inheritance features discussed in the previous section with partitioned
-tables and partitions.  For example, a partition cannot have any parents
+behind-the-scenes; however, it is not possible to use some of the generic
+features of inheritance (discussed below) with declaratively partitioned
+tables or their partitions.  For example, a partition cannot have any parents
 other than the partitioned table it is a partition of, nor can a regular
-table inherit from a partitioned table making the latter its parent.
-That means partitioned tables and partitions do not participate in
-inheritance with regular tables.  Since a partition hierarchy consisting
-of the partitioned table and its partitions is still an inheritance
-hierarchy, all the normal rules of inheritance apply as described in
+table inherit from a partitioned table making the latter its parent.  That
+means partitioned tables and partitions do not participate in inheritance
+with regular tables.  Since a partition hierarchy consisting of the
+partitioned table and its partitions is still an inheritance hierarchy, all
+the normal rules of inheritance apply as described in
  with some exceptions, most notably:
 
 
  
   
Both CHECK and NOT NULL
constraints of a partitioned table are always inherited by all its
partitions.  CHECK constraints that are marked
NO INHERIT are not allowed to be created on
partitioned tables.
   
  
 
  
   
Using ONLY to add or drop a constraint on only the
partitioned table is supported when there are no partitions.  Once
partitions exist, using ONLY will result in an error
as adding or dropping constraints on only the partitioned table, when
-   partitions exist, is not supported.  Instead, constraints can be added
-   or dropped, when they are not present in the parent table, directly on
-   the partitions.  As a partitioned table does not have any data
-   directly, attempts to use TRUNCATE
+   partitions exist, is not supported.  Instead, constraints on the
+   partitions themselves can be added and (if they are not present in the
+   parent table) dropped.  As a partitioned table does not
+   have any data directly, attempts to use TRUNCATE
ONLY on a partitioned table will always return an
error.
   
  
 
  
   
Partitions cannot have columns that are not present in the parent.  It
-   is neither possible to specify columns when creating partitions with
-   CREATE TABLE nor is it possible to add columns to
+   is not possible to specify columns when creating partitions with
+   CREATE TABLE, nor is it possible to add columns to
partitions after-the-fact using ALTER TABLE.  Tables may be
added as a partition with ALTER TABLE ... ATTACH PARTITION
only if their columns exactly match the parent, including any
oid column.
   
  
 
  
   
You cannot drop the NOT NULL constraint on a
partition's column if the constraint is present in the parent table.
   
  
 
@@ -3347,37 +3347,37 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
on individual partitions, not the partitioned table.
   
  
 
 
 

 

 Implementation Using Inheritance
 
  While the built-in declarative partitioning is suitable for most
  common use cases, there are some circumstances where a more flexible
  approach may be useful.  Par

Re: Performance regression with PostgreSQL 11 and partitioning

2018-05-25 Thread Justin Pryzby
On Fri, May 25, 2018 at 05:17:12PM -0400, Tom Lane wrote:
> Robert Haas  writes:
> > On Fri, May 25, 2018 at 1:53 PM, Amit Langote  
> > wrote:
> >> Seems here that we call find_appinfos_by_relids here for *all*
> >> partitions, even if all but one partition may have been pruned.  I
> >> haven't studied this code in detail, but I suspect it might be
> >> unnecessary, although I might be wrong.
> 
> > Uggh.  It might be possible to skip it for dummy children.  That would
> > leave the dummy child rels generating a different pathtarget than the
> > non-dummy children, but I guess if we never use them again maybe it
> > wouldn't matter.

> Maybe it's all right to decide that this rejiggering can be left
> for v12 ... did we promise anyone that it's now sane to use thousands
> of partitions?

https://www.postgresql.org/docs/devel/static/ddl-partitioning.html
|The following caveats apply to CONSTRAINT EXCLUSION:
|[...]
|All constraints on all partitions of the master table are examined during
|constraint exclusion, so large numbers of partitions are likely to increase
|query planning time considerably. So the legacy inheritance based partitioning
|will work well with up to perhaps a hundred partitions; DON'T TRY TO USE MANY
|THOUSANDS OF PARTITIONS.

It doesn't matter for us, as we're already using tooo many partitions, but I
would interpret that to mean that thousands of partitions are a job exclusively
for "partition pruning" of declarative partitions.

Justin



add default parallel query to v10 release notes? (Re: [PERFORM] performance drop after upgrade (9.6 > 10))

2018-05-24 Thread Justin Pryzby
Moving to -hackers;

On Sun, Jan 28, 2018 at 06:53:10PM -0500, Bruce Momjian wrote:
> On Thu, Oct 26, 2017 at 02:45:15PM -0500, Justin Pryzby wrote:
> > Is it because max_parallel_workers_per_gather now defaults to 2 ?
> > 
> > BTW, I would tentatively expect a change in default to be documented in the
> > release notes but can't see that it's.
> > 77cd477c4ba885cfa1ba67beaa82e06f2e182b85
> 
> Oops, you are correct.  The PG 10 release notes, which I wrote, should
> have mentioned this.  :-(

I just saw your January response to my October mail..

Maybe it's silly to update the PG10 notes 9 months after release..
..but, is there any reason not to add it to the v10 release notes now?  (I
don't know if the web docs would be updated until the next point release?)

Justin



Re: Should we add GUCs to allow partition pruning to be disabled?

2018-05-18 Thread Justin Pryzby
I reread this and have some more comments.
https://www.postgresql.org/docs/devel/static/ddl-partitioning.html

"however, it is not possible to use some of the inheritance features discussed
in the previous section with partitioned tables and partitions"

=> The referenced section now follows rather than precedes the text; I suggest
to say:
"however, it is not possible to use some features of inheritance (discussed
below) with declaratively partitioned tables or their partitions"

"It is neither possible to specify columns when creating partitions with CREATE
TABLE nor is it possible to add columns to partitions after-the-fact using
ALTER TABLE"
=> change to: "It is not possible .. nor is it possible .."

Immediately after the section on inheritance:
https://www.postgresql.org/docs/devel/static/ddl-partitioning.html#DDL-PARTITION-PRUNING
"Partition pruning is a query optimization technique that improves performance
for partitioned tables"
=> I think should say "improves performance for DECLARATIVELY partitioned
tables"

"You can use the EXPLAIN command to show the difference between a plan whose
partitions have been pruned from one whose partitions haven't, by using the
enable_partition_pruning configuration parameter. A typical unoptimized plan
for this type of table setup is:"
=> should say "difference between .. AND", not FROM.

=> Also, should avoid repeating "use...using".  Also, remove the comma or
rearrange the sentence:
"By using the EXPLAIN command and the enable_partition_pruning configuration
parameter, one can show the difference between a plan whose partitions have
been pruned from one whose partitions haven't.

"Constraint exclusion is only applied during query planning; it is not applied
at execution time like partition pruning does."

=> Remove "does" ?

"Partitioning enforces a rule that all partitions must have exactly the same
set of columns as the parent"

=> I think should say "Declarative partitioning enforces"; or maybe:
"Partitions of a partitioned table must have exactly the same set of columns as
the parent"
or:
"For declarative partitioning, partitions must have exactly the same set of
columns as the partitioned table"

Let me know if it's useful to provide a patch.

Justin



Re: adding tab completions

2018-06-09 Thread Justin Pryzby
Thanks for review and comment.

On Tue, Jun 05, 2018 at 05:29:42PM +0300, Arthur Zakirov wrote:
> On Sun, Jun 03, 2018 at 10:39:22PM -0500, Justin Pryzby wrote:

> > > Also I think it could be good to list column names after parentheses,
> > > but I'm not sure if it easy to implement.
> > I tried this and nearly gave up, but see attached.
> 
> After some thought now I think that this is not so good idea. The code
> doesn't look good too. It doesn't worth it and sorry for the distraction.

No problem.

> Moreover there is no such completion for example for the command (it shows
> only first column):
> 
> CREATE INDEX ON test (

Noted (I misunderstood at first: you just mean there's precedent that column
names aren't completed, except maybe the first).

I can revise the patch to not complete attributes in ANALYZE; but I think that
means that this will have to complete to table names:

postgres=# ANALYZE tt (a , 
alu_enodeb_201601 information_schema.
alu_enodeb_201602 jrn_pg_buffercache
...

.. since, without checking for matching parens, it has no idea whether to
complete with rels or atts.  WDYT?

Note that HeadMatch functions have special behavior with matching parentheses:
if the previous char is a right parenthesis, then the "previous word" is taken
to be the entire parenthesized value.  Maybe there's more that I don't know,
but I can't see how that can be easily applied here (by "easily" I mean
without doing something like making a temp copy of the buffer and inserting an
"exploratory" right parenthesis to see if TailMatches(prev_wd, ')') &&
prev_wd[0]=='(' or similar).

An alternative is that only the first table is completed for vacuum/analyze.

CREATE INDEX should complete columns, too.  It has the "ON" keyword (which
makes trying to parse less fragile).

> > -   "SERVER", "INDEX", "LANGUAGE", "POLICY", "PUBLICATION", 
> > "RULE",
> > +   "SERVER", "INDEX", "LANGUAGE", "POLICY", "PUBLICATION",
> 
> Is this a typo? I think there is still a possibility to comment on rules.

Not in PG11(b1) (note, that's a custom table)
postgres=# COMMENT ON RULE pg_settings_u IS 'asdf';
ERROR:  syntax error at or near "IS"

I see I didn't include that description in my first mail.

Here's a description and possible commit message for the (still WIP) patch:

Update psql tab completion for commits:

Table completion for ANALYZE partitioned_table
3c3bb99330aa9b4c2f6258bfa0265d806bf365c3

Add parenthesized options syntax for ANALYZE.
854dd8cff523bc17972d34772b0e39ad3d6d46a4

Add VACUUM (DISABLE_PAGE_SKIPPING) for emergencies.
ede62e56fbe809baa1a7bc3873d82f12ffe7540b

Allow multiple tables to be specified in one VACUUM or ANALYZE command.
11d8d72c27a64ea4e30adce11cf6c4f3dd3e60db

Remove deprecated COMMENT ON RULE syntax
e8d016d81940e75c126aa52971b7903b7301002e

Add hash partitioning.
1aba8e651ac3e37e1d2d875842de1e0ed22a651e

Parameter toast_tuple_target
c2513365a0a85e77d3c21adb92fe12cfbe0d1897

Parenthesized explain (...)
d4382c4ae7ea1e272f4fee388aac8ff99421471a

Parameter toast_tuple_target controls TOAST for new rows
c2513365a0a85e77d3c21adb92fe12cfbe0d1897

no longer accepts VACUUM ANALYZE VERBOSE
921059bd66c7fb1230c705d3b1a65940800c4cbb

> > else if (HeadMatches1("EXPLAIN") && previous_words_count==2 && prev_wd[0]=='(' && ends_with(prev_wd, ')'))
> 
> I think this condition can be replaced by:
> 
> else if (TailMatches2("EXPLAIN", MatchAny) && ends_with(prev_wd, ')'))
> 
> It looks better to me. Such condition is used for other commands and
> works the same way.
> 
> The last point I've noticed, there is no VERBOSE entry after VACUUM FULL
> ANALYZE command anymore.

See commit 921059bd6, above (it's not 100% clear to me that's intended to
reject VACUUM ANALYZE VERBOSE and not just reject VACUUM VERBOSE ANALYZE
VERBOSE, but I tentatively assume it's intentional).

> I'm not sure how this patch should be commited. Can it be commited
> outside the commitfest? Otherwise add it to the next commitfest please
> in order not to forget it.

I've done https://commitfest.postgresql.org/18/1661/

Justin



Re: Postgres 11 release notes

2018-06-09 Thread Justin Pryzby
On Sat, Jun 09, 2018 at 01:35:19PM -0700, David G. Johnston wrote:
> On Fri, May 11, 2018 at 8:08 AM, Bruce Momjian  wrote:
> 
> > I have committed the first draft of the Postgres 11 release notes.  I
> > will add more markup soon.  You can view the most current version here:
> >
> > http://momjian.us/pgsql_docs/release-11.html
> 
> """
> Also, if any table mentioned in VACUUM uses a column list, then ANALYZE
> keyword must be supplied; previously ANALYZE was implied in such cases.
> """
> 
> Should that be mentioned in the compatibility notes?

+1

Note also from 921059bd66c7fb1230c705d3b1a65940800c4cbb:

This is okay in v10 but rejected in v11b1:

postgres=# VACUUM ANALYZE VERBOSE t;
ERROR:  syntax error at or near "VERBOSE"

Justin



Re: BUG #15237: I got "ERROR: source for a multiple-column UPDATE item must be a sub-SELECT or ROW() expression"

2018-06-14 Thread Justin Pryzby
On Wed, Jun 13, 2018 at 05:48:50AM +0100, Andrew Gierth wrote:
> It then apparently went unnoticed until after the release of pg 10, at
> which point it got retroactively documented (in the release notes and
> nowhere else), in response to a brief discussion of a user complaint
> that happened on -general and not -hackers (or even -bugs).

Actually I noticed and pointed this out during beta:
https://www.postgresql.org/message-id/flat/20170719174507.GA19616%40telsasoft.com#20170719174507.ga19...@telsasoft.com

Justin



adding tab completions

2018-05-28 Thread Justin Pryzby
Find attached tab completion for the following:

"... Also, recursively perform VACUUM and ANALYZE on partitions when the
command is applied to a partitioned table."
3c3bb99330aa9b4c2f6258bfa0265d806bf365c3

Add parenthesized options syntax for ANALYZE.
854dd8cff523bc17972d34772b0e39ad3d6d46a4

Add VACUUM (DISABLE_PAGE_SKIPPING) for emergencies.
ede62e56fbe809baa1a7bc3873d82f12ffe7540b

Allow multiple tables to be specified in one VACUUM or ANALYZE command.
11d8d72c27a64ea4e30adce11cf6c4f3dd3e60db
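
For illustration, the intended behavior (a sketch inferred from the option list
in the patch below, not verbatim psql output):

ts=# VACUUM (<TAB>
FULL  FREEZE  ANALYZE  VERBOSE  DISABLE_PAGE_SKIPPING
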
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 7bb47ea..2d7b264 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -717,6 +717,26 @@ static const SchemaQuery Query_for_list_of_tmf = {
NULL
 };
 
+static const SchemaQuery Query_for_list_of_tpmf = {
+   /* min_server_version */
+   0,
+   /* catname */
+   "pg_catalog.pg_class c",
+   /* selcondition */
+   "c.relkind IN (" CppAsString2(RELKIND_RELATION) ", "
+   CppAsString2(RELKIND_MATVIEW) ", "
+   CppAsString2(RELKIND_PARTITIONED_TABLE) ", "
+   CppAsString2(RELKIND_FOREIGN_TABLE) ")",
+   /* viscondition */
+   "pg_catalog.pg_table_is_visible(c.oid)",
+   /* namespace */
+   "c.relnamespace",
+   /* result */
+   "pg_catalog.quote_ident(c.relname)",
+   /* qualresult */
+   NULL
+};
+
 static const SchemaQuery Query_for_list_of_tpm = {
/* min_server_version */
0,
@@ -3563,33 +3583,41 @@ psql_completion(const char *text, int start, int end)
COMPLETE_WITH_CONST("OPTIONS");
 
 /*
- * VACUUM [ FULL | FREEZE ] [ VERBOSE ] [ table ]
- * VACUUM [ FULL | FREEZE ] [ VERBOSE ] ANALYZE [ table [ (column [, ...] ) ] ]
+ * VACUUM [ ( option [, ...] ) ] [ table_and_columns [, ...] ]
+ * VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ table_and_columns [, ...] ]
  */
else if (Matches1("VACUUM"))
-   COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tm,
+   COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tpmf,
   " UNION SELECT 'FULL'"
   " UNION SELECT 'FREEZE'"
   " UNION SELECT 'ANALYZE'"
+  " UNION SELECT '('"
   " UNION SELECT 'VERBOSE'");
-   else if (Matches2("VACUUM", "FULL|FREEZE"))
-   COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tm,
+   else if (Matches2("VACUUM", "FULL"))
+   COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tpmf,
+  " UNION 
SELECT 'FREEZE'"
   " UNION 
SELECT 'ANALYZE'"
   " UNION 
SELECT 'VERBOSE'");
-   else if (Matches3("VACUUM", "FULL|FREEZE", "ANALYZE"))
-   COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tm,
+   else if (Matches3("VACUUM", "FULL|FREEZE|FULL FREEZE", "ANALYZE"))
+   COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tpmf,
   " UNION 
SELECT 'VERBOSE'");
-   else if (Matches3("VACUUM", "FULL|FREEZE", "VERBOSE"))
-   COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tm,
+   else if (Matches3("VACUUM", "FULL|FREEZE|FULL FREEZE", "VERBOSE"))
+   COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tpmf,
   " UNION 
SELECT 'ANALYZE'");
else if (Matches2("VACUUM", "VERBOSE"))
-   COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tm,
+   COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tpmf,
   " UNION 
SELECT 'ANALYZE'");
else if (Matches2("VACUUM", "ANALYZE"))
-   COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tm,
+   COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tpmf,
   " UNION 
SELECT 'VERBOSE'");
+   else if (HeadMatches2("VACUUM", "(") && !ends_with(prev_wd, ',') && 
!ends_with(prev_wd, '('))
+   COMPLETE_WITH_LIST2(",", ")");
+   else if (HeadMatches2("VACUUM", "(") && !ends_with(prev_wd, ')'))
+   COMPLETE_WITH_LIST5("FULL", "FREEZE", "ANALYZE", 
"VERBOSE", "DISABLE_PAGE_SKIPPING" );
+   else if (HeadMatches1("VACUUM") && !ends_with(prev_wd, ',') && 
!ends_with(prev_wd, ')'))
+   COMPLETE_WITH_CONST(",");
else if (HeadMatches1("VACUUM"))
-   COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tm, NULL);
+   COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tpmf, 

Re: "column i.indnkeyatts does not exist" in pg_upgrade from 11dev to 11b1

2018-05-29 Thread Justin Pryzby
On Tue, May 29, 2018 at 02:00:20PM -0400, Tom Lane wrote:
> Justin Pryzby  writes:
> > I've used pg_upgrade like this before, but maybe from a different (recent)
> > 11dev HEAD; I found: "pg_upgrade supports upgrades from 8.4.X and later to 
> > the
> > current major release of PostgreSQL, including snapshot and beta releases."
> > (But maybe upgrades FROM beta releases aren't supported in the general 
> > case?)
> 
> Yeah, that :-(.  pg_dump's approach to cross-version catalog differences
> can only cope with differences between major versions.  So if it sees
> a server that calls itself 11-something it's going to think that means
> the current catalog layout.  There's no good way to deal with pre-beta
> snapshot versions, other than to dump with a pg_dump of the same vintage.

Thanks for confirming.

In this case I worked around it by doing:
 sudo ln -sfv /usr/pgsql-11{dev0,b1}/bin/pg_dump
 sudo ln -sfv /usr/pgsql-11{dev0,b1}/bin/pg_dumpall

I guess, if need be, pg_dump could look at CATALOG_VERSION..
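
FWIW, a running server will report it; a sketch, using the value from the
FATAL message above:

postgres=# SELECT catalog_version_no FROM pg_control_system();
 catalog_version_no 
--------------------
          201804191
(1 row)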

Justin



Re: adding tab completions

2018-06-03 Thread Justin Pryzby
I finally got back to this; thanks everyone for reviews.

I also added completion for parenthesized explain (...)
d4382c4ae7ea1e272f4fee388aac8ff99421471a

and for ALTER TABLE SET (toast_tuple_target).
c2513365a0a85e77d3c21adb92fe12cfbe0d1897

BTW..should that be toast.tuple_target ??

Note that this also tries to comply with 921059bd66c7fb1230c705d3b1a65940800c4cbb
This is okay in v10 but rejected in v11b1:
postgres=# VACUUM ANALYZE VERBOSE t;
ERROR:  syntax error at or near "VERBOSE"

On Tue, May 29, 2018 at 10:12:27AM -0400, Alvaro Herrera wrote:
> Is it just me that finds shocking the notion of vacuuming a foreign
> table?

Fixed, thanks.

On Tue, May 29, 2018 at 12:27:27PM +0300, Arthur Zakirov wrote:
> The patch replaces the variable Query_for_list_of_tmf by
> Query_for_list_of_tpmf. So I think Query_for_list_of_tmf isn't needed
> anymore and can be removed.

Fixed by adding relkind='p' to the existing list rather than making a new
query entry.

> Also I think it could be good to list column names after parentheses,
> but I'm not sure if it easy to implement.

I tried this and nearly gave up, but see attached.

The complication is that the table name for which to pull the attributes isn't at
a constant offset; for example:
vacuum analyze a(TAB => needs prev2_wd
but
vacuum analyze a(c, TAB => needs prev3_wd

Maybe that's too much effort to include in tab completion.  If so, the
"prev_parens" stanza can be removed from VACUUM and ANALYZE.  If it's desired
to keep it, I guess I should avoid the duplicated lines.

On Wed, May 30, 2018 at 12:45:30AM +1200, Edmund Horner wrote:
> I don't believe it's meaningful to match on words with spaces in them,

Excellent point...fixed.

Find attached v6 (there were some "unpublished" versions).

Justin
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 7bb47ea..13d79f8 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -705,6 +705,7 @@ static const SchemaQuery Query_for_list_of_tmf = {
"pg_catalog.pg_class c",
/* selcondition */
"c.relkind IN (" CppAsString2(RELKIND_RELATION) ", "
+   CppAsString2(RELKIND_PARTITIONED_TABLE) ", "
CppAsString2(RELKIND_MATVIEW) ", "
CppAsString2(RELKIND_FOREIGN_TABLE) ")",
/* viscondition */
@@ -1282,6 +1283,7 @@ static PGresult *exec_query(const char *query);
 static char **get_previous_words(int point, char **buffer, int *nwords);
 
 static char *get_guctype(const char *varname);
+static int prev_parens(int nwords, char **words);
 
 #ifdef NOT_USED
 static char *quote_file_name(char *text, int match_type, char *quote_pointer);
@@ -,6 +2224,7 @@ psql_completion(const char *text, int start, int end)
"fillfactor",
"parallel_workers",
"log_autovacuum_min_duration",
+   "toast_tuple_target",
"toast.autovacuum_enabled",
"toast.autovacuum_freeze_max_age",
"toast.autovacuum_freeze_min_age",
@@ -2385,7 +2388,7 @@ psql_completion(const char *text, int start, int end)
{"ACCESS METHOD", "CAST", "COLLATION", "CONVERSION", "DATABASE",
"EVENT TRIGGER", "EXTENSION",
"FOREIGN DATA WRAPPER", "FOREIGN TABLE",
-   "SERVER", "INDEX", "LANGUAGE", "POLICY", "PUBLICATION", 
"RULE",
+   "SERVER", "INDEX", "LANGUAGE", "POLICY", "PUBLICATION",
"SCHEMA", "SEQUENCE", "STATISTICS", "SUBSCRIPTION",
"TABLE", "TYPE", "VIEW", "MATERIALIZED VIEW", "COLUMN", 
"AGGREGATE", "FUNCTION",
"PROCEDURE", "ROUTINE",
@@ -2703,7 +2706,7 @@ psql_completion(const char *text, int start, int end)
COMPLETE_WITH_LIST2("TABLE", "MATERIALIZED VIEW");
/* Complete PARTITION BY with RANGE ( or LIST ( or ... */
else if (TailMatches2("PARTITION", "BY"))
-   COMPLETE_WITH_LIST2("RANGE (", "LIST (");
+   COMPLETE_WITH_LIST3("RANGE (", "LIST (", "HASH (");
/* If we have xxx PARTITION OF, provide a list of partitioned tables */
else if (TailMatches2("PARTITION", "OF"))

COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_partitioned_tables, "");
@@ -2996,14 +2999,21 @@ psql_completion(const char *text, int start, int end)
else if (Matches1("EXECUTE"))
COMPLETE_WITH_QUERY(Query_for_list_of_prepared_statements);
 
-/* EXPLAIN */
-
-   /*
-* Complete EXPLAIN [ANALYZE] [VERBOSE] with list of EXPLAIN-able commands
-*/
+/* 
+ * EXPLAIN [ ( option [, ...] ) ] statement
+ * EXPLAIN [ ANALYZE ] [ VERBOSE ] statement
+ */
else if (Matches1("EXPLAIN"))
-   COMPLETE_WITH_LIST7("SELECT", "INSERT", "DELETE", "UPDATE", "DECLARE",
-   "ANALYZE", "VERBOSE");
+ 

Re: adding tab completions

2018-06-03 Thread Justin Pryzby
Find attached an update which also supports column completion using the legacy
non-parenthesized syntax.
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 7bb47ea..699a102 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -705,6 +705,7 @@ static const SchemaQuery Query_for_list_of_tmf = {
"pg_catalog.pg_class c",
/* selcondition */
"c.relkind IN (" CppAsString2(RELKIND_RELATION) ", "
+   CppAsString2(RELKIND_PARTITIONED_TABLE) ", "
CppAsString2(RELKIND_MATVIEW) ", "
CppAsString2(RELKIND_FOREIGN_TABLE) ")",
/* viscondition */
@@ -1282,6 +1283,7 @@ static PGresult *exec_query(const char *query);
 static char **get_previous_words(int point, char **buffer, int *nwords);
 
 static char *get_guctype(const char *varname);
+static int prev_parens(int nwords, char **words);
 
 #ifdef NOT_USED
 static char *quote_file_name(char *text, int match_type, char *quote_pointer);
@@ -,6 +2224,7 @@ psql_completion(const char *text, int start, int end)
"fillfactor",
"parallel_workers",
"log_autovacuum_min_duration",
+   "toast_tuple_target",
"toast.autovacuum_enabled",
"toast.autovacuum_freeze_max_age",
"toast.autovacuum_freeze_min_age",
@@ -2385,7 +2388,7 @@ psql_completion(const char *text, int start, int end)
{"ACCESS METHOD", "CAST", "COLLATION", "CONVERSION", "DATABASE",
"EVENT TRIGGER", "EXTENSION",
"FOREIGN DATA WRAPPER", "FOREIGN TABLE",
-   "SERVER", "INDEX", "LANGUAGE", "POLICY", "PUBLICATION", 
"RULE",
+   "SERVER", "INDEX", "LANGUAGE", "POLICY", "PUBLICATION",
"SCHEMA", "SEQUENCE", "STATISTICS", "SUBSCRIPTION",
"TABLE", "TYPE", "VIEW", "MATERIALIZED VIEW", "COLUMN", 
"AGGREGATE", "FUNCTION",
"PROCEDURE", "ROUTINE",
@@ -2703,7 +2706,7 @@ psql_completion(const char *text, int start, int end)
COMPLETE_WITH_LIST2("TABLE", "MATERIALIZED VIEW");
/* Complete PARTITION BY with RANGE ( or LIST ( or ... */
else if (TailMatches2("PARTITION", "BY"))
-   COMPLETE_WITH_LIST2("RANGE (", "LIST (");
+   COMPLETE_WITH_LIST3("RANGE (", "LIST (", "HASH (");
/* If we have xxx PARTITION OF, provide a list of partitioned tables */
else if (TailMatches2("PARTITION", "OF"))

COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_partitioned_tables, "");
@@ -2996,14 +2999,21 @@ psql_completion(const char *text, int start, int end)
else if (Matches1("EXECUTE"))
COMPLETE_WITH_QUERY(Query_for_list_of_prepared_statements);
 
-/* EXPLAIN */
-
-   /*
-* Complete EXPLAIN [ANALYZE] [VERBOSE] with list of EXPLAIN-able commands
-*/
+/* 
+ * EXPLAIN [ ( option [, ...] ) ] statement
+ * EXPLAIN [ ANALYZE ] [ VERBOSE ] statement
+ */
else if (Matches1("EXPLAIN"))
-   COMPLETE_WITH_LIST7("SELECT", "INSERT", "DELETE", "UPDATE", "DECLARE",
-   "ANALYZE", "VERBOSE");
+   COMPLETE_WITH_LIST8("SELECT", "INSERT", "DELETE", "UPDATE", "DECLARE",
+   "ANALYZE", "VERBOSE", "(");
+   else if (HeadMatches2("EXPLAIN", "("))
+   if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
+   COMPLETE_WITH_LIST7("ANALYZE", "VERBOSE", "COSTS", "BUFFERS", "TIMING", "SUMMARY", "FORMAT");
+   else
+   COMPLETE_WITH_LIST4(",", ")", "ON", "OFF");
+   else if (HeadMatches1("EXPLAIN") && previous_words_count==2 && prev_wd[0]=='(' && ends_with(prev_wd, ')'))
+   /* If the parenthesis are balanced, the list is apparently parsed as a single word */
+   COMPLETE_WITH_LIST5("SELECT", "INSERT", "DELETE", "UPDATE", "DECLARE");
else if (Matches2("EXPLAIN", "ANALYZE"))
-   COMPLETE_WITH_LIST6("SELECT", "INSERT", "DELETE", "UPDATE", "DECLARE",
"VERBOSE");
@@ -3563,33 +3573,57 @@ psql_completion(const char *text, int start, int end)
COMPLETE_WITH_CONST("OPTIONS");
 
 /*
- * VACUUM [ FULL | FREEZE ] [ VERBOSE ] [ table ]
- * VACUUM [ FULL | FREEZE ] [ VERBOSE ] ANALYZE [ table [ (column [, ...] ) ] ]
+ * VACUUM [ ( option [, ...] ) ] [ table_and_columns [, ...] ]
+ * VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ table_and_columns [, ...] ]
  */
else if (Matches1("VACUUM"))
-   COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tm,
+   COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tpm,
   " UNION 
SELECT 

Re: documentation fixes for partition pruning, round three

2018-06-01 Thread Justin Pryzby
On Fri, Jun 01, 2018 at 03:00:10PM -0400, Alvaro Herrera wrote:
> On 2018-May-23, Justin Pryzby wrote:
> 
> > There's two other, wider changes to consider:
...
> 
> >  - should we find a unified term for "inheritance-based partitioning" and avoid
> >using the word "partitioning" in that context?  For example: 
> > "Partitioning
> >can be implemented using table inheritance[...]".  One possible phrase
> >currently begin used is: "legacy inheritance method".
> 
> Yeah, maybe it'd be a good time to do that.  In particular I wondered
> whether the section title "Partitioning and Constraint Exclusion" should
> be changed somehow to note the fact that it's mostly for the legacy
> method.

I made changes to avoid "partition" (which I think should mean a child of
relkind='p', and itself of relkind='r') and "partitioned" (meaning relkind='p'
itself) but left alone most instances of "partitioning".

There are two issues.  One is the unfortunately-named, recommended setting of
constraint_exclusion='partition' :(

And one is this, which I think should be disambiguated from native
list/range/hash partition bounds:

  Use simple equality conditions for list partitioning, or simple range
  tests for range partitioning, as illustrated in the preceding examples.

I'm short on words so maybe someone else can recommend language.

On Fri, Jun 01, 2018 at 02:57:22PM -0400, Alvaro Herrera wrote:
> Pushed.  I made a couple of minor changes, in particular I added the
It looks like you also fixed a double negative - thanks.

Justin
diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 0258391..d919818 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -3387,8 +3387,8 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
 Declarative partitioning only supports range, list and hash
 partitioning, whereas table inheritance allows data to be divided in a
 manner of the user's choosing.  (Note, however, that if constraint
-exclusion is unable to prune partitions effectively, query performance
-will be very poor.)
+exclusion is unable to prune child tables effectively, query performance
+may be poor.)

   
 
@@ -3410,18 +3410,18 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
 
  
   We use the same measurement table we used
-  above.  To implement it as a partitioned table using inheritance, use
+  above.  To implement partitioning using inheritance, use
   the following steps:
 
   

 
  Create the master table, from which all of the
- partitions will inherit.  This table will contain no data.  Do not
+ child tables will inherit.  This table will contain no data.  Do not
  define any check constraints on this table, unless you intend them
- to be applied equally to all partitions.  There is no point in
+ to be applied equally to all child tables.  There is no point in
  defining any indexes or unique constraints on it, either.  For our
- example, master table is the measurement
+ example, the master table is the measurement
  table as originally defined.
 

@@ -3431,7 +3431,7 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
  Create several child tables that each inherit from
  the master table.  Normally, these tables will not add any columns
  to the set inherited from the master.  Just as with declarative
- partitioning, these partitions are in every way normal
+ partitioning, these tables are in every way normal
  PostgreSQL tables (or foreign tables).
 
 
CREATE TABLE measurement_y2008m01 () INHERITS (measurement);
 

 
- Add non-overlapping table constraints to the partition tables to
- define the allowed key values in each partition.
+ Add non-overlapping table constraints to the child tables to
+ define the allowed key values in each.
 
 
 
@@ -3461,18 +3461,18 @@ CHECK ( county IN ( 'Oxfordshire', 'Buckinghamshire', 'Warwickshire' ))
 CHECK ( outletID >= 100 AND outletID < 200 )
 
  Ensure that the constraints guarantee that there is no overlap
- between the key values permitted in different partitions.  A common
+ between the key values permitted in different child tables.  A common
  mistake is to set up range constraints like:
 
 CHECK ( outletID BETWEEN 100 AND 200 )
 CHECK ( outletID BETWEEN 200 AND 300 )
 
- This is wrong since it is not clear which partition the key value
- 200 belongs in.
+ This is wrong since it is not clear into which child table the key
+ value 200 belongs.

"column i.indnkeyatts does not exist" in pg_upgrade from 11dev to 11b1

2018-05-29 Thread Justin Pryzby
I've used pg_upgrade like this before, but maybe from a different (recent)
11dev HEAD; I found: "pg_upgrade supports upgrades from 8.4.X and later to the
current major release of PostgreSQL, including snapshot and beta releases."
(But maybe upgrades FROM beta releases aren't supported in the general case?)

sudo -u postgres /usr/pgsql-11/bin/postgres -D /var/lib/pgsql/11dev0/data
< 2018-05-29 12:06:50.104 CDT  >FATAL:  database files are incompatible with server
< 2018-05-29 12:06:50.104 CDT  >DETAIL:  The database cluster was initialized with CATALOG_VERSION_NO 201803141, but the server was compiled with CATALOG_VERSION_NO 201804191.

sudo -u postgres sh -ec 'cd /var/lib/pgsql; /usr/pgsql-11/bin/pg_upgrade -b /usr/pgsql-11dev0/bin -B /usr/pgsql-11/bin -d /var/lib/pgsql/11dev0/data -D /var/lib/pgsql/11/data -j2 --link --retain'
...

command: "/usr/pgsql-11/bin/pg_dump" --host /var/lib/pgsql --port 50432 
--username postgres --schema-only --quote-all-identifiers --binary-upgrade 
--format=custom  --file="pg_upgrade_dump_16401.custom" 'dbname=c' >> 
"pg_upgrade_dump_16401.log" 2>&1
pg_dump: [archiver (db)] query failed: ERROR:  column i.indnkeyatts does not 
exist
LINE 1: ...atalog.pg_get_indexdef(i.indexrelid) AS indexdef, i.indnkeya...
 ^
HINT:  Perhaps you meant to reference the column "i.indnatts".
pg_dump: [archiver (db)] query was: SELECT t.tableoid, t.oid, t.relname AS 
indexname, inh.inhparent AS parentidx, pg_catalog.pg_get_indexdef(i.indexrelid) 
AS indexdef, i.indnkeyatts AS indnkeyatts, i.indnatts AS indnatts, i.indkey, 
i.indisclustered, i.indisreplident, t.relpages, c.contype, c.conname, 
c.condeferrable, c.condeferred, c.tableoid AS contableoid, c.oid AS conoid, 
pg_catalog.pg_get_constraintdef(c.oid, false) AS condef, (SELECT spcname FROM 
pg_catalog.pg_tablespace s WHERE s.oid = t.reltablespace) AS tablespace, 
t.reloptions AS indreloptions FROM pg_catalog.pg_index i JOIN 
pg_catalog.pg_class t ON (t.oid = i.indexrelid) JOIN pg_catalog.pg_class t2 ON 
(t2.oid = i.indrelid) LEFT JOIN pg_catalog.pg_constraint c ON (i.indrelid = 
c.conrelid AND i.indexrelid = c.conindid AND c.contype IN ('p','u','x')) LEFT 
JOIN pg_catalog.pg_inherits inh ON (inh.inhrelid = indexrelid) WHERE i.indrelid 
= '19970'::pg_catalog.oid AND (i.indisvalid OR t2.relkind = 'p') AND 
i.indisready ORDER BY indexname



Re: \d t: ERROR: XX000: cache lookup failed for relation

2018-06-04 Thread Justin Pryzby
On Mon, Jun 04, 2018 at 07:12:53PM +0300, Teodor Sigaev wrote:
> 
> >Is that considered an actionable problem?
> 
> 
> I think so, but I'm not able to reproduce that, I wrote a script to simplify

The failure is triggered by running "\d t" in (yet) another session - sorry if
that was unclear.  It fails very consistently, probably over 75% of the time.

Also note that my "INSERT" was run in a separate loop, concurrent with the
VACUUM and ALTER, but yours is running consecutively.

Justin



\d t: ERROR: XX000: cache lookup failed for relation

2018-06-02 Thread Justin Pryzby
Resending to -hackers
https://www.postgresql.org/message-id/20180527022401.GA20949%40telsasoft.com

Is that considered an actionable problem?

Encountered consistently while trying to reproduce the vacuum full
pg_statistic/toast_2619 bug; while running a loop around VAC FULL and more in
another session:

[1]-  Running { time sh -ec 'while :; do psql --port 5678 postgres -qc "VACUUM FULL pg_toast.pg_toast_2619"; psql --port 5678 postgres -qc "VACUUM FULL pg_statistic"; done'; date; } &
[2]+  Running time while :; do
psql postgres --port 5678 -c "INSERT INTO t SELECT i FROM generate_series(1,99) i"; sleep 1; for a in `seq 999`;
do
psql postgres --port 5678 -c "ALTER TABLE t ALTER i TYPE int USING i::int"; sleep 1; psql postgres --port 5678 -c "ALTER TABLE t ALTER i TYPE bigint"; sleep 1;
done; psql postgres --port 5678 -c "TRUNCATE t"; sleep 1;
done &

$ psql --port 5678 postgres -x
psql (11beta1)
...
postgres=# \set VERBOSITY verbose 
postgres=# \d t
ERROR:  XX000: cache lookup failed for relation 8096742
LOCATION:  flatten_reloptions, ruleutils.c:11065

Justin



Re: \d t: ERROR: XX000: cache lookup failed for relation

2018-06-04 Thread Justin Pryzby
On Mon, Jun 04, 2018 at 08:01:41PM +0300, Teodor Sigaev wrote:
> >Also note that my "INSERT" was run in a separate loop, concurrent with the
> >VACUUM and ALTER, but yours is running consecutively.
> 
> both loops run in background. I tried to run two scripts - and got a lot of
> deadlocks but not a problem reproduction.

Ah, I think this is the missing, essential component:
CREATE INDEX ON t(right(i::text,1)) WHERE i::text LIKE '%1';

I can reproduce it running just this loop:

time while :; do for a in `seq 999`; do psql postgres --port 5678 -c "ALTER TABLE t ALTER i TYPE int USING i::int"; done; done

Justin



Re: adding tab completions

2018-06-27 Thread Justin Pryzby
On Mon, Jun 11, 2018 at 11:35:51PM +0300, Arthur Zakirov wrote:
> IMHO, I'd leave the code as simple as possible. It is up to you of
> course. But it is easy to add completion for a first attribute, by
> adding the condition (and leave other attributes without completion):
> 
> else if (HeadMatches1("VACUUM") && TailMatches1("("))
> COMPLETE_WITH_ATTR(prev2_wd, "");

Thanks - I've done this in the attached.  It works well for having minimal
logic.

On Tue, Jun 05, 2018 at 05:29:42PM +0300, Arthur Zakirov wrote:
> On Sun, Jun 03, 2018 at 10:39:22PM -0500, Justin Pryzby wrote:
> > else if (HeadMatches1("EXPLAIN") && previous_words_count==2 && 
> > prev_wd[0]=='(' && ends_with(prev_wd, ')'))
> 
> I think this condition can be replaced by:
> 
> else if (TailMatches2("EXPLAIN", MatchAny) && ends_with(prev_wd, ')'))

I used:
else if (HeadMatches2("EXPLAIN", MatchAny) && ends_with(prev_wd, ')'))

> > I've done https://commitfest.postgresql.org/18/1661/
> 
> Thank you!

Thanks for your repeated reviews; if this looks+works fine, please set it to
R-F-C.

Actually..another thought: since toast tables may be VACUUM-ed, should I
introduce Query_for_list_of_tpmt?

Update psql tab completion for commits:

Table completion for ANALYZE partitioned_table
3c3bb99330aa9b4c2f6258bfa0265d806bf365c3

Add parenthesized options syntax for ANALYZE.
854dd8cff523bc17972d34772b0e39ad3d6d46a4

Add VACUUM (DISABLE_PAGE_SKIPPING) for emergencies.
ede62e56fbe809baa1a7bc3873d82f12ffe7540b

Allow multiple tables to be specified in one VACUUM or ANALYZE command.
11d8d72c27a64ea4e30adce11cf6c4f3dd3e60db

Add hash partitioning.
1aba8e651ac3e37e1d2d875842de1e0ed22a651e

Parameter toast_tuple_target
c2513365a0a85e77d3c21adb92fe12cfbe0d1897

Parenthesized explain (...)
d4382c4ae7ea1e272f4fee388aac8ff99421471a

no longer accepts VACUUM ANALYZE VERBOSE
921059bd66c7fb1230c705d3b1a65940800c4cbb

Justin
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 7bb47ea..e913b83 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -705,6 +705,7 @@ static const SchemaQuery Query_for_list_of_tmf = {
"pg_catalog.pg_class c",
/* selcondition */
"c.relkind IN (" CppAsString2(RELKIND_RELATION) ", "
+   CppAsString2(RELKIND_PARTITIONED_TABLE) ", "
CppAsString2(RELKIND_MATVIEW) ", "
CppAsString2(RELKIND_FOREIGN_TABLE) ")",
/* viscondition */
@@ -,6 +2223,7 @@ psql_completion(const char *text, int start, int end)
"fillfactor",
"parallel_workers",
"log_autovacuum_min_duration",
+   "toast_tuple_target",
"toast.autovacuum_enabled",
"toast.autovacuum_freeze_max_age",
"toast.autovacuum_freeze_min_age",
@@ -2703,7 +2705,7 @@ psql_completion(const char *text, int start, int end)
COMPLETE_WITH_LIST2("TABLE", "MATERIALIZED VIEW");
/* Complete PARTITION BY with RANGE ( or LIST ( or ... */
else if (TailMatches2("PARTITION", "BY"))
-   COMPLETE_WITH_LIST2("RANGE (", "LIST (");
+   COMPLETE_WITH_LIST3("RANGE (", "LIST (", "HASH (");
/* If we have xxx PARTITION OF, provide a list of partitioned tables */
else if (TailMatches2("PARTITION", "OF"))

COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_partitioned_tables, "");
@@ -2996,14 +2998,21 @@ psql_completion(const char *text, int start, int end)
else if (Matches1("EXECUTE"))
COMPLETE_WITH_QUERY(Query_for_list_of_prepared_statements);
 
-/* EXPLAIN */
-
-   /*
-* Complete EXPLAIN [ANALYZE] [VERBOSE] with list of EXPLAIN-able 
commands
-*/
+/* 
+ * EXPLAIN [ ( option [, ...] ) ] statement
+ * EXPLAIN [ ANALYZE ] [ VERBOSE ] statement
+ */
else if (Matches1("EXPLAIN"))
-   COMPLETE_WITH_LIST7("SELECT", "INSERT", "DELETE", "UPDATE", 
"DECLARE",
-   "ANALYZE

ENOSPC FailedAssertion("!(RefCountErrors == 0)"

2018-06-27 Thread Justin Pryzby
While grepping logs, I came across this crash, which I caused while adding many
indices in a test environment.  I don't know of any reason to believe, one way
or the other, whether this is specific to running on pg11b1.

< 2018-06-17 11:38:45.465 CDT pryzbyj >FATAL:  the database system is in recovery mode
< 2018-06-17 11:38:45.466 CDT pryzbyj >FATAL:  the database system is in recovery mode
< 2018-06-17 11:38:45.468 CDT  >FATAL:  could not extend file "base/17379/38665798": No space left on device
< 2018-06-17 11:38:45.468 CDT  >HINT:  Check free disk space.
< 2018-06-17 11:38:45.468 CDT  >CONTEXT:  WAL redo at 2AA/239676B8 for Heap/INSERT+INIT: off 1
< 2018-06-17 11:38:45.468 CDT  >WARNING:  buffer refcount leak: [2366] (rel=base/17379/38665798, blockNum=1241, flags=0x8a00, refcount=1 1)
TRAP: FailedAssertion("!(RefCountErrors == 0)", File: "bufmgr.c", Line: 2523)

Unfortunately, I have neither a core nor stacktrace to share, as the DB was
(until after running out of space) on the same FS as /var :(

Jun 17 11:38:42 localhost abrt[30526]: Aborting dump: only 30MiB is available on /var/spool/abrt
Jun 17 11:38:45 localhost abrt[1261]: Aborting dump: only 16MiB is available on /var/spool/abrt

Not sure how useful it is to crash-test ENOSPC (although I see there are
continuing patches for this case).

Justin



Re: automatic restore point

2018-06-25 Thread Justin Pryzby
On Tue, Jun 26, 2018 at 12:04:59AM -0400, Rui DeSousa wrote:
> Why not use auto commit off in the session or .psqlrc file or begin and then 
> use rollback?  \set AUTOCOMMIT off
> 
> What would be nice is if a syntax error didn’t abort the transaction when 
> auto commit is off — being a bad typist.

I think you'll get that behavior with ON_ERROR_ROLLBACK.
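
For example (a sketch; "interactive" applies the savepoint behavior only to
interactive sessions):

postgres=# \set AUTOCOMMIT off
postgres=# \set ON_ERROR_ROLLBACK interactive
postgres=# SELCT 1;
ERROR:  syntax error at or near "SELCT"
postgres=# SELECT 1;   -- the typo above did not abort the transaction
 ?column? 
----------
        1
(1 row)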

Justin



pg11b1 from outside a txn: "VACUUM cannot run inside a transaction block": should be: ...or multi-command string

2018-06-23 Thread Justin Pryzby
in pg10:

ts=# begin;VACUUM FULL pg_toast.pg_toast_2619;
BEGIN
ERROR:  25001: VACUUM cannot run inside a transaction block
LOCATION:  PreventTransactionChain, xact.c:3167
=> sounds fine

$ psql postgres -c 'SELECT 1; VACUUM pg_statistic'
ERROR:  VACUUM cannot be executed from a function or multi-command string
=> sounds fine

In pg11b1:

pryzbyj=# begin;VACUUM FULL pg_toast.pg_toast_2619;
BEGIN
ERROR:  25001: VACUUM cannot run inside a transaction block
LOCATION:  PreventInTransactionBlock, xact.c:3163
=> sounds fine

[pryzbyj@dev ~]$ psql -c 'SELECT 1; VACUUM pg_statistic'
ERROR:  VACUUM cannot run inside a transaction block
=> Error message seems off??

I couldn't figure out how to \set VERBOSITY verbose inside a psql command (??),
but strace shows for v10:
SERROR\0VERROR\0C25001\0MVACUUM cannot be executed from a function or multi-command string\0Fxact.c\0L3187\0RPreventTransactionChain

And for v11:
SERROR\0VERROR\0C25001\0MVACUUM cannot run inside a transaction block\0Fxact.c\0L3163\0RPreventInTransactionBlock

Function renamed by commit 04700b685f31508036456bea4d92533e5ceee9d6.

So behavior change maybe caused by 6eb52da3948dc8bc7c8a61cbacac14823b670c58 ?

Justin



Re: Name of main process differs between servers (postmaster vs postgres)

2018-06-26 Thread Justin Pryzby
On Tue, Jun 26, 2018 at 04:51:32PM -0500, Jonathan Lemig wrote:
> Hi,
> 
> I noticed on two of my postgres servers, one has "postmaster" for the main
> process, and the other has "postgres".   My question is - why is this?  For
> example:

On my centos6 servers:

1518140 lrwxrwxrwx   1 root root 8 May 10 21:32 /usr/pgsql-10/bin/postmaster -> postgres

Do you have something else ?

Is there a symlink for pg_ctl or postmaster or postgres in /s?bin, /usr/s?bin,
or /usr/local/s?bin ?

Was one started by initscript and one started by something else (manually
running pg_ctl or similar) ?

Are the initscripts the same?

Did one of them start at boot and one restarted since then ?

You might get a hint by comparing /proc/909/environ with /proc/4804/environ
(the relative pids suggests that 4804 was perhaps re/started significantly
after boot).  And or proc/pid/comm, exe and stat.

Did one of them crash since it was started (or killed by OOM) ?

Justin

PS, I think the list residents would ask that, in the future, admin/DBA
oriented questions be kept on pgsql-general, and development/bugs on -hackers.



doc fixes: vacuum_cleanup_index_scale_factor

2018-05-01 Thread Justin Pryzby
Introduced 857f9c36cda520030381bd8c2af20adf0ce0e1d4

The "minimal version" should probably be "minimum version", but I didn't
include it here.

Also, the documentation doesn't indicate the default value of -1 (or its
special meaning).
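
For reference, a sketch of the two ways it could be set (index name hypothetical):

SET vacuum_cleanup_index_scale_factor = 0.1;
ALTER INDEX t_i_idx SET (vacuum_cleanup_index_scale_factor = 0.1);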

Also, my understanding of this feature changed when I read this logic:

if (cleanup_scale_factor < 0 ||
metad->btm_last_cleanup_num_heap_tuples < 0 ||
info->num_heap_tuples > (1.0 + cleanup_scale_factor) *
metad->btm_last_cleanup_num_heap_tuples)
result = true;

=> That means that _bt_vacuum_needs_cleanup() returns true unless a nondefault
value is set.  That feels wrong, since the doc said:

"When no tuples were deleted from the heap, B-tree indexes
might still be scanned during VACUUM
cleanup stage by two reasons."

Which sounds like "being scanned when no tuples were deleted" is exceptional
rather, than the default.

I changed that paragraph based on that understanding (which is consistent with
the commit message).  The language could probably be further improved, but
someone should first verify my tentative understanding of the intent.

Justin

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index eabe2a9..18c0ec0 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1893,14 +1893,15 @@ include_dir 'conf.d'
   
   

-When no tuples were deleted from the heap, B-tree indexes might still
-be scanned during VACUUM cleanup stage by two
-reasons.  The first reason is that B-tree index contains deleted pages
-which can be recycled during cleanup.  The second reason is that B-tree
-index statistics is stalled.  The criterion of stalled index statistics
-is number of inserted tuples since previous statistics collection
-is greater than vacuum_cleanup_index_scale_factor
-fraction of total number of heap tuples.
+When no tuples were deleted from the heap, B-tree indexes are still
+scanned during VACUUM cleanup stage unless
+two conditions are met.  First, if a B-tree index contains no deleted pages
+which can be recycled during cleanup.  Second, if B-tree
+index statistics are not stale.  Index statistics are considered stale unless
+vacuum_cleanup_index_scale_factor is non-negative, and the
+number of inserted tuples since the previous statistics collection is
+less than that fraction of the total number of heap tuples.
+The default is -1, meaning index scan during cleanup is not skipped.

   
  
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index 3bcc56e..22b4a75 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -189,7 +189,7 @@ _bt_update_meta_cleanup_info(Relation rel, TransactionId oldestBtpoXact,
if (metad->btm_version < BTREE_VERSION)
_bt_upgrademetapage(metapg);
 
-   /* update cleanup-related infromation */
+   /* update cleanup-related information */
metad->btm_oldest_btpo_xact = oldestBtpoXact;
metad->btm_last_cleanup_num_heap_tuples = numHeapTuples;
MarkBufferDirty(metabuf);
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index e5dce00..4e86280 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -818,10 +818,10 @@ _bt_vacuum_needs_cleanup(IndexVacuumInfo *info)
float8  cleanup_scale_factor;
 
/*
-* If table receives large enough amount of insertions and no cleanup
-* was performed, then index might appear to have stalled statistics.
-* In order to evade that, we perform cleanup when table receives
-* vacuum_cleanup_index_scale_factor fractions of insertions.
+* If table receives enough insertions and no cleanup
+* was performed, then index would appear to have stale statistics.
+* If scale factor is set, we avoid that by performing cleanup if the number of added tuples
+* exceeds vacuum_cleanup_index_scale_factor fraction of original tuple count.
 */
relopts = (StdRdOptions *) info->index->rd_options;
cleanup_scale_factor = (relopts &&
@@ -870,8 +870,8 @@ btbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
 );
 
/*
-* Update cleanup-related information in metapage. These information
-* is used only for cleanup but keeping up them to date can avoid
+* Update cleanup-related information in metapage. This information
+* is used only for cleanup but keeping them up to date can avoid
 * 

Re: Doc tweak for huge_pages?

2018-01-21 Thread Justin Pryzby
On Mon, Jan 22, 2018 at 07:10:33PM +1300, Thomas Munro wrote:
> On Mon, Jan 22, 2018 at 6:30 PM, Justin Pryzby <pry...@telsasoft.com> wrote:
> > On Mon, Jan 22, 2018 at 03:54:26PM +1300, Thomas Munro wrote:
> >> On Fri, Jan 12, 2018 at 1:12 PM, Thomas Munro
> >> <thomas.mu...@enterprisedb.com> wrote:
> >> > On Tue, Jan 9, 2018 at 6:24 AM, Catalin Iacob <iacobcata...@gmail.com> 
> >> > wrote:
> >> > I don't know enough about this to make such a strong recommendation
> >> > myself, which is why I was only trying to report that bad performance
> >> > had been observed on some version, not that you shouldn't do it.  Any
> >> > other views on this stronger statement?
> >>
> >> Now that the Windows huge pages patch has landed, here is a rebase.  I
> >> took your alternative and tweaked it a tiny bit more.  Thoughts?
>
> Sorry, right, that was 100% wrong.  It would probably be correct to
> remove the "not", but let's just remove that bit.  New version
> attached.

+PostgreSQL. On Linux, this is called
+"transparent huge pages", but since that feature is known to cause
+performance degradation with
+PostgreSQL on current Linux versions
+(unlike explicit use of huge_pages), its use is
+discouraged.

Consider this shorter, less-severe sounding alternative:
"... (but note that this feature can degrade performance of some
PostgreSQL workloads)."

Justin



Re: Doc tweak for huge_pages?

2018-01-21 Thread Justin Pryzby
On Mon, Jan 22, 2018 at 03:54:26PM +1300, Thomas Munro wrote:
> On Fri, Jan 12, 2018 at 1:12 PM, Thomas Munro
>  wrote:
> > On Tue, Jan 9, 2018 at 6:24 AM, Catalin Iacob  
> > wrote:
> > I don't know enough about this to make such a strong recommendation
> > myself, which is why I was only trying to report that bad performance
> > had been observed on some version, not that you shouldn't do it.  Any
> > other views on this stronger statement?
> 
> Now that the Windows huge pages patch has landed, here is a rebase.  I
> took your alternative and tweaked it a tiny bit more.  Thoughts?

+   
+Note that, besides explicitly requesting huge pages via
+huge_pages,
=> I would just say:
"Note that, besides huge pages requested explicitly, ..."

+ In Linux this automatic use is
=> ON Linux comma?

+called "transparent huge pages" and is not enabled by default in
+popular distributions as of the time of writing, but since transparent

=> really?  I don't know if I've ever seen it not enabled.  In any case,
that's a strong statement to make (to be disabled in ALL popular distributions).

I checked all our servers, including centos6 and ubuntu t-LTS and x-LTS.  On a
limited few where it was disabled, I'd explicitly done so.

On a server on which I just installed ubuntu-x LTS, with 4.13.0-26-generic:
pryzbyj@gta-ubuntu:~$ cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never

https://github.com/torvalds/linux/commit/13ece886d99cd668483113f7238e419d5331af26
=> the compile time default is to disable, but (if enabled at compile time),
the runtime default is "always".

On centos7
Linux template0 3.10.0-693.11.6.el7.x86_64 #1 SMP Thu Jan 4 01:06:37 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
$ cat /sys/kernel/mm/transparent_hugepage/enabled 
[always] madvise never
$ grep TRANS /boot/config-3.10.0-693.11.6.el7.x86_64 
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
# CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set

https://blog.nelhage.com/post/transparent-hugepages/
=> It is enabled ("enabled=always") by default in most Linux distributions.

Justin



Re: [doc fix] Correct calculation of vm.nr_hugepages

2018-02-19 Thread Justin Pryzby
On Mon, Feb 19, 2018 at 07:05:47AM +, Tsunakawa, Takayuki wrote:
> The current PostgreSQL documentation overestimates the number of huge pages
> (vm.nr_hugepages) because the calculation uses the maximum virtual address
> space.  In practice, huge pages are only used for the anonymous shared memory
> segment.  The attached patch fixes the documentation.

+ pmap 4170 | grep rw-s | grep zero | awk '{print $2}'

One can do with fewer greps:
pryzbyj@pryzbyj:~$ sudo pmap `pgrep -P 1 -u postgres` |awk '/rw-s/&&/zero/{print $2}' # check PPID=1
144848K

or:
pryzbyj@pryzbyj:~$ sudo pmap `pgrep -u postgres |sed q` |awk '/rw-s/&&/zero/{print $2}' # check any backend, not just postmaster parent
144848K

Or (this is even less clean but gives an alternative which continues to use
/proc directly):
pryzbyj@pryzbyj:~$ sudo cat /proc/`pgrep -P 1 -u postgres`/maps |awk --non-decimal-data 'BEGIN{FS="[ -]"} /zero/{a="0x"$1;b="0x"$2;print (b-a)/1024"kB"}'
144848kB
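
Either way, dividing by the huge page size and rounding up gives the setting; a
worked example using the 144848 kB figure above with 2048 kB huge pages:

  144848 / 2048 = 70.73...  =>  vm.nr_hugepages = 71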

BTW when huge_pages=try, I've been verifying hugepages are in use by running:
pryzbyj@pryzbyj:~$ sudo sh -c 'cat /proc/`pgrep -u postgres -P1`/smaps |grep -c "KernelPageSize: *2048 kB"' || echo NOT FOUND
0
NOT FOUND

Justin



Re: [doc fix] Correct calculation of vm.nr_hugepages

2018-02-21 Thread Justin Pryzby
On Wed, Feb 21, 2018 at 03:14:57PM -0500, Robert Haas wrote:
> On Mon, Feb 19, 2018 at 9:43 PM, Tsunakawa, Takayuki
>  wrote:
> > Thanks, I'd like to take this.
> 
> Why are these values so large?  The example in the documentation shows
> 6490428 kB, and in my test I got 8733888 kB.  But 8733888 kB = 8.3 TB!
>  8.3 GB would make sense, but 8.3 TB does not.

pryzbyj@pryzbyj:~$ units -t -v 8733888kB GiB
8733888kB = 8.1340671 GiB



Re: adding tab completions

2018-07-29 Thread Justin Pryzby
Sorry for the delay..this got lost while catching up after being out of town..

On Thu, Jun 28, 2018 at 02:20:39PM +0300, Arthur Zakirov wrote:
> Thank you for the new version.
> 
> On Wed, Jun 27, 2018 at 03:10:51PM -0500, Justin Pryzby wrote:
> > Thanks - I've done this in the attached.  It works well for having minimal
> > logic.
> 
> I think everything is OK. But I didn't notice in the previous version of the
> patch that there are no braces in the EXPLAIN and VACUUM conditions. I attached
> diff file to show it.
> 
> Also I included TEXT, XML, JSON, YAML items for FORMAT option in the
> diff file. The patch shows ",", ")", "ON", "OFF" for all options, but
> FORMAT requires format type to be specified. Please consider this change
> only as a suggestion.

Your suggestion is good, so attached updated patch.
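
To illustrate the intent (a sketch based on the lists in the patch below, not
verbatim psql output):

postgres=# EXPLAIN (FORMAT <TAB>
TEXT  XML  JSON  YAML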

> > Actually..another thought: since toast tables may be VACUUM-ed, should I
> > introduce Query_for_list_of_tpmt ?

I didn't include this one yet though.

Feel free to bump to next CF.

Justin
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 7bb47ea..1d235a3 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -705,6 +705,7 @@ static const SchemaQuery Query_for_list_of_tmf = {
"pg_catalog.pg_class c",
/* selcondition */
"c.relkind IN (" CppAsString2(RELKIND_RELATION) ", "
+   CppAsString2(RELKIND_PARTITIONED_TABLE) ", "
CppAsString2(RELKIND_MATVIEW) ", "
CppAsString2(RELKIND_FOREIGN_TABLE) ")",
/* viscondition */
@@ -,6 +2223,7 @@ psql_completion(const char *text, int start, int end)
"fillfactor",
"parallel_workers",
"log_autovacuum_min_duration",
+   "toast_tuple_target",
"toast.autovacuum_enabled",
"toast.autovacuum_freeze_max_age",
"toast.autovacuum_freeze_min_age",
@@ -2703,7 +2705,7 @@ psql_completion(const char *text, int start, int end)
COMPLETE_WITH_LIST2("TABLE", "MATERIALIZED VIEW");
/* Complete PARTITION BY with RANGE ( or LIST ( or ... */
else if (TailMatches2("PARTITION", "BY"))
-   COMPLETE_WITH_LIST2("RANGE (", "LIST (");
+   COMPLETE_WITH_LIST3("RANGE (", "LIST (", "HASH (");
/* If we have xxx PARTITION OF, provide a list of partitioned tables */
else if (TailMatches2("PARTITION", "OF"))

COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_partitioned_tables, "");
@@ -2996,14 +2998,27 @@ psql_completion(const char *text, int start, int end)
else if (Matches1("EXECUTE"))
COMPLETE_WITH_QUERY(Query_for_list_of_prepared_statements);
 
-/* EXPLAIN */
-
-   /*
-* Complete EXPLAIN [ANALYZE] [VERBOSE] with list of EXPLAIN-able commands
-*/
+/*
+ * EXPLAIN [ ( option [, ...] ) ] statement
+ * EXPLAIN [ ANALYZE ] [ VERBOSE ] statement
+ */
else if (Matches1("EXPLAIN"))
-   COMPLETE_WITH_LIST7("SELECT", "INSERT", "DELETE", "UPDATE", 
"DECLARE",
-   "ANALYZE", "VERBOSE");
+   COMPLETE_WITH_LIST8("SELECT", "INSERT", "DELETE", "UPDATE", 
"DECLARE",
+   "ANALYZE", "VERBOSE", 
"(");
+   else if (HeadMatches2("EXPLAIN", "("))
+   {
+   if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
+   COMPLETE_WITH_LIST7("ANALYZE", "VERBOSE", "COSTS", 
"BUFFERS", "TIMING", "SUMMARY", "FORMAT");
+   else if (TailMatches1("FORMAT"))
+   COMPLETE_WITH_LIST4("TEXT", "XML", "JSON", "YAML");
+   else if 
(TailMatches1("ANALYZE|VERBOSE|COSTS|BUFFERS|TIMING|SUMMARY"))
+   COMPLETE_WITH_LIST4(",", ")", "ON", "OFF");
+   else
+   COMPLETE_WITH_LIST2(",", ")");
+   }
+   else if (HeadMatches2("EXPLAIN", MatchAny) && ends_with(prev_wd, ')'))
+   /* If the parenthesis are balanced, the list is apparently 
parsed as a single word */
+   COMPLETE_WITH_LIST5("SELECT", "INS

Re: Fwd: Would like to help with documentation for Postgres 11

2018-07-29 Thread Justin Pryzby
On Sun, Jul 29, 2018 at 11:50:40AM -0500, Michael Goldshteyn wrote:
> I would like to offer some help writing and improving the English
> documentation for some of the new features and changes in Postgres 11. If I
> can get an email of where such help would be appreciated, so I can choose a
> feature I am familiar with, I would be glad to help.

The documentation is expected to be committed with the feature, so what's
currently in place is expected to be adequate and accurate.

You could review any documentation changes since v10.  Any issues or
improvements you can discuss or send as a patch to -hackers.

Maybe something like:
git log -p REL_10_4..REL_11_BETA2 doc/
..or just git diff, which would not show commits separately, but also would not
show multiple updates for the same feature (like: commit: feature... commit:
improve documentation... commit: fix typos... commit: change default)

Alternately you can look at the pg11 release notes (or git log?) for a list of
features and check that corresponding documentation exists and is accurate.
That could conceivably expose implementation bugs, eg. with edge cases.
git log REL_10_4..REL_11_BETA2

Justin



10.5 but not 10.4: backend startup during reindex system: could not read block 0 in file "base/16400/..": read only 0 of 8192 bytes

2018-08-29 Thread Justin Pryzby
I've seen this message now a handful of times recently.  It seems to happen
overnight, during a maintenance job which reindexes things, including system
catalog indices.

It's easy to reproduce error under 10.5, but not under 10.3 nor 10.4.

while :; do for a in pg_class_oid_index pg_class_relname_nsp_index pg_class_tblspc_relfilenode_index; do psql ts -qc "REINDEX INDEX $a"; done; done&

[pryzbyj@database ~]$ while :; do psql ts -qc ''; done
psql: FATAL:  could not read block 1 in file "base/16400/313430687": read only 0 of 8192 bytes
psql: FATAL:  could not read block 0 in file "base/16400/313430708": read only 0 of 8192 bytes
psql: FATAL:  could not read block 0 in file "base/16400/313430711": read only 0 of 8192 bytes
^C

postgres=# SELECT * FROM postgres_log WHERE error_severity ='FATAL';
log_time   | 2018-08-28 22:19:58.822-05
user_name  | telsasoft
database   | ts
pid| 22445
connection_from| 192.168.122.12:58318
session_id | 5b8610de.57ad
session_line   | 1
command_tag| startup
session_start_time | 2018-08-28 22:19:58-05
virtual_transaction_id | 26/280967
transaction_id | 0
error_severity | FATAL
sql_state_code | XX001
message| could not read block 0 in file "base/16400/313316300": read only 0 of 8192 bytes

Justin



Re: 10.5 but not 10.4: backend startup during reindex system: could not read block 0 in file "base/16400/..": read only 0 of 8192 bytes

2018-08-30 Thread Justin Pryzby
On Thu, Aug 30, 2018 at 05:30:30PM -0400, Tom Lane wrote:
> Justin Pryzby  writes:
> > On Wed, Aug 29, 2018 at 11:35:51AM -0400, Tom Lane wrote:
> >> As far as we can tell, that bug is a dozen years old, so it's not clear
> >> why you find that you can reproduce it only in 10.5.  But there might be
> >> some subtle timing change accounting for that.
> 
> > It seems to me there's one root problem occurring in (at least) two slightly
> > different ways.  The issue/symptom that I've been seeing occurs in 10.5 but 
> > not
> > 10.4, and specifically at commit 2ce64ca, but not before. 
> 
> Yeah, as you probably saw in the other thread, we later realized that
> 2ce64ca created an additional pathway for ScanPgRelation to recurse;
> a pathway that's evidently easier to hit than the pre-existing ones.
> I note that both of your stack traces display ScanPgRelation recursion,
> so I'm feeling pretty confident that what you're seeing is the same
> thing.
> 
> But, as Andres says, it'd be great if you could confirm whether the
> draft patches fix it for you.

I tested with relcache-rebuild.diff which hasn't broken in 15min, so I'm
confident that doesn't hit the additional recursive pathway, but have to wait
awhile and see if autovacuum survives, too.

I tried to apply fix-missed-inval-msg-accepts-1.patch on top of PG10.5 but
patch didn't apply, so I can test HEAD after the first patch soaks awhile.

Just curious, is there really any difficulty in reproducing this?  Once I
realized this was a continuing issue and started to suspect pg10.5, it takes
just about nothing to reproduce anywhere I've tried.  I just tested 5 servers,
and only one took more than a handful of seconds to fail.  I gave up waiting
for a 6th server, because I found it was waiting on a pre-existing lock.

[pryzbyj@database ~]$ while :; do for a in pg_class_oid_index pg_class_relname_nsp_index pg_class_tblspc_relfilenode_index; do psql ts -qc "REINDEX INDEX $a"; done; done&
[pryzbyj@database ~]$ a=0; time while psql ts -qc ''; do a=$((1+a)); done ; echo "$a"
psql: FATAL:  could not read block 0 in file "base/16400/313581263": read only 0 of 8192 bytes

real0m1.772s
user0m0.076s
sys 0m0.116s
47

Justin



Re: 10.5 but not 10.4: backend startup during reindex system: could not read block 0 in file "base/16400/..": read only 0 of 8192 bytes

2018-08-30 Thread Justin Pryzby
On Wed, Aug 29, 2018 at 11:35:51AM -0400, Tom Lane wrote:
> Justin Pryzby  writes:
> > I've seen this message now a handful of times recently.  It seems to happen
> > overnight, during a maintenance job which reindex things, including system
> > catalog indices.
> > It's easy to reproduce error under 10.5, but not under 10.3 nor 10.4.
> > while :; do for a in pg_class_oid_index pg_class_relname_nsp_index pg_class_tblspc_relfilenode_index; do psql ts -qc "REINDEX INDEX $a"; done; done&
> > time while psql ts -qc ''; do :; done
> 
> This looks suspiciously like the issue under discussion in
> 
> https://www.postgresql.org/message-id/12259.1532117714%40sss.pgh.pa.us
> 
> As far as we can tell, that bug is a dozen years old, so it's not clear
> why you find that you can reproduce it only in 10.5.  But there might be
> some subtle timing change accounting for that.

I confirmed the stack appears to be the same as here, modulo differences
between 9.6 and 10.
https://www.postgresql.org/message-id/fcab55ec-660a-bc43-7fe1-09af3665e89a%402ndquadrant.com

It seems to me there's one root problem occurring in (at least) two slightly
different ways.  The issue/symptom that I've been seeing occurs in 10.5 but not
10.4, and specifically at commit 2ce64ca, but not before. 

|commit 2ce64caaf793615d0f7a7ce4380e7255077d130c
|Author: Andres Freund 
|Date:   Tue Jun 12 11:13:21 2018 -0700
|
|Fix bugs in vacuum of shared rels, by keeping their relcache entries current.

However, the same message CAN happen under 10.4, though less easily, due to
autovacuum but (afaict) not user queries.  Maybe that's a coincidence, but my
experience mirrors the build logs: the problem since 10.5 is easily tickled in
a few seconds rather than an hour.

The issue I see frequently under 10.5 has this stack:

Core was generated by `postgres: pryzbyj postgres [local] startup'.
#2  0x00709425 in mdread ()
#3  0x006e2bc8 in ReadBuffer_common ()
#4  0x006e3511 in ReadBufferExtended ()
#5  0x004c2da9 in _bt_getbuf ()
#6  0x004c32b2 in _bt_getroot ()
#7  0x004c799b in _bt_search ()
#8  0x004c8568 in _bt_first ()
#9  0x004c52c7 in btgettuple ()
#10 0x004bf936 in index_getnext_tid ()
#11 0x004bfb43 in index_getnext ()
#12 0x004beeae in systable_getnext ()
#13 0x007fa074 in ScanPgRelation ()
#14 0x007fe371 in RelationClearRelation ()
#15 0x007fe84b in RelationCacheInvalidateEntry ()
#16 0x007f7e15 in LocalExecuteInvalidationMessage ()
#17 0x006f3b38 in ReceiveSharedInvalidMessages ()
#18 0x006f6eb5 in LockRelationOid ()
#19 0x004ae495 in relation_open ()
#20 0x004bf336 in index_open ()
#21 0x004bee3c in systable_beginscan ()
#22 0x007fa069 in ScanPgRelation ()
#23 0x007fce84 in RelationBuildDesc ()
#24 0x007fe6f8 in RelationIdGetRelation ()
#25 0x004ae463 in relation_open ()
#26 0x004ae6a6 in heap_open ()
#27 0x00819237 in InitPostgres ()
#28 0x0070e2e9 in PostgresMain ()
#29 0x00477f36 in ServerLoop ()
#30 0x006a5e3e in PostmasterMain ()
#31 0x00478a08 in main ()

From autovacuum running REL_10_4 I've now seen (since trying to tickle the
problem yesterday):

#1  0x7fbfb5b9d01a in __GI_abort () at abort.c:89
#2  0x00708489 in mdread ()
#3  0x006e1c48 in ReadBuffer_common ()
#4  0x006e2591 in ReadBufferExtended ()
#5  0x004c2369 in _bt_getbuf ()
#6  0x004c2810 in _bt_getroot ()
#7  0x004c6f5b in _bt_search ()
#8  0x004c7b28 in _bt_first ()
#9  0x004c4887 in btgettuple ()
#10 0x004bf00e in index_getnext_tid ()
#11 0x004bf1f3 in index_getnext ()
#12 0x004be679 in systable_getnext ()
#13 0x007f9084 in ScanPgRelation ()
#14 0x007fbe54 in RelationBuildDesc ()
#15 0x007fcc86 in RelationClearRelation ()
#16 0x007fd7bb in RelationCacheInvalidateEntry ()
#17 0x007f6e25 in LocalExecuteInvalidationMessage ()
#18 0x006f2bb8 in ReceiveSharedInvalidMessages ()
#19 0x006f5f35 in LockRelationOid ()
#20 0x004ade05 in relation_open ()
#21 0x004bea96 in index_open ()
#22 0x004be606 in systable_beginscan ()
#23 0x007f9079 in ScanPgRelation ()
#24 0x007fbe54 in RelationBuildDesc ()
#25 0x007fd668 in RelationIdGetRelation ()
#26 0x004addd3 in relation_open ()
#27 0x004bea96 in index_open ()
#28 0x004be606 in systable_beginscan ()
#29 0x007fef60 in RelationGetIndexList ()
#30 0x005dba6c in vac_open_indexes ()
#31 0x005dc652 in lazy_vacuum_rel ()
#32 0x005da7b4 in vacuum_rel ()
#33 0x005db661 in vacuum ()
#34 0x004770db in do_autovacuum ()
#35 0x

Re: cache lookup failed for constraint when alter table referred by partition table

2018-09-10 Thread Justin Pryzby
Adding Alvaro 

On Fri, Sep 07, 2018 at 04:02:13PM +0530, Rajkumar Raghuwanshi wrote:
> postgres=# CREATE TABLE non_part (a INT,PRIMARY KEY(a));
> postgres=# CREATE TABLE part (a INT REFERENCES non_part(a)) PARTITION BY RANGE(a);
> postgres=# CREATE TABLE part_p1 PARTITION OF part FOR VALUES FROM (MINVALUE) TO (MAXVALUE);
> postgres=# ALTER TABLE non_part ALTER COLUMN a TYPE bigint;
> ERROR:  cache lookup failed for constraint 16398

I want to suggest adding to open items.
https://wiki.postgresql.org/index.php?title=PostgreSQL_11_Open_Items

..since it's documented as a "Major enhancement" in PG11:
https://www.postgresql.org/docs/11/static/release-11.html

Justin



Re: 10.5 but not 10.4: backend startup during reindex system: could not read block 0 in file "base/16400/..": read only 0 of 8192 bytes

2018-08-31 Thread Justin Pryzby
On Thu, Aug 30, 2018 at 01:25:00PM -0700, Andres Freund wrote:
> On August 30, 2018 1:24:12 PM PDT, Justin Pryzby  wrote:
> >On Thu, Aug 30, 2018 at 01:18:59PM -0700, Andres Freund wrote:
> >> Could you check if both of the proposed attempts at fixing the issue
> >> also solves your problem?
> >
> >Just checking that you're referring to:
> >
> >1)  relcache-rebuild.diff
> >https://www.postgresql.org/message-id/20180829083730.n645apqhb2gyih3g%40alap3.anarazel.de
> >2)  fix-missed-inval-msg-accepts-1.patch
> >https://www.postgresql.org/message-id/25024.1535579899%40sss.pgh.pa.us
> 
> Right. Either should fix the issue.

Confirmed that relcache-rebuild.diff seems to fix/avoid the issue under 10.5,
and fix-missed-inval-msg-accepts-1.patch does so for HEAD (but doesn't apply
cleanly to 10.5 so not tested).  That's for both the new easy-to-hit
symptom and the pre-existing hard-to-hit issue affecting pg10.4 and earlier.

Thanks,
Justin



remove duplicated words in comments .. across lines

2018-09-07 Thread Justin Pryzby
Resending to -hackers as I realized this isn't a documentation issue so not
appropriate or apparently interesting to readers of -doc.

Inspired by David's patch [0], find attached a patch fixing words duplicated
across line boundaries.

I should probably just call the algorithm proprietary, but if you really wanted 
to know, I've suffered again through sed's black/slashes.

time find . -name '*.c' -o -name '*.h' |xargs sed -srn '/\/\*/!d; :l; /\*\//!{N; b l}; s/\n[[:space:]]*\*/\n/g; /(\<[[:alpha:]]{1,})\>\n[[:space:]]*\<\1\>/!d; s//>>&<\n[[:space:]]*\<\1\>/!d; s//>>&

  * ...parent. The root page is never released, to
- * to prevent conflict with vacuum process.
+ * prevent conflict with vacuum process.
  */
 static void
 ginFindParents(GinBtree btree, GinBtreeStack *stack)
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 4aff6cf7f2..3f778093cb 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -1737,7 +1737,7 @@ RecreateTwoPhaseFile(TransactionId xid, void *content, int len)
  * possible that GXACTs that were valid at checkpoint start will no longer
  * exist if we wait a little bit. With typical checkpoint settings this
  * will be about 3 minutes for an online checkpoint, so as a result we
- * we expect that there will be no GXACTs that need to be copied to disk.
+ * expect that there will be no GXACTs that need to be copied to disk.
  *
  * If a GXACT remains valid across multiple checkpoints, it will already
  * be on disk so we don't bother to repeat that write.
diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index 5c6de4989c..4a039b1190 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -422,7 +422,7 @@ ExecuteRecoveryCommand(const char *command, const char *commandName, bool failOn
 /*
  * A file was restored from the archive under a temporary filename (path),
  * and now we want to keep it. Rename it under the permanent filename in
- * in pg_wal (xlogfname), replacing any existing file with the same name.
+ * pg_wal (xlogfname), replacing any existing file with the same name.
  */
 void
 KeepFileRestoredFromArchive(const char *path, const char *xlogfname)
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 3e148f03d0..1262594058 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -2898,7 +2898,7 @@ analyze_mcv_list(int *mcv_counts,
 	 * significantly more common than the estimated selectivity they would
 	 * have if they weren't in the list.  All non-MCV values are assumed to be
 	 * equally common, after taking into account the frequencies of all the
-	 * the values in the MCV list and the number of nulls (c.f. eqsel()).
+	 * values in the MCV list and the number of nulls (c.f. eqsel()).
 	 *
 	 * Here sumcount tracks the total count of all but the last (least common)
 	 * value in the MCV list, allowing us to determine the effect of excluding
diff --git a/src/backend/commands/seclabel.c b/src/backend/commands/seclabel.c
index 5ee46905d8..1ac7756f2a 100644
--- a/src/backend/commands/seclabel.c
+++ b/src/backend/commands/seclabel.c
@@ -321,7 +321,7 @@ SetSharedSecurityLabel(const ObjectAddress *object,
 /*
  * SetSecurityLabel attempts to set the security label for the specified
  * provider on the specified object to the given value.  NULL means that any
- * any existing label should be deleted.
+ * existing label should be deleted.
  */
 void
 SetSecurityLabel(const ObjectAddress *object,
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 1b659a5870..65f2ba85fe 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -956,7 +956,7 @@ info_cb(const SSL *ssl, int type, int args)
  * precomputed.
  *
  * Since few sites will bother to create a parameter file, we also
- * also provide a fallback to the parameters provided by the
+ * provide a fallback to the parameters provided by the
  * OpenSSL project.
  *
  * These values can be static (once loaded or computed) since the
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 0dd55ac1ba..d241d9b3f9 100644
--- a/src/backend/partitioning/partprune.c
+++ 

Re: Make deparsing of column defaults faster

2018-07-05 Thread Justin Pryzby
On Mon, Jun 04, 2018 at 10:00:53PM -0400, Peter Eisentraut wrote:
> On 6/4/18 20:55, Jeff Janes wrote:
> > Since defaults can't contain Vars, this patch converts the second
> > parameter to zero in places where pg_get_expr is invoked on column
> > defaults.
> 
> My in-progress generated columns patch removes that assumption (since a
> generated column generally refers to another column of the same table).

On Thu, Jul 05, 2018 at 04:45:07PM +0200, Peter Eisentraut wrote:
> On 29.06.18 05:15, Jeff Janes wrote:
> > Since pg_dump calls pg_get_expr once over and over again on the same
> > table consecutively, perhaps we could cache the column alias assignments
> > in a single-entry cache, so if it is called on the same table as last
> > time it just re-uses the aliases from last time.  I am not planning on
> 
> I looked into that.  deparse_context_for() is actually not that
> expensive on its own, well below one second, but it gets somewhat
> expensive when you call it 1600 times for one table.  So to address that
> case, we can cache the deparse context between calls in the fn_extra
> field of pg_get_expr.  The attached patch does that.  This makes the
> pg_dump -s times pretty much constant even with 1600 columns with
> defaults.
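
For reference, fn_extra is the standard place for a function to cache state
across calls within a single query.  A rough sketch of the pattern being
described, with hypothetical struct and function names (the real change is
the patch attached above):

/*
 * Illustrative only; not the actual patch.  State hung off flinfo->fn_extra
 * survives across calls within one query, so the deparse context is rebuilt
 * only when a different relation is passed in.
 */
#include "postgres.h"

#include "fmgr.h"
#include "utils/lsyscache.h"
#include "utils/ruleutils.h"

typedef struct DeparseCache
{
	Oid			relid;			/* relation the cached context was built for */
	List	   *context;		/* cached deparse context */
} DeparseCache;

static List *
get_cached_deparse_context(FunctionCallInfo fcinfo, Oid relid)
{
	DeparseCache *cache = (DeparseCache *) fcinfo->flinfo->fn_extra;

	if (cache == NULL)
	{
		/* allocate in fn_mcxt so the cache outlives per-tuple contexts */
		cache = (DeparseCache *)
			MemoryContextAllocZero(fcinfo->flinfo->fn_mcxt, sizeof(DeparseCache));
		fcinfo->flinfo->fn_extra = cache;
	}

	if (cache->relid != relid)
	{
		/*
		 * Build in fn_mcxt; a stale context for another relation is simply
		 * leaked there until the query ends.
		 */
		MemoryContext oldcxt = MemoryContextSwitchTo(fcinfo->flinfo->fn_mcxt);

		cache->context = deparse_context_for(get_rel_name(relid), relid);
		cache->relid = relid;
		MemoryContextSwitchTo(oldcxt);
	}

	return cache->context;
}

Since pg_dump deparses one table's column defaults consecutively, such a
one-entry cache hits on every call after the first.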

I checked on one customer running PG10.4, for which pg_dump takes 8 minutes to
pg_dump -s.

I imported the existing schema to PG12dev (which itself took 25min) and compared:
patched: 2m33.616s
unpatched: 7m19.578s

Note that I've reduced the number of child tables in this DB recently (by
repartitioning tables from daily to monthly granularity), thereby reducing the
number of columns of the largest tables by a factor of 30, and reducing the
size of pg_dump -s to 51MB from 120MB (6 months ago).  I expect this patch
would've saved even more before the cleanup.

> How realistic is this use case?  Is it worth it?

Note that this affects pg_upgrade, which is how the issue originally came up [0].
(But the deparsing happens server-side in the old cluster, so pg_upgrade to v11
couldn't benefit unless this fix were also included in PG10.5.)

[pryzbyj@database postgresql]$ grep -c 'SET DEFAULT' /srv/cdrperfbackup/ts/2018-07-04/pg_dump-section=pre-data
183915

Justin

[0] 
https://www.postgresql.org/message-id/CAMkU%3D1x-e%2BmaqefhM1yMeSiJ8J9Z%2BSJHgW7c9bqo3E3JMG4iJA%40mail.gmail.com



Re: How to make partitioning scale better for larger numbers of partitions

2018-07-13 Thread Justin Pryzby
On Fri, Jul 13, 2018 at 05:49:20AM +, Tsunakawa, Takayuki wrote:
> David has submitted multiple patches for PG 12, one of which speeds up 
> pruning of UPDATE/DELETE (I couldn't find it in the current CF, though.)  
> What challenges are there for future versions, and which of them are being 
> addressed by patches in progress for PG 12, and which issues are untouched?

Here's a known issue which I'm not sure is on anybody's list:
https://www.postgresql.org/message-id/flat/4F989CD8.8020804%40strategicdata.com.au#4f989cd8.8020...@strategicdata.com.au
=> planning of UPDATE/DELETE (but not SELECT) takes a huge amount of RAM when run
on a parent with many children.

We don't typically have UPDATEs, and those we do have are against the child
tables; but we ran into this last month when a manual query on the parent
caused an OOM.  I worked around it, but keep meaning to revisit it to see if
this has changed at all in v11 (very brief testing suggests it hasn't).

Justin



Re: Doc tweak for huge_pages?

2018-01-23 Thread Justin Pryzby
On Wed, Jan 24, 2018 at 07:46:41AM +0100, Catalin Iacob wrote:
> I see Peter assigned himself as committer, some more information below
> for him to decide on the strength of the anti THP message.
Thanks for digging this up!

> And it would be good if somebody could run benchmarks on pre 4.6 and
> post 4.6 kernels. I would love to but have no access to big (or
> medium) hardware.
I should be able to do this, since I have a handful of kernel upgrades on my
todo list.  Can you recommend a test ?  Otherwise I'll come up with something
for pgbench.

But I think any test should be independent of, and not influence, the doc change
(I don't know of anywhere else in the docs that talks about behaviors of specific
kernel versions, which often have vendor patches backpatched anyway).

> So maybe we should weaken the language against THP. Maybe present the
> known facts so far, even if the post 4.6 situation is vague/unknown:
> before Linux 4.6 there were repeated reports of THP problems with
> Postgres, Linux >= 4.6 might improve things but this isn't confirmed.
> And it would be good if somebody could run benchmarks on pre 4.6 and
> post 4.6 kernels. I would love to but have no access to big (or
> medium) hardware.
I think all the details should go elsewhere in the docs; config.sgml already
references this:
https://www.postgresql.org/docs/current/static/kernel-resources.html#LINUX-HUGE-PAGES
..but it doesn't currently mention "transparent" hugepages.

Justin



Re: Bug report: Dramatic increase in conflict with recovery after upgrading 10.2->10.5

2018-09-11 Thread Justin Pryzby
On Mon, Sep 10, 2018 at 11:43:47AM +0200, Chris Travers wrote:
> At present I believe this to likely be a regression.  But if nobody else
> knows otherwise, I should know more in a couple days.

Do you have query logs or can you send details of the query ?

We're not using replication, but I can't help but think of this:
https://www.postgresql.org/message-id/flat/20180829140149.GO23024%40telsasoft.com
..since it's effectively a regression WRT reliability (at least if you reindex
pg_class).  Tom has a patch in HEAD to avoid the issue, but I don't know if
there's any plan to release 10.6 until November.

See related, earlier thread:
https://www.postgresql.org/message-id/flat/12259.1532117714%40sss.pgh.pa.us

You could compile your own binaries with Tom's patch applied (f868a81).
As you probably know, it's maybe not safe to install PG10.4 binaries on a data
dir where you've already upgraded to 10.5 (I believe because data file content
might be written which may not be handled correctly by an earlier minor release).

Justin



Re: [bug fix] pg_rewind creates corrupt WAL files, and the standby cannot catch up the primary

2018-03-11 Thread Justin Pryzby
On Sun, Mar 11, 2018 at 11:04:01PM +0900, Michael Paquier wrote:
> On Fri, Mar 09, 2018 at 08:22:49AM +, Tsunakawa, Takayuki wrote:
> > Thanks for reviewing.  All done.
> 
> Thanks.  Please be careful with the indentation of the patch.  Attached
> is a slightly-updated version with a small modification in
> remove_target_file to make the code cleaner, a proposal of commit
> message and pgindent applied on the code.  I am switching that as ready
> for committer.
> --
> Michael

> From 62950c615d87e3e58998f295ce446eb5c80707e4 Mon Sep 17 00:00:00 2001
> From: Michael Paquier 
> Date: Sun, 11 Mar 2018 22:49:55 +0900
> Subject: [PATCH] Fix handling of removed files on target server with pg_rewind
> 
> After processing the filemap to build the list of chunks that will be
> fetched from the source to rewing the target server, it is possible that
> a file which was previously processed is removed from the source.  A
> simple example of such an occurence is a WAL segment which gets recycled
> on the target in-between.  When the filemap is processed, files not
> categorized as relation files are first truncated to prepare for its
> full copy of which is going to be taken from the source, divided into a
> set of junks.

Hopefully a set of CHUNKS not JUNKS ?

Justin



Re: RFC: Add 'taint' field to pg_control.

2018-02-28 Thread Justin Pryzby
On Wed, Feb 28, 2018 at 02:18:12PM -0800, Andres Freund wrote:
> On 2018-02-28 17:14:18 -0500, Peter Eisentraut wrote:
> > I can see why you'd want that, but as a DBA, I don't necessarily want
> > all of that recorded, especially in a quasi-permanent way.
> 
> Huh?  You're arguing that we should make it easier for DBAs to hide
> potential causes of corruption? I fail to see when that's necessary /
> desirable? That a cluster ran with fsync=off isn't an embarassing thing,
> nor does it contain private data...

The more fine-grained these are, the more useful they can be:

Running with fsync=off is common advice while loading, so reporting that
"fsync=off at some point" is much less useful than reporting "having to run
recovery from an fsync=off database".

I propose there could be two bits:

 1. fsync was off at some point in history when the DB needed recovery;
 2. fsync was off during the current instance; I assume the bit would be set
whenever fsync=off was set, but could be cleared when the server was
cleanly shut down.  If the bit is set during recovery, the first bit would
be permanently set.  I'm not sure, but if this bit is set and fsync is then
re-enabled, maybe it could be cleared after any checkpoint and not just at
shutdown?
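
To make the scheme concrete, here's a standalone toy model (the flag names
and update rules are hypothetical; none of this is actual PostgreSQL code):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define TAINT_FSYNC_OFF_CURRENT   0x0001	/* "soft": fsync=off in this instance */
#define TAINT_FSYNC_OFF_RECOVERED 0x0002	/* "hard": recovery ran with soft bit set */

static uint16_t taint;

static void set_fsync(bool on)
{
	if (!on)
		taint |= TAINT_FSYNC_OFF_CURRENT;	/* would itself be fsync'd to pg_control */
}

static void clean_shutdown(void)
{
	taint &= ~TAINT_FSYNC_OFF_CURRENT;		/* soft bit cleared on clean shutdown */
}

static void crash_recovery(void)
{
	if (taint & TAINT_FSYNC_OFF_CURRENT)
		taint |= TAINT_FSYNC_OFF_RECOVERED;	/* hard bit is permanent */
}

int main(void)
{
	set_fsync(false);		/* e.g. bulk load with fsync=off */
	clean_shutdown();		/* clean shutdown: no lasting taint */
	printf("after clean shutdown: %#x\n", (unsigned) taint);

	set_fsync(false);
	crash_recovery();		/* crash while fsync=off: permanent taint */
	printf("after crash recovery: %#x\n", (unsigned) taint);
	return 0;
}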

Justin



Re: RFC: Add 'taint' field to pg_control.

2018-02-28 Thread Justin Pryzby
On Wed, Feb 28, 2018 at 01:43:11PM -0800, Andres Freund wrote:
> a significant number of times during investigations of bugs I wondered
> whether running the cluster with various settings, or various tools
> could've caused the issue at hand.  Therefore I'd like to propose adding
> a 'tainted' field to pg_control, that contains some of the "history" of
> the cluster. Individual bits inside that field that I can think of right
> now are:
> - pg_resetxlog was used non-passively on cluster
> - ran with fsync=off
> - ran with full_page_writes=off
> - pg_upgrade was used

What about:

 - pg_control versions used on this cluster (hopefully a full list..obviously
   not going back before PG11);
 - did recovery (you could use "needed recovery" instead, but then there's the
   question of how reliable that field would be);
   + or: timestamp of most recent recovery (attempt?)
 - index corruption detected (on system table?);  Note that "CREATE IF NOT
   EXIST" doesn't avoid unique key errors on system tables, so it's not useful
   to log every ERROR on system tables.
 - page %u is uninitialized --- fixing
 - here's one I just dug up: ERROR: missing chunk number 0 for toast value 10270298 in pg_toast_2619
 - XID wraparound?
 - autovacuum disabled?
 - checksum failure (in system relation?)
 - local_preload_libraries?
 - started in single user mode or with system indices disabled?
 - hit assertion or PANIC ??
 - UPDATE/DELETE/INSERT on system table ? (or maintenance command like
   vacuum/analyze/cluster/reindex?)

Justin



Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-09 Thread Justin Pryzby
On Mon, Apr 09, 2018 at 09:31:56AM +0800, Craig Ringer wrote:
> You could make the argument that it's OK to forget if the entire file
> system goes away. But actually, why is that ok?

I was going to say that it'd be okay to clear the error flag on umount, since any
opened files would prevent unmounting; but then I realized we need to consider
the case of close()ing all FDs and then opening them later, in another process.

I was going to say that's fine for postgres, since it chdir()s into its
basedir, but it's actually not fine for non-default tablespaces..

On Mon, Apr 09, 2018 at 02:54:16PM +0200, Anthony Iliopoulos wrote:
> notification descriptor open, where the kernel would inject events
> related to writeback failures of files under watch (potentially
> enriched to contain info regarding the exact failed pages and
> the file offset they map to).

For postgres that'd require backend processes to open() a file such that,
following its close(), any writeback errors are "signalled" to the checkpointer
process...

Justin



Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

2018-04-23 Thread Justin Pryzby
Just want to add for the archive that I happened to run across what appears to
be a 7-year-old report of (I think) both of these vacuum/analyze bugs:

Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

https://www.postgresql.org/message-id/flat/6373488.29.1323717222303.JavaMail.root%40store1.zcs.ext.wpsrv.net#6373488.29.1323717222303.javamail.r...@store1.zcs.ext.wpsrv.net



Re: Should we add GUCs to allow partition pruning to be disabled?

2018-04-17 Thread Justin Pryzby
On Wed, Apr 18, 2018 at 12:07:18PM +1200, David Rowley wrote:
> In PG10 the planner's partition pruning could be disabled by changing
> the constraint_exclusion GUC to off.  This is still the case for PG11,
> but only for UPDATE and DELETE queries. There is currently no way to
> disable partition pruning for SELECT.
> 
> Should we allow this?
> 
> To make this a bit more complex, we now also have run-time pruning
> which can allow further partition pruning to be performed during
> execution.  I imagine if we're going to add a GUC for plan-time
> pruning then we should also have one for run-time pruning. These could
> also perhaps share the same GUC, so it seems there are some sub
> choices to make here:
> 
> 1. Add a single enable_ GUC which allows both plan-time and run-time
> pruning to be disabled.
> 2. Add two new enable_ GUCs, one for plan-time and one for run-time pruning.
> 3. No new GUCs / Do nothing.

Maybe this diverges from the details of the implementation; but, from a
user's perspective: why not continue to use constraint_exclusion?

I would suggest adding zero new GUCs:

0. constraint_exclusion={off,partition,on,PLANNER*,EXECUTOR*}

I tentatively assume that "constraint_exclusion=partition" would disable PG11
"pruning", and that the new default setting would be "executor".

* Caveat: there may be a better name than planner/executor..
planner_prune?  execute_filter?

Justin



Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-03-28 Thread Justin Pryzby
On Thu, Mar 29, 2018 at 11:30:59AM +0900, Michael Paquier wrote:
> On Tue, Mar 27, 2018 at 11:53:08PM -0400, Tom Lane wrote:
> > Craig Ringer  writes:
> >> TL;DR: Pg should PANIC on fsync() EIO return.
> > 
> > Surely you jest.
> 
> Any callers of pg_fsync in the backend code are careful enough to check
> the returned status, sometimes doing retries like in mdsync, so what is
> proposed here would be a regression.

The retries are the source of the problem; the first fsync() can return EIO,
and also *clears the error*, causing a second fsync() (of the same data) to
return success.

(Note, I can see that it might be useful to PANIC on EIO but retry for ENOSPC).
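
A standalone sketch of that policy (illustrative only, not actual backend
code; the scratch-file path is made up):

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * EIO: the kernel may already have dropped the dirty pages and cleared the
 * error, so retrying could falsely report success; give up and rely on
 * crash recovery / WAL replay.  ENOSPC is assumed here to leave the pages
 * dirty, so retrying once space is freed is safe.
 */
static void checked_fsync(int fd)
{
	while (fsync(fd) != 0)
	{
		if (errno == EIO)
		{
			fprintf(stderr, "PANIC: fsync: EIO; refusing to retry\n");
			abort();
		}
		if (errno != ENOSPC && errno != EINTR)
		{
			perror("fsync");
			exit(1);
		}
		sleep(1);	/* transient condition: retry */
	}
}

int main(void)
{
	int fd = open("/tmp/fsync-demo", O_CREAT|O_RDWR, 0600);

	if (fd < 0 || write(fd, "x", 1) != 1)
		exit(1);
	checked_fsync(fd);
	close(fd);
	return 0;
}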

On Thu, Mar 29, 2018 at 03:48:27PM +1300, Thomas Munro wrote:
> Craig, is the phenomenon you described the same as the second issue
> "Reporting writeback errors" discussed in this article?
> https://lwn.net/Articles/724307/

Worse, the article acknowledges the behavior without apparently suggesting
that it be changed:

 "Storing that value in the file structure has an important benefit: it makes
it possible to report a writeback error EXACTLY ONCE TO EVERY PROCESS THAT
CALLS FSYNC()  In current kernels, ONLY THE FIRST CALLER AFTER AN ERROR
OCCURS HAS A CHANCE OF SEEING THAT ERROR INFORMATION."

I believe I reproduced the problem behavior using the dmsetup "error" target;
see attached.

strace looks like this:

kernel is Linux 4.10.0-28-generic #32~16.04.2-Ubuntu SMP Thu Jul 20 10:19:48 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

 1  open("/dev/mapper/eio", O_RDWR|O_CREAT, 0600) = 3
 2  write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
 3  write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
 4  write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
 5  write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
 6  write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
 7  write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
 8  write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 2560
 9  write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = -1 ENOSPC (No space left on device)
10  dup(2)                                  = 4
11  fcntl(4, F_GETFL)                       = 0x8402 (flags O_RDWR|O_APPEND|O_LARGEFILE)
12  brk(NULL)                               = 0x1299000
13  brk(0x12ba000)                          = 0x12ba000
14  fstat(4, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 2), ...}) = 0
15  write(4, "write(1): No space left on devic"..., 34write(1): No space left on device
16  ) = 34
17  close(4)                                = 0
18  fsync(3)                                = -1 EIO (Input/output error)
19  dup(2)                                  = 4
20  fcntl(4, F_GETFL)                       = 0x8402 (flags O_RDWR|O_APPEND|O_LARGEFILE)
21  fstat(4, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 2), ...}) = 0
22  write(4, "fsync(1): Input/output error\n", 29fsync(1): Input/output error
23  ) = 29
24  close(4)                                = 0
25  close(3)                                = 0
26  open("/dev/mapper/eio", O_RDWR|O_CREAT, 0600) = 3
27  fsync(3)                                = 0
28  write(3, "\0", 1)                       = 1
29  fsync(3)                                = 0
30  exit_group(0)                           = ?

2: EIO isn't seen initially due to writeback page cache;
9: ENOSPC due to small device
18: original IO error reported by fsync, good
25: the original FD is closed
26: ..and file reopened
27: fsync on file with still-dirty data+EIO returns success BAD

10, 19: I'm not sure why there's dup(2), I guess glibc thinks that perror
should write to a separate FD (?)

Also note that close() ALSO returned success..which you might think exonerates
the 2nd fsync(), but which I think may itself be problematic, no?  In any case,
the 2nd byte certainly never got written to the dm-error device, and the
failure status was lost following fsync().

I get the exact same behavior if I break out after one write() loop iteration,
so as to avoid ENOSPC.

Justin
/*
% make CFLAGS='-Wall -Wextra -O3' /tmp/eio
% sudo lvcreate -L 9M -n tst data

echo '
0 1 linear /dev/data/tst 0
1 1 error 1
2 99 linear /dev/data/tst 2' |sudo dmsetup create eio

*/

#include <unistd.h>

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>

#include <sys/types.h>
#include <sys/stat.h>

int main()
{
	char buf[8<<10];
	int fd;

	if (-1==(fd=open("/dev/mapper/eio", O_CREAT|O_RDWR, 00600))) {
	//if (-1==(fd=open("/mnt/t", O_CREAT|O_RDWR, 00600))) {
		perror("open");
		exit(1);
	}

	/* Write until hitting the dmsetup "error" extent (EIO in writeback)
	 * or the end of the small device (ENOSPC) */
	while (1) {
		// if (sizeof(buf)!=(write(fd, buf, sizeof(buf)))) {
		if (-1==write(fd, buf, sizeof(buf))) {
			perror("write(1)");
			break;
		}
	}

	/* The first fsync() reports the writeback error... */
	if (-1==fsync(fd))
		perror("fsync(1)");

	/* ...but after close() and reopen, fsync() reports success and the
	 * error status is lost (cf. the strace above) */
	close(fd);
	if (-1==(fd=open("/dev/mapper/eio", O_CREAT|O_RDWR, 00600))) {
		perror("open(2)");
		exit(1);
	}
	if (-1==fsync(fd))
		perror("fsync(2)");
	if (-1==write(fd, buf, 1))
		perror("write(2)");
	if (-1==fsync(fd))
		perror("fsync(3)");

	return 0;
}

Re: Index scan prefetch?

2018-03-26 Thread Justin Pryzby
On Mon, Mar 26, 2018 at 12:43:02PM +0300, Konstantin Knizhnik wrote:
> Hi, hackers.
> 
> I was faced with the following bad performance use case with Postgres: there
> is a huge append-only table with serial key (ID)
> which is permanently appended using multithreaded pgloader.

I think this could be similar to the problem I reported here:
https://www.postgresql.org/message-id/flat/20160524173914.GA11880%40telsasoft.com#20160524173914.ga11...@telsasoft.com

The analysis at the time was that, due to "repeated keys", a btree index on the
timestamp column had non-consecutive heap TIDs (btree insertion uses a random
early insertion point to avoid superlinear lookup cost during such insertions).

But, our case also involved multiple child processes simultaneously inserting
into same table, and I wonder if "repeated keys" were more or less unrelated to
the problem.  The behavior may be easily caused by simultaneous insertions
"clustered" around the same target: for us, now(); for you, nextval().

> But now effective_io_concurrency parameter is applicable only for bitmap
...
> Will it be useful to support it also for index scan?
> Or there are some other ways to address this problem?

Does your case perform well with bitmap heap scan (I mean bitmap scan of the
single index)?  It seems to me that prefetch wouldn't help, as it would just
incur the same random cost you're already seeing; the solution may be to choose
another plan (bitmap) with sequential access to enable read-ahead.

Also: Claudio mentioned here that bitmap prefetch can cause the kernel to avoid
its own readahead, negatively affecting some queries:
https://www.postgresql.org/message-id/flat/8fb758a1-d7fa-4dcc-fb5b-07a992ae6a32%40gmail.com#20180207054227.ge17...@telsasoft.com

What's the pg_stats "correlation" for the table column whose index is being
scanned?  How many tuples?  Would you send explain(analyze,buffers) for the
problem query, and with SET enable_bitmapscan=off; ?

Justin



open/lseek overhead with many partitions (including with "faster partitioning pruning")

2018-04-03 Thread Justin Pryzby
TLDR: even with the "faster pruning" patch, pg11dev open()s+lseek()s every file
backing every table being queried, even for those partitions eventually
"excluded".

One of our customers (with the largest number of child tables, currently of
relkind='r') takes a long time to plan queries, and I wondered whether we'd
benefit from ongoing work, if I should plan to convert to relkind='p', and if I
could start testing that for our use cases.

I found by stracing that the backend is open()ing + lseek(SEEK_END)ing each file
in each relation being UNIONed in this view, including relations which were
excluded by constraints.  Their tables are partitioned by day (some are still
40+GB), so there are only 10-20 tables being included in the above query, total
size perhaps ~50GB and under 50 backing files, but it's accessing a LOT more
files than needed:

cold backend:
[pryzbyj@database ~]$ grep open /tmp/strace-pg12-0|cut -d/ -f1,2 |sort |uniq -c |sort -nr |head
  26593 open("pg_tblspc/16400
   6894 open("pg_tblspc/16401
    482 open("base/16402

warm backend:
[pryzbyj@database ~]$ grep open /tmp/strace-pg12-1|cut -d/ -f1,2 |sort |uniq -c |sort -nr |head
   6545 open("pg_tblspc/16401
   1338 open("pg_tblspc/16400
    254 open("base/16402

I was curious if that was improved in pg11dev, and if it applied to relkind='r'
or only to relkind='p'.  I applied v47 patches for "path toward faster
partition pruning", and was able to quickly create synthetic "huge table" using
truncate (and verified that smgr logic actually opens all 1e5 files).

$ psql -d postgres -h /tmp --port 5678 -c "CREATE TABLE t(i int) PARTITION BY range(i)"
$ for a in `seq 1 999`; do psql -d postgres -h /tmp --port 5678 -c "CREATE TABLE t$a PARTITION OF t FOR VALUES FROM ($a)TO($((1+a)))"; done
$ time for a in ./test11dev/base/13236/1???[0-9]; do echo "$a..."; truncate -s 1G $a; done # make existing, empty files 1GB..
$ time for a in ./test11dev/base/13236/1???[0-9]; do echo "$a..."; for b in `seq 1 99`; do truncate -s 1G $a.$b; done; done # make new 1GB tables with .n extension

postgres=# explain SELECT COUNT(1) FROM t WHERE i=-99 LIMIT 1;
 Limit  (cost=0.00..0.01 rows=1 width=8)
   ->  Aggregate  (cost=0.00..0.01 rows=1 width=8)
 ->  Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
Time: 63268.474 ms (01:03.268)

=> zfs/fuse is certainly exacerbating the issue here due to context switches; 
but..

[pryzbyj@database postgresql]$ time cut -d\( -f1 /tmp/strace-pg11dev-faster-15|sort |uniq -c |sort -nr |head
 100899 open
  99900 lseek
  98918 close
    118 brk

postgres still open()s+lseek()s 100k files, most of which belong to relations
which were clearly excluded:

(gdb) bt
#0  0x0032b2adb580 in __open_nocancel () from /lib64/libc.so.6
#1  0x00711827 in BasicOpenFilePerm (fileName=0x1e2c9d0 "base/13236/16759.10", fileFlags=2, fileMode=384) at fd.c:965
#2  0x007121ea in PathNameOpenFilePerm (fileName=0x1e2c9d0 "base/13236/16759.10", fileFlags=2, fileMode=384) at fd.c:1394
#3  0x00733f6e in _mdfd_openseg (reln=0x1e14d98, forknum=MAIN_FORKNUM, segno=10, oflags=0) at md.c:1783
#4  0x00734026 in mdnblocks (reln=0x1e14d98, forknum=MAIN_FORKNUM) at md.c:918
#5  0x006b26bc in estimate_rel_size (rel=0x7f3f707f4b98, attr_widths=0x1e2cd24, pages=<optimized out>, tuples=0x1e2cb98, allvisfrac=<optimized out>) at plancat.c:946
#6  0x006b3a47 in get_relation_info (root=0x1bd9030, relationObjectId=16759, inhparent=false, rel=0x1e2cae0) at plancat.c:144
#7  0x006b78fa in build_simple_rel (root=0x1bd9030, relid=126, parent=0x1bda578) at relnode.c:185
#8  0x006b7990 in build_simple_rel (root=0x1bd9030, relid=1, parent=0x0) at relnode.c:251
#9  0x00691be3 in add_base_rels_to_query (root=0x1bd9030, jtnode=<optimized out>) at initsplan.c:121
#10 0x006924af in query_planner (root=<optimized out>, tlist=0x1d34fd8, qp_callback=0x693940 <standard_qp_callback>, qp_extra=0x7ffe0a077290) at planmain.c:138
#11 0x0069777e in grouping_planner (root=<optimized out>, inheritance_update=false, tuple_fraction=<optimized out>) at planner.c:1892
#12 0x00698ef7 in subquery_planner (glob=<optimized out>, parse=<optimized out>, parent_root=<optimized out>, hasRecursion=<optimized out>, tuple_fraction=0) at planner.c:966
#13 0x00699d97 in standard_planner (parse=0x1bd9778, cursorOptions=256, boundParams=<optimized out>) at planner.c:405
#14 0x00739d2a in pg_plan_query (querytree=<optimized out>, cursorOptions=<optimized out>, boundParams=<optimized out>) at postgres.c:808
#15 0x005a0bd6 in ExplainOneQuery (query=0x1bd9778, cursorOptions=<optimized out>, into=0x0, es=0x1bd8f58, queryString=0x1ae0fb0 "explain SELECT COUNT(1) FROM t WHERE i=-99 LIMIT 1;", params=0x0, queryEnv=0x0) at explain.c:365
#16 0x005a0e5d in ExplainQuery (pstate=0x1bd8d80, stmt=0x1ae1fa0, queryString=0x1ae0fb0 "explain SELECT COUNT(1) FROM t WHERE i=-99 LIMIT 1;", params=0x0, queryEnv=0x0, dest=0x1bd8e90) at explain.c:254
#17 0x007408f2 in standard_ProcessUtility (pstmt=0x1ae2050, queryString=0x1ae0fb0 "explain SELECT COUNT(1) FROM t WHERE i=-99 LIMIT 1;",

bulk typos

2018-03-31 Thread Justin Pryzby
I needed another distraction, so I bulk-checked for typos, limited to comments
in *.[ch].

I'm not passionate about this, but it serves the purpose of reducing the
overhead of fixing them individually.

Also I heard something here recently about ugly languages..
time find . -name '*.c' -print0 |xargs -r0 sed -s '/.*\/\*/!d; s///; :l; /\*\/.*/!{N;b l}; s///; s/.*/\L&/' |grep -Eo '[[:alpha:]]{3,}' |sort |uniq -c |sort -nr |awk '$1==1{print $2}' |grep -xFvf /usr/share/dict/words |less

If any of these are disputed or objectionable, I would summarily discard them,
as I'm sure I missed some and fixing every last typo wasn't really the ghoul.

Justin
diff --git a/contrib/pgcrypto/rijndael.c b/contrib/pgcrypto/rijndael.c
index 4c074ef..6938701 100644
--- a/contrib/pgcrypto/rijndael.c
+++ b/contrib/pgcrypto/rijndael.c
@@ -163,7 +163,7 @@ gen_tabs(void)
q;
 
/* log and power tables for GF(2**8) finite field with  */
-   /* 0x11b as modular polynomial - the simplest prmitive  */
+   /* 0x11b as modular polynomial - the simplest primitive */
/* root is 0x11, used here to generate the tables   */
 
for (i = 0, p = 1; i < 256; ++i)
diff --git a/src/backend/access/common/session.c b/src/backend/access/common/session.c
index 617c3e1..ffa7432 100644
--- a/src/backend/access/common/session.c
+++ b/src/backend/access/common/session.c
@@ -60,7 +60,7 @@ InitializeSession(void)
  * Initialize the per-session DSM segment if it isn't already initialized, and
  * return its handle so that worker processes can attach to it.
  *
- * Unlike the per-context DSM segment, this segement and its contents are
+ * Unlike the per-context DSM segment, this segment and its contents are
  * reused for future parallel queries.
  *
  * Return DSM_HANDLE_INVALID if a segment can't be allocated due to lack of
diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index e85abcf..4011199 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -187,9 +187,9 @@ top:
 			_bt_relbuf(rel, buf);
 
 			/*
-			 * Something did not workout. Just forget about the cached
+			 * Something did not work out. Just forget about the cached
 			 * block and follow the normal path. It might be set again if
-			 * the conditions are favourble.
+			 * the conditions are favourable.
 			 */
 			RelationSetTargetBlock(rel, InvalidBlockNumber);
 		}
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 43a27a9..a3fb449 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -409,7 +409,7 @@ ExecSetExecProcNode(PlanState *node, ExecProcNodeMtd function)
 	 * Add a wrapper around the ExecProcNode callback that checks stack depth
 	 * during the first execution and maybe adds an instrumentation
 	 * wrapper. When the callback is changed after execution has already begun
-	 * that means we'll superflously execute ExecProcNodeFirst, but that seems
+	 * that means we'll superfluously execute ExecProcNodeFirst, but that seems
 	 * ok.
 	 */
 	node->ExecProcNodeReal = function;
diff --git a/src/backend/jit/llvm/llvmjit_expr.c b/src/backend/jit/llvm/llvmjit_expr.c
index 0d8c2bd..f37ff82 100644
--- a/src/backend/jit/llvm/llvmjit_expr.c
+++ b/src/backend/jit/llvm/llvmjit_expr.c
@@ -1768,7 +1768,7 @@ llvm_compile_expr(ExprState *state)
 											b_compare_result,
 											b_null);
 
-				/* build block analying the !NULL comparator result */
+				/* build block analyzing the !NULL comparator result */
 				LLVMPositionBuilderAtEnd(b, b_compare_result);
 
 				/* if results equal, compare next, otherwise done */
diff --git a/src/backend/optimizer/geqo/geqo_misc.c b/src/backend/optimizer/geqo/geqo_misc.c
index 919d288..0f96912 100644
--- a/src/backend/optimizer/geqo/geqo_misc.c
+++ b/src/backend/optimizer/geqo/geqo_misc.c
@@ -92,7 +92,7 @@ print_gen(FILE *fp, Pool *pool, int generation)
 {
int lowest;
 
-   /* Get index to lowest ranking gene in poplulation. */
+   /* Get index to lowest ranking gene in population. */
/* Use 2nd to last since last is buffer. */
lowest = pool->size > 1 ? pool->size - 2 : 0;
 
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 

Re: [HACKERS] per-sesson errors after interrupting CLUSTER pg_attribute (not attrdef)

2018-03-20 Thread Justin Pryzby
On Tue, Mar 20, 2018 at 01:03:57PM -0500, Justin Pryzby wrote:
> On Tue, Oct 24, 2017 at 04:56:27PM -0700, Michael Paquier wrote:
> > On Fri, Oct 20, 2017 at 9:01 AM, Justin Pryzby <pry...@telsasoft.com> wrote:
> > > This was briefly scary but seems to have been limited to my psql session 
> > > (no
> > > other errors logged).  Issue with catcache (?)
> 
> Now I know, this is still an issue under PG10.2:
> 
> [pryzbyj@united-telsasoft ~]$ psql ts
> psql (10.2)
> ts=# CLUSTER VERBOSE pg_attribute USING pg_attribute_relid_attnum_index ;
> INFO:  clustering "pg_catalog.pg_attribute" using sequential scan and sort
> INFO:  "pg_attribute": found 18102 removable, 2488511 nonremovable row 
> versions in 80769 pages
> DETAIL:  0 dead row versions cannot be removed yet.
> CPU: user: 10.00 s, system: 1.59 s, elapsed: 27.60 s.
> ^CCancel request sent
> ERROR:  canceling statement due to user request
> ts=# CLUSTER VERBOSE pg_attribute USING pg_attribute_relid_attnum_index ;
> ERROR:  could not open file "base/16400/948150297": No such file or directory

> I don't know much but..is this related to pg_filenode.map?

I gather this is the "prefer that it not happen" case here:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/utils/cache/relmapper.c;hb=HEAD#l449

So I think this is known, accepted/expected behavior, and not an issue.

Justin



Re: [HACKERS] per-sesson errors after interrupting CLUSTER pg_attribute (not attrdef)

2018-03-20 Thread Justin Pryzby
On Tue, Oct 24, 2017 at 04:56:27PM -0700, Michael Paquier wrote:
> On Fri, Oct 20, 2017 at 9:01 AM, Justin Pryzby <pry...@telsasoft.com> wrote:
> > This was briefly scary but seems to have been limited to my psql session (no
> > other errors logged).  Issue with catcache (?)
> >
> > I realized that the backup job I'd kicked off was precluding the CLUSTER 
> > from
> > running, but that CLUSTER was still holding lock and stalling everything 
> > else
> > under the sun.
> >
> > ts=# \df qci_add*
> > ERROR:  could not read block 8 in file "base/16400/999225102": read only 0 
> > of 8192 bytes
> > ts=# \dt+ pg_att
> >
> > ts=# \dt+ pg_attrdef
> > ERROR:  could not read block 8 in file "base/16400/999225102": read only 0 
> > of 8192 bytes
> 
> Perhaps that's too late, but to which relation is this relfilenode
> actually referring to?

Sorry for the late response :)

..but I've had mixed success reproducing this, and wasn't sure if it's even an
issue under current postgres (the original issue was a 9.6 server + 10.0 client).

Now I know, this is still an issue under PG10.2:

[pryzbyj@united-telsasoft ~]$ psql ts
psql (10.2)
Type "help" for help.

ts=# CLUSTER VERBOSE pg_attribute USING pg_attribute_relid_attnum_index ;
INFO:  clustering "pg_catalog.pg_attribute" using sequential scan and sort
INFO:  "pg_attribute": found 18074 removable, 2488511 nonremovable row versions 
in 80769 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU: user: 9.66 s, system: 1.92 s, elapsed: 22.29 s.
^CCancel request sent
ERROR:  canceling statement due to user request
ts=# CLUSTER VERBOSE pg_attribute USING pg_attribute_relid_attnum_index ;
INFO:  clustering "pg_catalog.pg_attribute" using sequential scan and sort
INFO:  "pg_attribute": found 18102 removable, 2488511 nonremovable row versions 
in 80769 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU: user: 10.00 s, system: 1.59 s, elapsed: 27.60 s.
^CCancel request sent
ERROR:  canceling statement due to user request
ts=# CLUSTER VERBOSE pg_attribute USING pg_attribute_relid_attnum_index ;
ERROR:  could not open file "base/16400/948150297": No such file or directory
ts=# SELECT * FROM pg_attribute WHERE pg_relation_filenode=948150297;
ERROR:  column "pg_relation_filenode" does not exist
LINE 1: SELECT * FROM pg_attribute WHERE pg_relation_filenode=948150...
 ^
ts=# SELECT * FROM pg_attribute WHERE pg_relation_filenode()=948150297;
ERROR:  function pg_relation_filenode() does not exist
LINE 1: SELECT * FROM pg_attribute WHERE pg_relation_filenode()=9481...
 ^
HINT:  No function matches the given name and argument types. You might need to 
add explicit type casts.
ts=# SELECT * FROM pg_class WHERE pg_relation_filenode(oid)=948150297;
ERROR:  could not open file "base/16400/948150297": No such file or directory
ts=# 

In another session:
ts=# SELECT * FROM pg_class WHERE pg_relation_filenode(oid)=948150297;
(0 rows)

postgres=# SELECT session_line, message, query FROM postgres_log_2018_03_20_1200 WHERE pid=29855 AND session_id='5ab147d9.749f' ORDER BY 1;
session_line|message|query
1|statement: SELECT pg_catalog.quote_ident(c2.relname)   FROM pg_catalog.pg_class c1, pg_catalog.pg_class c2, pg_catalog.pg_index i WHERE c1.oid=i.indrelid and i.indexrelid=c2.oid   and (0 = pg_catalog.length(''))   and pg_catalog.quote_ident(c1.relname)='pg_attribute'   and pg_catalog.pg_table_is_visible(c2.oid) LIMIT 1000|
2|statement: SELECT pg_catalog.quote_ident(c2.relname)   FROM pg_catalog.pg_class c1, pg_catalog.pg_class c2, pg_catalog.pg_index i WHERE c1.oid=i.indrelid and i.indexrelid=c2.oid   and (24 = pg_catalog.length('pg_attribute_relid_attnu'))   and pg_catalog.quote_ident(c1.relname)='pg_attribute'   and pg_catalog.pg_table_is_v

Re: Function to track shmem reinit time

2018-03-03 Thread Justin Pryzby
On Sat, Mar 03, 2018 at 01:00:52PM -0500, Peter Eisentraut wrote:
> On 2/28/18 07:11, Anastasia Lubennikova wrote:
> > Currently, if the 'restart_after_crash' option is on, postgres will just
> > restart.
> > And the only way to know that it happened is to regularly parse logfile
> > or monitor it, catching restart messages. This approach is really
> > inconvenient for
> > users, who have gigabytes of logs.
> 
> I find this premise a bit dubious.  Why have a log file if it's too big
> to find anything in it?  Server crashes aren't the only thing people are
> interested in.  So we'll need a function for "last $anything".

I think one can tell if it's crashed recently by comparing the start times of
the parent postmaster and its main children (I'm going to go put this in place
for myself now).

ts=# SELECT backend_type, backend_start, pg_postmaster_start_time(), backend_start-pg_postmaster_start_time() FROM pg_stat_activity ORDER BY backend_start LIMIT 1;
    backend_type     |         backend_start         |   pg_postmaster_start_time    |    ?column?
---------------------+-------------------------------+-------------------------------+-----------------
 autovacuum launcher | 2018-03-02 00:21:11.604468-03 | 2018-03-02 00:12:46.757642-03 | 00:08:24.846826

pryzbyj@pryzbyj:~$ ps -O lstart -upostgres # on a separate server with 9.4, which doesn't have backend_type (added in v10)
  PID                  STARTED S TTY          TIME COMMAND
12982 Mon Feb 19 06:03:13 2018 S ?        00:00:33 /usr/lib/postgresql/9.4/bin/postgres -D /var/lib/postgresql/9.4/main -c config_file=/etc/postgresql/9.4/main/postgresql.conf
12997 Mon Feb 19 06:03:14 2018 S ?        00:00:00 postgres: checkpointer process
12998 Mon Feb 19 06:03:14 2018 S ?        00:00:08 postgres: writer process
12999 Mon Feb 19 06:03:14 2018 S ?        00:00:08 postgres: wal writer process
13000 Mon Feb 19 06:03:14 2018 S ?        00:00:19 postgres: autovacuum launcher process
13001 Mon Feb 19 06:03:14 2018 S ?        00:00:45 postgres: stats collector process

pryzbyj@pryzbyj:~$ sudo kill -SEGV 12997
pryzbyj@pryzbyj:~$ ps -O lstart -upostgres
  PID                  STARTED S TTY          TIME COMMAND
 8644 Sat Mar  3 12:05:34 2018 S ?        00:00:00 postgres: checkpointer process
 8645 Sat Mar  3 12:05:34 2018 S ?        00:00:00 postgres: writer process
 8646 Sat Mar  3 12:05:34 2018 S ?        00:00:00 postgres: wal writer process
 8647 Sat Mar  3 12:05:34 2018 S ?        00:00:00 postgres: autovacuum launcher process
 8648 Sat Mar  3 12:05:34 2018 S ?        00:00:00 postgres: stats collector process
12982 Mon Feb 19 06:03:13 2018 S ?        00:00:33 /usr/lib/postgresql/9.4/bin/postgres -D /var/lib/postgresql/9.4/main -c config_file=/etc/postgresql/9.4/main/postgresql.conf

Justin



Re: Typo in src/backend/access/hash/README

2018-03-04 Thread Justin Pryzby
On Mon, Mar 05, 2018 at 04:30:28PM +1300, Thomas Munro wrote:
> Hi,
> 
> Not sure what word was missed here, but I guess "count":
> 
>  our the number of buckets stored in our cached copy of the metapage.  If
> -so, the bucket has certainly been split, because the must originally
> +so, the bucket has certainly been split, because the count must originally
>  have been less than the number of buckets that existed at that time and

I think there's a 2nd typo:

|After computing the ostensibly-correct bucket number based on our cached
|copy of the metapage, we lock the corresponding primary bucket page and
|check whether the bucket count stored in hasho_prevblkno is greater than
|OUR the number of buckets stored in our cached copy of the metapage.  If

remove "our"?

> --94eb2c0d4c02fe3e960566a1f43e
> Content-Type: application/octet-stream; 
> name="somebody-accidentally-a-word.patch"

:)

Justin



Re: RFC: Add 'taint' field to pg_control.

2018-03-01 Thread Justin Pryzby
On Thu, Mar 01, 2018 at 09:12:18AM +0800, Craig Ringer wrote:
> On 1 March 2018 at 06:28, Justin Pryzby <pry...@telsasoft.com> wrote:
> > The more fine grained these are the more useful they can be:
> >
> > Running with fsync=off is common advice while loading, so reporting that
> > "fsync=off at some point" is much less useful than reporting "having to run
> > recovery from an fsync=off database".
> >
> > I propose there could be two bits:
> >
> >  1. fsync was off at some point in history when the DB needed recovery;
> >  2. fsync was off during the current instance; I assume the bit would be set
> > whenever fsync=off was set, but could be cleared when the server was
> > cleanly shut down.  If the bit is set during recovery, the first bit 
> > would
> > be permanently set.  Not sure but if this bit is set and then fsync
> > re-enabled, maybe this could be cleared after any checkpoint and not 
> > just
> > shutdown ?
> >
> I think that's a very useful way to eliminate false positives and make this
> diagnostic field more useful. But we'd need to guarantee that when you've
> switched from fsync=off to fsync=on, we've actually fsync'd every extent of
> every table and index, whether or not we think they're currently dirty and
> whether or not the smgr currently has them open. Do something like initdb
> does to flush everything. Without that, we could have WAL-vs-heap ordering
> issues and so on *even after a Pg restart* if the system has lots of dirty
> writeback to do.


I think a similar "2 bit" scheme *could* be applied to full_page_writes, too.
And the 2nd "soft" bit for FPW could also be cleared after a checkpoint.  I'd
think that for FPW, previous checkpoints which had fsync=off wouldn't need
special handling - so long as FPW=on, the "soft" bit can be cleared after a
successful checkpoint (as a micro-optimization, don't clear the "soft" bit if
the "hard" bit is set).

BTW this scheme has the amusing consequence that setting fsync=off should
involve first writing+fsyncing these bits to pg_control.
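
In toy form (the flag name is hypothetical; in the backend, the durable
write would presumably go through UpdateControlFile(), and enable_fsync
here stands in for the real enableFsync):

#include <stdbool.h>
#include <stdint.h>

#define TAINT_FSYNC_OFF_CURRENT 0x0001

static uint16_t taint;
static bool enable_fsync = true;

static void write_and_fsync_pg_control(void) { /* stand-in */ }

static void set_fsync_off(void)
{
	taint |= TAINT_FSYNC_OFF_CURRENT;
	write_and_fsync_pg_control();	/* durably record the bit first... */
	enable_fsync = false;			/* ...only then start skipping fsync */
}

int main(void)
{
	set_fsync_off();
	return (taint & TAINT_FSYNC_OFF_CURRENT) && !enable_fsync ? 0 : 1;
}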

Justin



\d+ fails on index on partition

2018-09-27 Thread Justin Pryzby
For indices inherited from relkind=p:

pryzbyj=# CREATE TABLE tt(i int)PARTITION BY RANGE(i);
pryzbyj=# CREATE TABLE tt1 PARTITION OF tt DEFAULT;
pryzbyj=# CREATE INDEX ON tt((0+i));
pryzbyj=# ALTER INDEX tt1_expr_idx ALTER COLUMN 1 SET STATISTICS 123;
pryzbyj=# \d+ tt1_expr_idx 
ERROR:  42809: "tt1_expr_idx" is an index
LOCATION:  heap_open, heapam.c:1305

Failing due to:
* QUERY **
SELECT inhparent::pg_catalog.regclass,
  pg_catalog.pg_get_expr(c.relpartbound, inhrelid),
  pg_catalog.pg_get_partition_constraintdef(inhrelid)
FROM pg_catalog.pg_class c JOIN pg_catalog.pg_inherits i ON c.oid = inhrelid
WHERE c.oid = '40129092' AND c.relispartition;
**

pg_get_partition_constraintdef() doesn't like being passed a relkind='I',
"ATTACH"ed index, which I guess is included in pg_inherit (commit 8b08f7d48):

pryzbyj=# SELECT inhparent::regclass, inhrelid::regclass, * FROM pg_inherits 
WHERE inhrelid='tt1_expr_idx'::regclass;
inhparent | tt_expr_idx
inhrelid  | tt1_expr_idx
inhrelid  | 40129092
inhparent | 40129091
inhseqno  | 1

I'll spare you the burden of any fix I might attempt to cobble together.

Justin



rowcount estimate varies WRT partitionwise_join

2018-10-14 Thread Justin Pryzby
I was cross-eyed yesterday due to merge conflicts, but this still seems odd.

I thought that final row counts would not vary with the details of the chosen
plan.  That seems to hold true when I disable parallel join or hash join, but
not for PWJ.

I noticed this behavior while joining our own tables using an equality join on
the partition key plus an inequality comparison also on the partition key (range),
but I see the same thing using tables from the regression test:

pryzbyj=# EXPLAIN SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE 
t1.a = t2.b AND t1.b = 0 ;
  QUERY PLAN  
--
 Hash Join  (cost=6.96..13.83 rows=12 width=18)
   Hash Cond: (t2.b = t1.a)
   ->  Append  (cost=0.00..6.00 rows=200 width=9)
 ->  Seq Scan on prt2_p1 t2  (cost=0.00..1.84 rows=84 width=9)
 ->  Seq Scan on prt2_p2 t2_1  (cost=0.00..1.83 rows=83 width=9)
 ->  Seq Scan on prt2_p3 t2_2  (cost=0.00..1.33 rows=33 width=9)
   ->  Hash  (cost=6.81..6.81 rows=12 width=9)
 ->  Append  (cost=0.00..6.81 rows=12 width=9)
   ->  Seq Scan on prt1_p1 t1  (cost=0.00..2.56 rows=5 width=9)
 Filter: (b = 0)
   ->  Seq Scan on prt1_p2 t1_1  (cost=0.00..2.56 rows=5 width=9)
 Filter: (b = 0)
   ->  Seq Scan on prt1_p3 t1_2  (cost=0.00..1.62 rows=2 width=9)
 Filter: (b = 0)

pryzbyj=# SET enable_partitionwise_join=on;
pryzbyj=# EXPLAIN SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE 
t1.a = t2.b AND t1.b = 0 ;
  QUERY PLAN  
--
 Append  (cost=2.62..12.75 rows=7 width=18)
   ->  Hash Join  (cost=2.62..4.81 rows=3 width=18)
 Hash Cond: (t2.b = t1.a)
 ->  Seq Scan on prt2_p1 t2  (cost=0.00..1.84 rows=84 width=9)
 ->  Hash  (cost=2.56..2.56 rows=5 width=9)
   ->  Seq Scan on prt1_p1 t1  (cost=0.00..2.56 rows=5 width=9)
 Filter: (b = 0)
   ->  Hash Join  (cost=2.62..4.80 rows=3 width=18)
 Hash Cond: (t2_1.b = t1_1.a)
 ->  Seq Scan on prt2_p2 t2_1  (cost=0.00..1.83 rows=83 width=9)
 ->  Hash  (cost=2.56..2.56 rows=5 width=9)
   ->  Seq Scan on prt1_p2 t1_1  (cost=0.00..2.56 rows=5 width=9)
 Filter: (b = 0)
   ->  Hash Join  (cost=1.65..3.11 rows=1 width=18)
 Hash Cond: (t2_2.b = t1_2.a)
 ->  Seq Scan on prt2_p3 t2_2  (cost=0.00..1.33 rows=33 width=9)
 ->  Hash  (cost=1.62..1.62 rows=2 width=9)
   ->  Seq Scan on prt1_p3 t1_2  (cost=0.00..1.62 rows=2 width=9)
 Filter: (b = 0)



pg11rc1 DROP INDEX: NOTICE: drop_trigger

2018-10-15 Thread Justin Pryzby
ts=#  CREATE INDEX ON t(i) ;
CREATE INDEX
ts=#  DROP INDEX t_i_idx ;
NOTICE:  0: drop_trigger
LOCATION:  exec_stmt_raise, pl_exec.c:3748
DROP INDEX
ts=# 

ts=# \d t
 Table "public.t"
 Column |  Type   | Collation | Nullable | Default 
+-+---+--+-
 i  | integer |   |  | 

ts=# SELECT COUNT(1) FROM pg_trigger WHERE tgrelid='t'::regclass;
count | 0

It's not specific to the index name, since it can happen multiple times for
each index on that table:

ts=# DROP index t_i_idx2;
NOTICE:  drop_trigger
DROP INDEX
ts=# DROP index t_i_idx1;
NOTICE:  drop_trigger
DROP INDEX
ts=# 

I looked in pg_depend, but didn't see any trigger with a dependency (but maybe
I'm not doing it right):

ts=# SELECT objid::regclass, * FROM pg_depend WHERE 
refobjid='t_i_idx'::regclass;
(0 rows)

This doesn't occur on new tables, just this one..

"ts" is an unused database on our development VM which which I found myself
connecting to by habbit (it's the name of most of our customer DBs).  I
anticipate this may be result of catalog cruftiness resulting from upgrades
from 11dev through each of the betas.

Specifically this would've been the instance being upgraded when I wrote:
https://www.postgresql.org/message-id/flat/20180529171015.GA22903%40telsasoft.com

I don't know what else to look at, but wonder if there's any use in debugging
this further ?

Justin



Re: pg11rc1 DROP INDEX: NOTICE: drop_trigger

2018-10-15 Thread Justin Pryzby
On Mon, Oct 15, 2018 at 03:44:52PM -0400, Tom Lane wrote:
> Justin Pryzby  writes:
> > ts=#  CREATE INDEX ON t(i) ;
> > CREATE INDEX
> > ts=#  DROP INDEX t_i_idx ;
> > NOTICE:  0: drop_trigger
> > LOCATION:  exec_stmt_raise, pl_exec.c:3748
> > DROP INDEX
> 
> Maybe you've got an event trigger hanging around in that DB?

You're right; I was following along with this bug report:
https://www.postgresql.org/message-id/flat/20180924103001.GK2471%40telsasoft.com#b066078913137fe462c24c0884d186e1

Thanks, all's well.

Justin



Re: fine tune v11 release notes

2018-10-13 Thread Justin Pryzby
On Sat, Oct 13, 2018 at 04:34:07PM -0400, Tom Lane wrote:
> "Jonathan S. Katz"  writes:
> > On 10/6/18 9:42 AM, Justin Pryzby wrote:
> >> Find below various fixes to release notes for v11, for discussion purposes.
> 
> > Thanks for putting this together. I reviewed, broke it up into two
> > patches, and made some additions / changes / deletions. Please see
> > attached patches.
> 
> Hmm, I just saw this after spending an afternoon editing the release
> notes.  I'll try to merge it with what I did.

I had some more local changes too...I'll try to send a patch relative to my
original.  I'm sure Jonathan already caught most of the errors anyway.

Justin



Re: fine tune v11 release notes

2018-10-13 Thread Justin Pryzby
On Sat, Oct 13, 2018 at 03:38:39PM -0500, Justin Pryzby wrote:
> On Sat, Oct 13, 2018 at 04:34:07PM -0400, Tom Lane wrote:
> > "Jonathan S. Katz"  writes:
> > > On 10/6/18 9:42 AM, Justin Pryzby wrote:
> > >> Find below various fixes to release notes for v11, for discussion 
> > >> purposes.
> > 
> > > Thanks for putting this together. I reviewed, broke it up into two
> > > patches, and made some additions / changes / deletions. Please see
> > > attached patches.
> > 
> > Hmm, I just saw this after spending an afternoon editing the release
> > notes.  I'll try to merge it with what I did.
> 
> I had some more local changes too...I'll try to send a patch relative to my
> original.  I'm sure Jonathan already caught most of the errors anyway.

I'm attaching two patches.  They're redundant; choose one.

1) fine-tune-jonathan-justin.patch: To be applied on top of Jonathan's patch to
release notes.  I believe he dropped a few of my changes, but I don't know
offhand which, so some of my changes should perhaps be removed.

I think at least these two additional hunks are important:
- B-tree indexes can now be built in parallel with
+ A B-tree index can now be built in parallel with

=> avoid confusion between "parallel" and "simultaneous".

-Allow UNIQUE indexes on partitioned tables if
-the partition key guarantees uniqueness (Álvaro Herrera,
+Allow UNIQUE indexes on partitioned tables when
+the partition key is included in the index key (Álvaro Herrera,

=> Needed for correctness IMHO.

2) fine-tune-v2.patch: relative to my original.  Should be self explanatory.

Feel free to send your own patch and I'll resolve conflicts and resend proposed
changes.

Justin
diff --git a/doc/src/sgml/release-11.sgml b/doc/src/sgml/release-11.sgml
index 5df6a41..5594d43 100644
--- a/doc/src/sgml/release-11.sgml
+++ b/doc/src/sgml/release-11.sgml
@@ -49,8 +49,8 @@


 
- Having a "default" partition for storing data that does not match any
- of the remaining partitions
+ Allow creation of a "default" partition for storing rows which are
+ excluded by bounds of all other partitions.
 

   
@@ -63,7 +63,7 @@
   

 
- B-tree indexes can now be built in parallel with
+ A B-tree index can now be built in parallel with
  CREATE INDEX
 

@@ -118,9 +118,9 @@
 
 
  
-  Many other useful performance improvements, including a significant
-  speedup to the ALTER TABLE .. ADD COLUMN with a
-  non-null column default as it no longer rewrites the full table.
+  Many other useful performance improvements, including the ability to avoid
+  a table rewrite when using ALTER TABLE .. ADD COLUMN
+  with a non-null column default.
  
 
 
@@ -591,8 +591,8 @@
 -->
 

-Allow UNIQUE indexes on partitioned tables if
-the partition key guarantees uniqueness (lvaro Herrera,
+Allow UNIQUE indexes on partitioned tables when
+the partition key is included in the index key (lvaro Herrera,
 Amit Langote)

   
@@ -603,15 +603,18 @@
 -->
 

-Allow indexes on a partitioned table to be automatically created
-in any child partitions (Álvaro Herrera)
+Support indexes on partitioned tables (Álvaro Herrera)
+   
+
+   
+This is not a global index, but rather a mechanism to create indexes
+on each partition of a partitioned table.

 

 The new command ALTER
-INDEX ATTACH PARTITION allows indexes to be
-attached to partitions.  This does not behave as a global index
-since the contents are private to each index.
+INDEX ATTACH PARTITION allows an index on a partition
+to be inherited by the index on its partitioned table.

   
 
@@ -654,7 +657,7 @@
 

 Creation of a trigger on partitioned tables automatically creates
-triggers on all existing and future partitions.
+triggers on all existing and any future partitions.
 This also allows deferred unique constraints on partitioned tables.

   
@@ -667,14 +670,16 @@
 -->
 

-Allow partitioned tables to be joined directly for equality joins with
-identically partitioned child tables.  (Ashutosh Bapat)
+Allow joins between partitioned tables to be performed directly
+on partitions.  (Ashutosh Bapat)

 

-This feature is disabled by default
-but can be enabled by changing enable_partitionwise_join.
+This applies to equality joins between partitioned tables whose
+partitions have exactly the same pa

backpatch to v11? Add "B" suffix for bytes to docs

2018-10-13 Thread Justin Pryzby
I suggest this should be backpatched ?

commit 36e9d413a1d6928cd809d302d495fd6880a44b1e
Author: Greg Stark 
Date:   Sat Oct 6 13:03:43 2018 -0400

Add "B" suffix for bytes to docs

6e7baa3227 and b06d8e58b5 added "B" as a valid suffix for
GUC_UNIT_BYTES but neglected to add it to the docs.

 doc/src/sgml/config.sgml | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)




Re: fine tune v11 release notes

2018-10-13 Thread Justin Pryzby
On Sat, Oct 13, 2018 at 05:31:46PM -0400, Tom Lane wrote:
> Justin Pryzby  writes:
> >> On Sat, Oct 13, 2018 at 04:34:07PM -0400, Tom Lane wrote:
> >>> Hmm, I just saw this after spending an afternoon editing the release
> >>> notes.  I'll try to merge it with what I did.
> 
> > Feel free to send your own patch and I'll resolve conflicts and resend 
> > proposed
> > changes.
> 
> I've pushed updates merging Jonathan's last with my work.  Please
> look over what's in HEAD and see if you want to propose additional
> changes.

Find attached what's left.

Justin
diff --git a/doc/src/sgml/release-11.sgml b/doc/src/sgml/release-11.sgml
index 188222f..cdf41af 100644
--- a/doc/src/sgml/release-11.sgml
+++ b/doc/src/sgml/release-11.sgml
@@ -31,8 +31,8 @@


 
- Add support for PRIMARY KEY, FOREIGN
- KEY, indexes, and triggers on partitioned tables
+ Add support on partitioned tables for PRIMARY
+ KEY, FOREIGN KEY, indexes, and triggers
 


@@ -65,7 +65,7 @@

 
  CREATE INDEX can now use parallel processing
- while building B-tree indexes
+ while building a B-tree index
 


@@ -117,9 +117,9 @@
 
 
  
-  Many other useful performance improvements, including a significant
-  speedup to ALTER TABLE ... ADD COLUMN with a
-  non-null column default, as it no longer rewrites the table data
+  Many other useful performance improvements, including the ability to avoid
+  a table rewrite when using ALTER TABLE .. ADD COLUMN
+  with a non-null column default.
  
 
 
@@ -587,15 +587,18 @@
 -->
 

-Allow indexes on a partitioned table to be automatically created
-in new child partitions (Álvaro Herrera)
+Support indexes on partitioned tables (Álvaro Herrera)
+   
+
+   
+This is not a global index, but rather a mechanism to create an index
+on each partition of a partitioned table.  

 

 The new command ALTER
-INDEX ATTACH PARTITION allows indexes to be
-attached to partitions.  This does not behave as a global index
-since the contents are private to each index.
+INDEX ATTACH PARTITION allows an index on a partition
+to be inherited by the index on its partitioned table.

   
 
@@ -733,8 +736,8 @@
 -->
 

-Perform aggregation on each partition, and then merge the results
-(Jeevan Chalke, Ashutosh Bapat, Robert Haas)
+Allow aggregate functions on partitioned tables to be evaluated
+directly on partitions.  (Jeevan Chalke, Ashutosh Bapat, Robert Haas)

 

@@ -1101,8 +1104,8 @@ same commits as above
 -->
 

-Improve optimizer's row count estimates for EXISTS
-and NOT EXISTS queries (Tom Lane)
+Improve row count estimates for EXISTS and
+NOT EXISTS queries (Tom Lane)

   
 
@@ -1146,7 +1149,7 @@ same commits as above
 

 This feature requires LLVM to be
-available.  It is not currently enabled by default, even in
+available.  It is disabled by default, even in
 builds that support it.

   
@@ -1564,11 +1567,11 @@ same commits as above
 

 Allow server options related to memory and file sizes to be
-specified as a number of bytes (Beena Emerson)
+specified in units of bytes (Beena Emerson)

 

-The new unit is B.  This is in addition to the
+The new suffix is B.  This is in addition to the
 existing units kB, MB, GB
 and TB.

@@ -1605,7 +1608,7 @@ same commits as above
 -->
 

-Retain WAL data for only a single checkpoint
+Retain WAL data for a single checkpoint only
 (Simon Riggs)

 
@@ -1801,6 +1804,7 @@ same commits as above

 

+DOES THIS NEED TO BE IN "COMPATIBILITY NOTES"?
 Also, if any table mentioned in VACUUM uses
 a column list, then the ANALYZE keyword must be
 supplied; previously, ANALYZE was implied in
@@ -2199,8 +2203,7 @@ same commits as above
 

 Add psql command \gdesc
-to display the column names and types of the query output (Pavel
-Stehule)
+to display the names and types of the columns in query results (Pavel Stehule)

   
 


Re: pg11.1 jit segv

2018-11-15 Thread Justin Pryzby
On Thu, Nov 15, 2018 at 06:03:34PM -0600, Justin Pryzby wrote:
> Verbose plan, munged for brevity/sanity due to joining wide tables, and
> redacted since the view probably has to be considered proprietary.  Hopefully
> the remaining bits are still useful.  I replaced column names with x.

Actually the view isn't as intricate as I thought, but I'd like to avoid
publishing it for the sake of simplicity.  I replaced the view with its underlying
table and now I get:

[pryzbyj@database ~]$ time psql ts -f tmp/sql-jit-crash-2018-11-15.jtp 
psql:tmp/sql-jit-crash-2018-11-15.jtp:12: ERROR:  invalid memory alloc request 
size 2447889552

It's unclear if that's a useful hint, a separate problem, or a nonissue..

Justin



pg11.1 jit segv

2018-11-15 Thread Justin Pryzby
Crash is reproducible but only when JIT=on.

postgresql11-llvmjit-11.1-1PGDG.rhel7.x86_64

[2769871.453033] postmaster[8582]: segfault at 7f083bddb780 ip 7f08127e814e 
sp 7ffe463506e0 error 4
[2770774.470600] postmaster[29410]: segfault at 7f0812eeb6c8 ip 
7f08127eb4f0 sp 7ffe463506e0 error 4

Core was generated by `postgres: telsasoft ts 192.168.122.11(41908) SELECT'.
Program terminated with signal 11, Segmentation fault.

(gdb) bt
#0  0x7f08127e814e in ?? ()
#1  0x in ?? ()

[pryzbyj@database ~]$ sudo -u postgres valgrind /usr/pgsql-11/bin/postgres 
--single -D /var/lib/pgsql/11/data ts
[...]
==26448== Conditional jump or move depends on uninitialised value(s)
==26448==at 0x1B510F09: getAdjustedPtr(llvm::IRBuilder&, llvm::DataLayout const&, llvm::Value*, 
llvm::APInt, llvm::Type*, llvm::Twine) (SROA.cpp:1531)
==26448==by 0x1B511C52: 
llvm::sroa::AllocaSliceRewriter::getNewAllocaSlicePtr(llvm::IRBuilder&, llvm::Type*) 
(SROA.cpp:2313)
==26448==by 0x1B516BA0: 
llvm::sroa::AllocaSliceRewriter::visitIntrinsicInst(llvm::IntrinsicInst&) 
(SROA.cpp:2921)
==26448==by 0x1B5190CC: visitCall (Instruction.def:190)
==26448==by 0x1B5190CC: llvm::InstVisitor::visit(llvm::Instruction&) (Instruction.def:190)
==26448==by 0x1B51947A: visit (InstVisitor.h:114)
==26448==by 0x1B51947A: llvm::sroa::AllocaSliceRewriter::visit((anonymous 
namespace)::Slice const*) (SROA.cpp:2262)
==26448==by 0x1B51D426: llvm::SROA::rewritePartition(llvm::AllocaInst&, 
llvm::sroa::AllocaSlices&, llvm::sroa::Partition&) (SROA.cpp:3921)
==26448==by 0x1B51E630: llvm::SROA::splitAlloca(llvm::AllocaInst&, 
llvm::sroa::AllocaSlices&) (SROA.cpp:4029)
==26448==by 0x1B51F25D: llvm::SROA::runOnAlloca(llvm::AllocaInst&) 
(SROA.cpp:4156)
==26448==by 0x1B52048A: llvm::SROA::runImpl(llvm::Function&, 
llvm::DominatorTree&, llvm::AssumptionCache&) (SROA.cpp:4243)
==26448==by 0x1B520E40: 
llvm::sroa::SROALegacyPass::runOnFunction(llvm::Function&) (SROA.cpp:4296)
==26448==by 0x1AC49C31: llvm::FPPassManager::runOnFunction(llvm::Function&) 
(LegacyPassManager.cpp:1514)
==26448==by 0x1B6A805E: RunPassOnSCC (Timer.h:149)
==26448==by 0x1B6A805E: RunAllPassesOnSCC (CallGraphSCCPass.cpp:419)
==26448==by 0x1B6A805E: (anonymous 
namespace)::CGPassManager::runOnModule(llvm::Module&) (CallGraphSCCPass.cpp:474)
==26448==
==26448== Conditional jump or move depends on uninitialised value(s)
==26448==at 0x1B510F09: 
getAdjustedPtr(llvm::IRBuilder&, llvm::DataLayout const&, llvm::Value*, 
llvm::APInt, llvm::Type*, llvm::Twine) (SROA.cpp:1531)
==26448==by 0x1B511C52: 
llvm::sroa::AllocaSliceRewriter::getNewAllocaSlicePtr(llvm::IRBuilder&, llvm::Type*) 
(SROA.cpp:2313)
==26448==by 0x1B515EF0: 
llvm::sroa::AllocaSliceRewriter::visitMemSetInst(llvm::MemSetInst&) 
(SROA.cpp:2656)
==26448==by 0x1B5190CC: visitCall (Instruction.def:190)
==26448==by 0x1B5190CC: llvm::InstVisitor::visit(llvm::Instruction&) (Instruction.def:190)
==26448==by 0x1B51947A: visit (InstVisitor.h:114)
==26448==by 0x1B51947A: llvm::sroa::AllocaSliceRewriter::visit((anonymous 
namespace)::Slice const*) (SROA.cpp:2262)
==26448==by 0x1B51D426: llvm::SROA::rewritePartition(llvm::AllocaInst&, 
llvm::sroa::AllocaSlices&, llvm::sroa::Partition&) (SROA.cpp:3921)
==26448==by 0x1B51E630: llvm::SROA::splitAlloca(llvm::AllocaInst&, 
llvm::sroa::AllocaSlices&) (SROA.cpp:4029)
==26448==by 0x1B51F25D: llvm::SROA::runOnAlloca(llvm::AllocaInst&) 
(SROA.cpp:4156)
==26448==by 0x1B52048A: llvm::SROA::runImpl(llvm::Function&, 
llvm::DominatorTree&, llvm::AssumptionCache&) (SROA.cpp:4243)
==26448==by 0x1B520E40: 
llvm::sroa::SROALegacyPass::runOnFunction(llvm::Function&) (SROA.cpp:4296)
==26448==by 0x1AC49C31: llvm::FPPassManager::runOnFunction(llvm::Function&) 
(LegacyPassManager.cpp:1514)
==26448==by 0x1B6A805E: RunPassOnSCC (Timer.h:149)
==26448==by 0x1B6A805E: RunAllPassesOnSCC (CallGraphSCCPass.cpp:419)
==26448==by 0x1B6A805E: (anonymous 
namespace)::CGPassManager::runOnModule(llvm::Module&) (CallGraphSCCPass.cpp:474)
==26448==

==26448== Invalid write of size 8
==26448==at 0x4C2E0C3: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1022)
==26448==by 0x824075: UnknownInlinedFun (string3.h:51)
==26448==by 0x824075: varstrfastcmp_locale (varlena.c:2135)
==26448==by 0x87C6F3: ApplySortComparator (sortsupport.h:224)
==26448==by 0x87C6F3: comparetup_heap (tuplesort.c:3560)
==26448==by 0x8793F4: qsort_tuple (qsort_tuple.c:112)
==26448==by 0x87EE53: tuplesort_performsort (tuplesort.c:1811)
==26448==by 0x628CE2: ExecSort (nodeSort.c:118)
==26448==by 0x609067: ExecProcNodeInstr (execProcnode.c:461)
==26448==by 0x610D48: UnknownInlinedFun (executor.h:237)
==26448==by 0x610D48: fetch_input_tuple (nodeAgg.c:406)
==26448==by 0x61277F: agg_retrieve_direct (nodeAgg.c:1736)
==26448==by 0x61277F: ExecAgg (nodeAgg.c:1551)
==26448==by 0x609067: ExecProcNodeInstr (execProcnode.c:461)
==26448==by 0x60A764: ExecScanFetch (execScan.c:95)
==26448==by 0x60A764: 

Re: pg11.1 jit segv

2018-11-15 Thread Justin Pryzby
On Thu, Nov 15, 2018 at 02:47:55PM -0800, Andres Freund wrote:
> > (gdb) bt
> > #0  0x7f08127e814e in ?? ()
> > #1  0x in ?? ()
> 
> Could you enable jit_debugging_support and reproduce?  That should give
> a more useful backtrace.

Core was generated by `postgres: pryzbyj ts [local] EXPLAIN   '.
Program terminated with signal 11, Segmentation fault.
#0  0x7f819e227cb9 in __memcpy_ssse3_back () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install 
audit-libs-2.8.1-3.el7_5.1.x86_64 bzip2-libs-1.0.6-13.el7.x86_64 
cyrus-sasl-lib-2.1.26-23.el7.x86_64 elfutils-libelf-0.170-4.el7.x86_64 
elfutils-libs-0.170-4.el7.x86_64 glib4
(gdb) bt
#0  0x7f819e227cb9 in __memcpy_ssse3_back () from /lib64/libc.so.6
#1  0x00824076 in memcpy (__len=18446744073709551612, __src=0x6c35818, 
__dest=<optimized out>) at /usr/include/bits/string3.h:51
#2  varstrfastcmp_locale (x=113465364, y=113465636, ssup=<optimized out>) at 
varlena.c:2135
#3  0x0087c6f4 in ApplySortComparator (ssup=0x6c33c40, 
isNull2=<optimized out>, datum2=<optimized out>, isNull1=false, 
datum1=<optimized out>) at ../../../../src/include/utils/sortsupport.h:224
#4  comparetup_heap (a=<optimized out>, b=<optimized out>, state=0x6c337f0) at 
tuplesort.c:3560
#5  0x008793f5 in qsort_tuple (a=0x8311958, n=5272, cmp_tuple=0x87c2d0 
<comparetup_heap>, state=state@entry=0x6c337f0) at qsort_tuple.c:112
#6  0x0087d2db in tuplesort_sort_memtuples 
(state=state@entry=0x6c337f0) at tuplesort.c:3320
#7  0x0087ee54 in tuplesort_performsort (state=state@entry=0x6c337f0) 
at tuplesort.c:1811
#8  0x00628ce3 in ExecSort (pstate=0x5e50ed8) at nodeSort.c:118
#9  0x00609068 in ExecProcNodeInstr (node=0x5e50ed8) at 
execProcnode.c:461
#10 0x00610d49 in ExecProcNode (node=0x5e50ed8) at 
../../../src/include/executor/executor.h:237
#11 fetch_input_tuple (aggstate=aggstate@entry=0x5e50aa0) at nodeAgg.c:406
#12 0x00612780 in agg_retrieve_direct (aggstate=0x5e50aa0) at 
nodeAgg.c:1736
#13 ExecAgg (pstate=0x5e50aa0) at nodeAgg.c:1551
#14 0x00609068 in ExecProcNodeInstr (node=0x5e50aa0) at 
execProcnode.c:461
#15 0x0060a765 in ExecScanFetch (recheckMtd=0x62be50 <BitmapHeapRecheck>, 
accessMtd=0x62be70 <BitmapHeapNext>, node=0x5e508e0) at execScan.c:95
#16 ExecScan (node=0x5e508e0, accessMtd=0x62be70 <BitmapHeapNext>, 
recheckMtd=0x62be50 <BitmapHeapRecheck>) at execScan.c:162
#17 0x00609068 in ExecProcNodeInstr (node=0x5e508e0) at 
execProcnode.c:461
#18 0x0061b365 in ExecProcNode (node=0x5e508e0) at 
../../../src/include/executor/executor.h:237
#19 MultiExecPrivateHash (node=0x5e50720) at nodeHash.c:164
#20 MultiExecHash (node=node@entry=0x5e50720) at nodeHash.c:114
#21 0x00609610 in MultiExecProcNode (node=node@entry=0x5e50720) at 
execProcnode.c:501
#22 0x0061bd98 in ExecHashJoinImpl (parallel=false, pstate=0x5e4f620) 
at nodeHashjoin.c:290
#23 ExecHashJoin (pstate=0x5e4f620) at nodeHashjoin.c:565
#24 0x00609068 in ExecProcNodeInstr (node=0x5e4f620) at 
execProcnode.c:461
#25 0x0061b365 in ExecProcNode (node=0x5e4f620) at 
../../../src/include/executor/executor.h:237
#26 MultiExecPrivateHash (node=0x5e4f460) at nodeHash.c:164
#27 MultiExecHash (node=node@entry=0x5e4f460) at nodeHash.c:114
#28 0x00609610 in MultiExecProcNode (node=node@entry=0x5e4f460) at 
execProcnode.c:501
#29 0x0061bd98 in ExecHashJoinImpl (parallel=false, pstate=0x38344f8) 
at nodeHashjoin.c:290
#30 ExecHashJoin (pstate=0x38344f8) at nodeHashjoin.c:565
#31 0x00609068 in ExecProcNodeInstr (node=0x38344f8) at 
execProcnode.c:461
#32 0x0061b365 in ExecProcNode (node=0x38344f8) at 
../../../src/include/executor/executor.h:237
#33 MultiExecPrivateHash (node=0x38343c8) at nodeHash.c:164
#34 MultiExecHash (node=node@entry=0x38343c8) at nodeHash.c:114
#35 0x00609610 in MultiExecProcNode (node=node@entry=0x38343c8) at 
execProcnode.c:501
#36 0x0061bd98 in ExecHashJoinImpl (parallel=false, pstate=0x3833148) 
at nodeHashjoin.c:290
#37 ExecHashJoin (pstate=0x3833148) at nodeHashjoin.c:565
#38 0x00609068 in ExecProcNodeInstr (node=0x3833148) at 
execProcnode.c:461
#39 0x00628cd6 in ExecProcNode (node=0x3833148) at 
../../../src/include/executor/executor.h:237
#40 ExecSort (pstate=0x3832bd0) at nodeSort.c:107
#41 0x00609068 in ExecProcNodeInstr (node=0x3832bd0) at 
execProcnode.c:461
#42 0x00610d49 in ExecProcNode (node=0x3832bd0) at 
../../../src/include/executor/executor.h:237
#43 fetch_input_tuple (aggstate=aggstate@entry=0x3832e30) at nodeAgg.c:406
#44 0x00612780 in agg_retrieve_direct (aggstate=0x3832e30) at 
nodeAgg.c:1736
#45 ExecAgg (pstate=0x3832e30) at nodeAgg.c:1551
#46 0x00609068 in ExecProcNodeInstr (node=0x3832e30) at 
execProcnode.c:461
#47 0x006025ba in ExecProcNode (node=0x3832e30) at 
../../../src/include/executor/executor.h:237
#48 ExecutePlan (execute_once=<optimized out>, dest=0xced3a0 <donothingDR>, 
direction=<optimized out>, numberTuples=0, sendTuples=true, 
operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x3832e30,

Re: pg11.1 jit segv

2018-11-15 Thread Justin Pryzby
On Thu, Nov 15, 2018 at 04:17:51PM -0800, Andres Freund wrote:
> I'm about to commit some changes to 12/master that'd possibly make it
> easier to find issues like this. Is there any chance that it's easier to
> repro this on master than making a reproducible test case?

Yes, very possibly - this is on a customer's server, so their data, plus views,
etc are potentially involved until excluded.

Justin



Re: pg11.1 jit segv

2018-11-15 Thread Justin Pryzby
On Thu, Nov 15, 2018 at 03:14:01PM -0800, Andres Freund wrote:
> Huh, that's the same crash? Because I don't see any evalexpr functions
> in the stack, and without those the above bt should have worked...

TTBOMK it's the same ..

Is it odd if I'm seeing this (odd because of "ANOTHER")?
I guess that's maybe because it's running a parallel query?
psql:tmp/sql-jit-crash-2018-11-15.jtp:16: WARNING:  terminating connection 
because of crash of another server process

I've started trying to minimize the query.  Here's what's left:

SELECT * FROM
daily_eric_umts_rnc_utrancell_view t2
JOIN (SELECT * FROM eric_umts_rbs_sector_carrier_view t1 WHERE t1.start_time >= 
'2018-07-01 00:00:00' AND t1.start_time < '2018-07-09 00:00:00') AS t1
USING(start_time)
WHERE (t2.start_time >= '2018-07-01 00:00:00' AND t2.start_time < '2018-07-09 
00:00:00')

Verbose plan, munged for brevity/sanity due to joining wide tables, and
redacted since the view probably has to be considered proprietary.  Hopefully
the remaining bits are still useful.  I replaced column names with x.

 Gather  (cost=481961.60..482102.41 rows=1180 width=8068)
   Output: [ there are 1700+ columns here... ]
   Workers Planned: 3
   ->  Merge Join  (cost=480961.60..480984.41 rows=381 width=8068)
 Output:  ARRAY[x, x, ..., COALESCE(x, 0), COALESCE(x, 0)], ARRAY[..., 
COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), 
COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COA
LESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), 
COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), 
COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALE
SCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), 
COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), 
COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESC
E(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), 
COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0), 
COALESCE(x, 0), COALESCE(x, 0), COALESCE(x, 0)], ((x)::numeric / 10.0), CASE 
WHEN (x IS NULL) THEN ARRAY[..., 
eric_umts_rbs_sector_carrier_201807.pmaveragerssi_064] ELSE ARRAY[...] END, 
CASE WHEN (x IS NULL) THEN 
'{-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-60.0}'::numeric[]
 ELSE 
'{-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-x,-39.5}'::numeric[]
 END, ARRAY[x], x, x
 Merge Cond: (eric_umts_rbs_sector_carrier_201807.start_time = 
t2.start_time)
 ->  Sort  (cost=480421.29..480428.89 rows=3039 width=1652)
   Output: [...]
   Sort Key: eric_umts_rbs_sector_carrier_201807.start_time
   ->  Parallel Append  (cost=0.00..480245.49 rows=3039 width=1652)
 ->  Parallel Seq Scan on 
child.eric_umts_rbs_sector_carrier_201807  (cost=0.00..480230.30 rows=3039 
width=1652)
   Output: [...]
   Filter: 
((eric_umts_rbs_sector_carrier_201807.start_time >= '2018-07-01 
00:00:00-07'::timestamp with time zone) AND 
(eric_umts_rbs_sector_carrier_201807.start_time < '2018-07-09 
00:00:00-07'::timesta
mp with time zone))
 ->  Sort  (cost=540.31..540.65 rows=139 width=7760)
   Output: [...]
   Sort Key: t2.start_time
   ->  Append  (cost=0.00..535.36 rows=139 width=7760)
 ->  Seq Scan on public.daily_eric_umts_rnc_utrancell_view 
t2  (cost=0.00..0.00 rows=1 width=7760)
   Output: [...]
   Filter: ((t2.start_time >= '2018-07-01 
00:00:00'::timestamp without time zone) AND (t2.start_time < '2018-07-09 
00:00:00'::timestamp without time zone))
 ->  Bitmap Heap Scan on 
child.daily_eric_umts_rnc_utrancell_view_201807 t2_1  (cost=9.83..534.66 
rows=138 width=7760)
   Output: [...]
   Recheck Cond: ((t2_1.start_time >= '2018-07-01 
00:00:00'::timestamp without time zone) AND (t2_1.start_time < '2018-07-09 
00:00:00'::timestamp without time zone)) 
   ->  Bitmap Index Scan on 
daily_eric_umts_rnc_utrancell_view_201807_unique_idx  (cost=0.00..9.79 rows=138 
width=0)
 Index Cond: ((t2_1.start_time >= '2018-07-01 
00:00:00'::timestamp without time zone) AND (t2_1.start_time < '2018-07-09 

Re: pg11.1 jit segv

2018-11-16 Thread Justin Pryzby
On Thu, Nov 15, 2018 at 04:17:51PM -0800, Andres Freund wrote:
> I'm about to commit some changes to 12/master that'd possibly make it
> easier to find issues like this.

Are you referring to this  a future commit ?
commit 763f2edd92095b1ca2f4476da073a28505c13820
Rejigger materializing and fetching a HeapTuple from a slot.

I was able to reproduce under HEAD with pg_restored data.

I guess you're right that the "memory alloc failure" is related/same thing,
I've seen it intermittently with queries which also sometimes crash (and also
sometimes don't).

Note that when it crashes, it seems to take a longer time to do so than the
query would normally take.  Like we're walking off the end of an array, say.

I've been able to reproduce the crash with a self join of a table (no view, no
expressions, no parallel, directly querying a relkind='r' child).  In that
case, enable_bitmapscan=on and jit_tuple_deforming=on are both needed to crash,
and jit_debugging_support=on does not yield a useful bt.

The table is not too special, but was probably ALTERed to add columns a good
number of times by one of our processes.  It has ~1100 columns, including
arrays, and some with null_frac=1.  I'm trying to come up with a test case
involving column types and order.
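
For anyone trying to reproduce: one way to generate a comparably wide table is
something like the sketch below (the column count and type mix are my guesses,
not the real schema):

DO $$
DECLARE
	cols text := 'start_time timestamp NOT NULL, sect_id int NOT NULL';
BEGIN
	FOR i IN 1..1100 LOOP
		-- alternate column types, roughly like the real table
		cols := cols || format(', c%s %s', i,
			CASE i % 3 WHEN 0 THEN 'numeric' WHEN 1 THEN 'integer[]' ELSE 'bigint' END);
	END LOOP;
	EXECUTE format('CREATE TABLE wide_jit_test (%s)', cols);
END $$;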

(gdb) bt
#0  0x7f81a08b8b98 in ?? ()
#1  0x in ?? ()

ts=# SET jit=on;SET jit_above_cost=0;explain(analyze off,verbose off) SELECT 
a.* FROM child.daily_eric_umts_rnc_utrancell_view_201804 a JOIN 
child.daily_eric_umts_rnc_utrancell_view_201804 b USING(start_time,sect_id) 
WHERE a.start_time BETWEEN '2018-04-30' AND '2018-05-04' AND b.start_time 
BETWEEN '2018-04-30' AND '2018-05-04';
SET
SET

 QUERY PLAN 
 
-
 Hash Join  (cost=527.36..1038.17 rows=1 width=7760)
   Hash Cond: ((a.start_time = b.start_time) AND (a.sect_id = b.sect_id))
   ->  Bitmap Heap Scan on daily_eric_umts_rnc_utrancell_view_201804 a  
(cost=9.78..515.59 rows=133 width=7760)
 Recheck Cond: ((start_time >= '2018-04-30 00:00:00'::timestamp without 
time zone) AND (start_time <= '2018-05-04 00:00:00'::timestamp without time 
zone))
 ->  Bitmap Index Scan on 
daily_eric_umts_rnc_utrancell_view_201804_unique_idx  (cost=0.00..9.74 rows=133 
width=0)
   Index Cond: ((start_time >= '2018-04-30 00:00:00'::timestamp 
without time zone) AND (start_time <= '2018-05-04 00:00:00'::timestamp without 
time zone))
   ->  Hash  (cost=515.59..515.59 rows=133 width=12)
 ->  Bitmap Heap Scan on daily_eric_umts_rnc_utrancell_view_201804 b  
(cost=9.78..515.59 rows=133 width=12)
   Recheck Cond: ((start_time >= '2018-04-30 00:00:00'::timestamp 
without time zone) AND (start_time <= '2018-05-04 00:00:00'::timestamp without 
time zone))
   ->  Bitmap Index Scan on 
daily_eric_umts_rnc_utrancell_view_201804_unique_idx  (cost=0.00..9.74 rows=133 
width=0)
 Index Cond: ((start_time >= '2018-04-30 
00:00:00'::timestamp without time zone) AND (start_time <= '2018-05-04 
00:00:00'::timestamp without time zone))
 JIT:
   Functions: 19
   Options: Inlining false, Optimization false, Expressions true, Deforming true

BTW find attached patch which I believe corrects some comments.

Justin
diff --git a/src/backend/jit/llvm/llvmjit_deform.c b/src/backend/jit/llvm/llvmjit_deform.c
index 59e38d2..ab0c6d0 100644
--- a/src/backend/jit/llvm/llvmjit_deform.c
+++ b/src/backend/jit/llvm/llvmjit_deform.c
@@ -93,7 +93,7 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc, int natts)
 	funcname = llvm_expand_funcname(context, "deform");
 
 	/*
-	 * Check which columns do have to exist, so we don't have to check the
+	 * Check which columns have to exist, so we don't have to check the
 	 * rows natts unnecessarily.
 	 */
 	for (attnum = 0; attnum < desc->natts; attnum++)
@@ -252,7 +252,7 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc, int natts)
 	}
 
 	/*
-	 * Check if's guaranteed the all the desired attributes are available in
+	 * Check if it's guaranteed that all the desired attributes are available in
 	 * tuple. If so, we can start deforming. If not, need to make sure to
 	 * fetch the missing columns.
 	 */
@@ -337,7 +337,7 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc, int natts)
 
 		/*
 		 * If this is the first attribute, slot->tts_nvalid was 0. Therefore
-		 * reset offset to 0 to, it be from a previous execution.
+		 * also reset offset to 0, it may be from a previous execution.
 		 */
 		if (attnum == 0)
 		{
@@ -554,7 +554,7 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc, int natts)
 		else if (att->attnotnull && attguaranteedalign && 

Re: pg_ls_tmpdir(); AND, Function for listing archive_status directory

2018-10-01 Thread Justin Pryzby
On Wed, Sep 26, 2018 at 10:36:03PM +0200, Laurenz Albe wrote:
> Bossart, Nathan wrote:
> > Attached is a patch to add a pg_ls_tmpdir() function that lists the
> > contents of a specified tablespace's pgsql_tmp directory.  This is
> > very similar to the existing pg_ls_logdir() and pg_ls_waldir()
> > functions.

On Sun, Sep 30, 2018 at 10:59:20PM +0200, Christoph Moench-Tegeder wrote:
> while setting up monitoring for a new PostgreSQL instance, I noticed that
> there's no build-in way for a pg_monitor role to check the contents of
> the archive_status directory. We got pg_ls_waldir() in 10, but that
> only lists pg_wal - not it's (sic) subdirectory. 

>It may make sense to have a generic function like
>   pg_ls_dir(dirname text, tablespace oid)
>instead.  But maybe that's taking it too far...

I see Christoph has another pg_ls_* use case, which suggests that Laurenz's idea
is good, and that a generic pg_ls function is desirable.
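
As a workaround in the meantime, a superuser (but not pg_monitor, which is the
point here) can already use the generic pg_ls_dir(), e.g.:

SELECT * FROM pg_ls_dir('pg_wal/archive_status') AS f(filename)
WHERE filename LIKE '%.ready';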

Justin



Re: buildfarm and git pull

2018-10-01 Thread Justin Pryzby
On Thu, Sep 27, 2018 at 06:32:59PM +0300, Alexander Kuzmenkov wrote:
> It just has to checkout the remote branch as-is.

It doesn't clean files, but I would suggest:
git checkout -B branch remote/branch

Justin



fine tune v11 release notes

2018-10-06 Thread Justin Pryzby
Find below various fixes to release notes for v11, for discussion purposes.

Note: I changed these:
 "B-tree indexes can now be built in parallel with" 
 "Allow btree indexes to be built in parallel"
..since they could be construed to mean building multiple indexes
simultaneously (perhaps like what's done by pg_restore --jobs or pg_repack).

Conversely, in create_index.sgml, change the pre-existing phrase "in parallel" to
"simultaneously", since "parallel" is now used to refer to parallel build of a
(single) index.

I think this needs to be rewritten, maybe in a separate commit, but I'm having
trouble writing something less impenetrable:
doc/src/sgml/ref/alter_index.sgml
 ATTACH PARTITION
 
  
   Causes the named index to become attached to the altered index.
   The named index must be on a partition of the table containing the
   index being altered, and have an equivalent definition.  An attached


Finally: should there be an entry for this?
fb466d7b Fully enforce uniqueness of constraint names.  (Tom Lane)
"Previously, multiple constraints could be created with same name.  A 
new
unique index on pg_constraint now avoids that situation."

That was never intended to be allowed, but could still be a pg_upgrade hazard:
disruptive downtime would be scheduled and executed, only for pg_upgrade to fail
late with a unique violation.  Maybe it'd be worth adding a check early in
pg_upgrade to detect that and exit(1) before stopping the old cluster.
I'm drawing a comparison with inconsistent nullability across an inheritance
hierarchy, as discussed at [0].  That affected pg_upgrades at several of our
customers (in our case, not due to adding primary keys).  The last 25% of that
thread is what's relevant.  It looks like that check was never implemented,
though.

[0] 
https://www.postgresql.org/message-id/flat/CACQjQLoMsE%2B1pyLe98pi0KvPG2jQQ94LWJ%2BPTiLgVRK4B%3Di_jg%40mail.gmail.com#015ff524738be6cb9a2bd6875c87094a
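
Such a check could be built around a query like this one (a sketch of what
pg_upgrade might run against the old cluster, per database):

SELECT conrelid::regclass, conname, count(*)
FROM pg_constraint
GROUP BY conrelid, conname
HAVING count(*) > 1;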


diff --git a/doc/src/sgml/ref/create_index.sgml 
b/doc/src/sgml/ref/create_index.sgml
index 3c1223b..7bd1da3 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -612,10 +612,10 @@ Indexes:
 

 Regular index builds permit other regular index builds on the
-same table to occur in parallel, but only one concurrent index build
-can occur on a table at a time.  In both cases, no other types of schema
-modification on the table are allowed meanwhile.  Another difference
-is that a regular CREATE INDEX command can be performed 
within
+same table to occur simultaneously, but only one concurrent index build
+can occur on a table at a time.  In either case, schema modification of the
+table is not allowed while the index is being built.  Another difference is
+that a regular CREATE INDEX command can be performed 
within
 a transaction block, but CREATE INDEX CONCURRENTLY 
cannot.

   
@@ -650,7 +650,7 @@ Indexes:
ordering. For example, we might want to sort a complex-number data
type either by absolute value or by real part. We could do this by
defining two operator classes for the data type and then selecting
-   the proper class when making an index.  More information about
+   the proper class when creating an index.  More information about
operator classes is in  and in .
   
@@ -673,7 +673,8 @@ Indexes:
valid, once all partitions acquire the index.)  Note, however, that
any partition that is created in the future using
CREATE TABLE ... PARTITION OF will automatically
-   contain the index regardless of whether this option was specified.
+   contain the index regardless of whether ONLY was
+   specified.
   
 
   
diff --git a/doc/src/sgml/release-11.sgml b/doc/src/sgml/release-11.sgml
index ca42f28..a945502 100644
--- a/doc/src/sgml/release-11.sgml
+++ b/doc/src/sgml/release-11.sgml
@@ -32,25 +32,25 @@

 
  UPDATE statements that change a partition key
- now move affected rows to the appropriate partitions
+ now cause affected rows to be moved to the appropriate partitions
 


 
  Improved SELECT performance from enhanced partition
- elimination strategies during query processing and execution
+ elimination strategies during query planning and execution
 


 
- Support for PRIMARY KEY, FOREIGN
- KEY, indexes, and triggers on partitioned tables
+ Added support on partitioned tables for PRIMARY
+ KEY, FOREIGN KEY, indexes, and triggers
 


 
- Having a "default" partition for storing data that does not match any
- of the remaining partitions
+Allow creation of a "default" partition for storing rows which are
+ excluded by the bounds of all other partitions.
 

   
@@ -63,7 +63,7 @@
   

 
-  

Re: Segfault when creating partition with a primary key and sql_drop trigger exists

2018-09-24 Thread Justin Pryzby
On Thu, Sep 20, 2018 at 12:00:18PM +0200, Marco Slot wrote:
> We're seeing a segmentation fault when creating a partition of a
> partitioned table with a primary key when there is a sql_drop trigger on
> Postgres 11beta4.

Thanks for reporting; I reproduced it easily, so I added it to the open items
list, since indexes on partitioned tables are a new feature in PG11.

Core was generated by `postgres: pryzbyj ts [local] CREATE TABLE '.
Program terminated with signal 11, Segmentation fault.
#0  0x0059d186 in EventTriggerAlterTableRelid (objectId=40108800) at 
event_trigger.c:1745
1745event_trigger.c: No such file or directory.
in event_trigger.c

(gdb) bt
#0  0x0059d186 in EventTriggerAlterTableRelid (objectId=40108800) at 
event_trigger.c:1745
#1  0x005dfbd3 in AlterTableInternal (relid=40108800, cmds=0x21c39a8, 
recurse=true) at tablecmds.c:3328
#2  0x005b5b7b in DefineIndex (relationId=40108800, stmt=0x21a7350, 
indexRelationId=0, parentIndexId=40084714, parentConstraintId=40084715,
is_alter_table=false, check_rights=false, check_not_in_use=false, 
skip_build=false, quiet=false) at indexcmds.c:669
#3  0x005dcfee in DefineRelation (stmt=0x2116690, relkind=114 'r', 
ownerId=17609, typaddress=0x0,
queryString=0x20f1e60 "CREATE TABLE collections_list_1\nPARTITION OF 
collections_list (key, ts, collection_id, value)\nFOR VALUES IN (1);") at 
tablecmds.c:946
[...]

Justin



Re: pg11.1 jit segv

2018-11-16 Thread Justin Pryzby
On Fri, Nov 16, 2018 at 08:38:26AM -0600, Justin Pryzby wrote:
> Are you referring to this  a future commit ?
this OR a future commit

> The table is not too special, but was probably ALTERed to add columns a good
> number of times by one of our processes.  It has ~1100 columns, including
> arrays, and some with null_frac=1.  I'm trying to come up with a test case
> involving column types and order.

I don't have a failing test case yet but here's what the columns look like:

ts=# SELECT attnum, null_frac, atttypid::regtype, attnotnull, attname, attalign 
, attstorage, attbyval, attlen, attislocal FROM pg_stats s JOIN pg_attribute a 
USING(attname) JOIN pg_class c ON s.tablename=c.relname WHERE c.oid=a.attrelid 
AND tablename='daily_eric_umts_rnc_utrancell_view_201804' AND (attnotnull OR 
null_frac>0.9 OR atttypid::regtype::text LIKE '%[]' OR NOT 
atttypid::regtype::text~'int$|integer|double|numeric' OR attlen=-1 OR NOT 
attbyval OR atthasmissing OR attisdropped OR attnum BETWEEN 80 AND 99) ORDER BY 
1;

 attnum | null_frac |          atttypid           | attnotnull |             attname              | attalign | attstorage | attbyval | attlen | attislocal
--------+-----------+-----------------------------+------------+----------------------------------+----------+------------+----------+--------+------------
      1 |         0 | timestamp without time zone | t          | start_time                       | d        | p          | t        |      8 | t
      2 |         0 | integer                     | t          | site_id                          | i        | p          | t        |      4 | t
      3 |         0 | integer                     | t          | sect_id                          | i        | p          | t        |      4 | t
      4 |         0 | integer                     | t          | rnc_id                           | i        | p          | t        |      4 | t
      5 |         0 | text                        | t          | utrancell                        | i        | x          | f        |     -1 | t
     30 |         1 | bigint                      | f          | dl_alt_chcode_alloc              | d        | p          | t        |      8 | t
     31 |         1 | integer                     | f          | dl_alt_chcode_alloc_min          | i        | p          | t        |      4 | t
     32 |         1 | integer                     | f          | dl_alt_chcode_alloc_max          | i        | p          | t        |      4 | t
     45 |         0 | integer[]                   | f          | dch_ul_rlc_user_tput_samples     | i        | x          | f        |     -1 | t
     46 |         0 | integer[]                   | f          | dch_ul_rlc_user_tput_samples_min | i        | x          | f        |     -1 | t
     47 |         0 | integer[]                   | f          | dch_ul_rlc_user_tput_samples_max | i        | x          | f        |     -1 | t
     51 |         0 | numeric                     | f          | ps_int_sum_latency_2             | i        | m          | f        |     -1 | t
     69 |         0 | numeric                     | f          | mbytes_ul_srb_only_eul           | i        | m          | f        |     -1 | t
[...]
     87 |         0 | numeric                     | f          | mbytes_dl_active_cs57            | i        | m          | f        |     -1 | t
     88 |         0 | numeric                     | f          | mbytes_dl_active_cs57_min        | i        | m          | f        |     -1 | t
     89 |         0 | numeric                     | f          | mbytes_dl_active_cs57_max        | i        | m          | f        |     -1 | t

If I query for cs57, it doesn't crash (in 500ms), but if I query for the next
column, cs57_min, it does (in 18000ms).

Here's a new error message instead of a crash this time:
ts=# SET jit=on;SET jit_above_cost=0;explain(analyze on,verbose off) SELECT 
b.mbytes_dl_active_cs57_min FROM 
child.daily_eric_umts_rnc_utrancell_view_201804 a JOIN 
child.daily_eric_umts_rnc_utrancell_view_201804 b USING(start_time,sect_id) 
WHERE a.start_time BETWEEN '2018-04-30' AND '2018-05-04' AND b.start_time 
BETWEEN '2018-04-30' AND '2018-05-04';
SET
SET
ERROR:  out of memory
DETAIL:  Failed on request of size 425170160 in memory context 
"HashBatchContext".

Here's verbose output you requested, sans expressions:

ts=# SET jit=on;SET jit_above_cost=0;explain(analyze off,verbose) SELECT 
b.mbytes_dl_active_cs57_min FROM 
child.daily_eric_umts_rnc_utrancell_view_201804 a JOIN 
child.daily_eric_umts_rnc_utrancell_view_201804 b USING(start_time,sect_id) 
WHERE a.start_time BETWEEN 

Re: pg11.1 jit segv

2018-11-16 Thread Justin Pryzby
On Fri, Nov 16, 2018 at 08:38:26AM -0600, Justin Pryzby wrote:
> BTW find attached patch which I believe corrects some comments.

Updated.  Some of the changes may be gratuitous, but changed while I was
already looking.

Also note that I had to remove -flto=thin to compile under RH7.

Justin
diff --git a/src/backend/jit/llvm/llvmjit_deform.c b/src/backend/jit/llvm/llvmjit_deform.c
index 938dfc7..0663719 100644
--- a/src/backend/jit/llvm/llvmjit_deform.c
+++ b/src/backend/jit/llvm/llvmjit_deform.c
@@ -98,7 +98,7 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc,
 	funcname = llvm_expand_funcname(context, "deform");
 
 	/*
-	 * Check which columns do have to exist, so we don't have to check the
+	 * Check which columns have to exist, so we don't have to check the
 	 * rows natts unnecessarily.
 	 */
 	for (attnum = 0; attnum < desc->natts; attnum++)
@@ -257,7 +257,7 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc,
 	}
 
 	/*
-	 * Check if's guaranteed the all the desired attributes are available in
+	 * Check if it's guaranteed that all the desired attributes are available in
 	 * tuple. If so, we can start deforming. If not, need to make sure to
 	 * fetch the missing columns.
 	 */
@@ -342,7 +342,7 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc,
 
 		/*
 		 * If this is the first attribute, slot->tts_nvalid was 0. Therefore
-		 * reset offset to 0 to, it be from a previous execution.
+		 * also reset offset to 0, it may be from a previous execution.
 		 */
 		if (attnum == 0)
 		{
@@ -372,7 +372,7 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc,
 
 		/*
 		 * Check for nulls if necessary. No need to take missing attributes
-		 * into account, because in case they're present the heaptuple's natts
+		 * into account, because if they're present, the heaptuple's natts
 		 * would have indicated that a slot_getmissingattrs() is needed.
 		 */
 		if (!att->attnotnull)
@@ -459,13 +459,13 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc,
 			(known_alignment < 0 || known_alignment != TYPEALIGN(alignto, known_alignment)))
 		{
 			/*
-			 * When accessing a varlena field we have to "peek" to see if we
+			 * When accessing a varlena field, we have to "peek" to see if we
 			 * are looking at a pad byte or the first byte of a 1-byte-header
 			 * datum.  A zero byte must be either a pad byte, or the first
-			 * byte of a correctly aligned 4-byte length word; in either case
+			 * byte of a correctly aligned 4-byte length word; in either case,
 			 * we can align safely.  A non-zero byte must be either a 1-byte
 			 * length word, or the first byte of a correctly aligned 4-byte
-			 * length word; in either case we need not align.
+			 * length word; in either case, we need not align.
 			 */
 			if (att->attlen == -1)
 			{
@@ -559,8 +559,8 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc,
 		else if (att->attnotnull && attguaranteedalign && known_alignment >= 0)
 		{
 			/*
-			 * If the offset to the column was previously known a NOT NULL &
-			 * fixed width column guarantees that alignment is just the
+			 * If the offset to the column was previously known, a NOT NULL &
+			 * fixed-width column guarantees that alignment is just the
 			 * previous alignment plus column width.
 			 */
 			Assert(att->attlen > 0);
@@ -601,8 +601,8 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc,
	   LLVMBuildGEP(b, v_tts_nulls, &l_attno, 1, ""));
 
 		/*
-		 * Store datum. For byval datums copy the value, extend to Datum's
-		 * width, and store. For byref types, store pointer to data.
+		 * Store datum. For byval datums: copy the value, extend to Datum's
+		 * width, and store. For byref types: store pointer to data.
 		 */
 		if (att->attbyval)
 		{


pg11.1: dsa_area could not attach to segment

2018-12-31 Thread Justin Pryzby
In our query logs I saw:

postgres=# SELECT log_time, session_id, session_line, left(message,99), 
left(query,99) FROM postgres_log WHERE error_severity='ERROR' AND message NOT 
LIKE 'cancel%';
-[ RECORD 1 
]+
log_time | 2018-12-31 15:39:11.917-05
session_id   | 5c2a7e6f.1fa4
session_line | 1
left | dsa_area could not attach to segment
left | SELECT colcld.child c, parent p, array_agg(colpar.attname::text 
ORDER BY colpar.attnum) cols, array
-[ RECORD 2 
]+
log_time | 2018-12-31 15:39:11.917-05
session_id   | 5c2a7e6f.1fa3
session_line | 4
left | dsa_area could not attach to segment
left | SELECT colcld.child c, parent p, array_agg(colpar.attname::text 
ORDER BY colpar.attnum) cols, array

The full query + plan is:

|ts=# explain SELECT colcld.child c, parent p, array_agg(colpar.attname::text 
ORDER BY colpar.attnum) cols, array_agg(format_type(colpar.atttypid, 
colpar.atttypmod) ORDER BY colpar.attnum) AS types
|FROM queued_alters qa
|JOIN pg_attribute colpar ON to_regclass(qa.parent)=colpar.attrelid AND 
colpar.attnum>0 AND NOT colpar.attisdropped
|JOIN (SELECT *, attrelid::regclass::text AS child FROM pg_attribute) colcld ON 
to_regclass(qa.child) =colcld.attrelid AND colcld.attnum>0 AND NOT 
colcld.attisdropped
|WHERE colcld.attname=colpar.attname AND colpar.atttypid!=colcld.atttypid
|GROUP BY 1,2
|ORDER BY
|parent LIKE 'unused%', -- Do them last
|regexp_replace(colcld.child, 
'.*_((([0-9]{4}_[0-9]{2})_[0-9]{2})|(([0-9]{6})([0-9]{2})?))$', '\3\5') DESC,
|regexp_replace(colcld.child, '.*_', '') DESC
|LIMIT 1;

|QUERY PLAN
|Limit  (cost=67128.06..67128.06 rows=1 width=307)
|  ->  Sort  (cost=67128.06..67137.84 rows=3912 width=307)
|Sort Key: (((qa_1.parent)::text ~~ 'unused%'::text)), 
(regexp_replace((((pg_attribute.attrelid)::regclass)::text), 
'.*_((([0-9]{4}_[0-9]{2})_[0-9]{2})|(([0-9]{6})([0-9]{2})?))$'::text, 
'\3\5'::text)) DESC, 
(regexp_replace((((pg_attribute.attrelid)::regclass)::text), '.*_'::text, 
''::text)) DESC
|->  GroupAggregate  (cost=66893.34..67108.50 rows=3912 width=307)
|  Group Key: (((pg_attribute.attrelid)::regclass)::text), 
qa_1.parent
|  ->  Sort  (cost=66893.34..66903.12 rows=3912 width=256)
|Sort Key: (((pg_attribute.attrelid)::regclass)::text), 
qa_1.parent
|->  Gather  (cost=40582.28..66659.91 rows=3912 width=256)
|  Workers Planned: 2
|  ->  Parallel Hash Join  (cost=39582.28..65268.71 
rows=1630 width=256)
|Hash Cond: 
(((to_regclass((qa_1.child)::text))::oid = pg_attribute.attrelid) AND 
(colpar.attname = pg_attribute.attname))
|Join Filter: (colpar.atttypid <> 
pg_attribute.atttypid)
|->  Nested Loop  (cost=0.43..25614.89 
rows=11873 width=366)
|  ->  Parallel Append  (cost=0.00..12.00 
rows=105 width=292)
|->  Parallel Seq Scan on 
queued_alters_child qa_1  (cost=0.00..11.47 rows=147 width=292)
|->  Parallel Seq Scan on 
queued_alters qa  (cost=0.00..0.00 rows=1 width=292)
|  ->  Index Scan using 
pg_attribute_relid_attnum_index on pg_attribute colpar  (cost=0.43..242.70 
rows=114 width=78)
|Index Cond: ((attrelid = 
(to_regclass((qa_1.parent)::text))::oid) AND (attnum > 0))
|Filter: (NOT attisdropped)
|->  Parallel Hash  (cost=33651.38..33651.38 
rows=395365 width=72)
|  ->  Parallel Seq Scan on pg_attribute  
(cost=0.00..33651.38 rows=395365 width=72)
|Filter: ((NOT attisdropped) AND 
(attnum > 0))
|

queued_alters is usually empty, and looks like it would've last been nonempty 
on 2018-12-10.

ts=# \d queued_alters
  Table "public.queued_alters"
 Column | Type  | Collation | Nullable | Default
+---+---+--+-
 child  | character varying(64) |   |  |
 parent | character varying(64) |   |  |
Indexes:
"queued_alters_child_parent_key" UNIQUE CONSTRAINT, btree (child, parent)
Number of child tables: 1 (Use \d+ to list them.)

I found this other log at that time:
 2018-12-31 15:39:11.918-05 | 30831 | 5bf38e71.786f |5 | background 
worker "parallel worker" (PID 8100) exited with exit code 1

Which is the postmaster, or its PID in any case..

[pryzbyj@telsasoft-db ~]$ ps -wwwf 30831
UID        PID  PPID  C STIME TTY      STAT   

Re: bitmaps and correlation

2019-01-01 Thread Justin Pryzby
Attached for real.
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 88780c0..1c25f36 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -548,11 +548,12 @@ cost_index(IndexPath *path, PlannerInfo *root, double loop_count,
 
 	/*
 	 * Save amcostestimate's results for possible use in bitmap scan planning.
-	 * We don't bother to save indexStartupCost or indexCorrelation, because a
-	 * bitmap scan doesn't care about either.
+	 * We don't bother to save indexStartupCost, because a
+	 * bitmap scan doesn't care.
 	 */
 	path->indextotalcost = indexTotalCost;
 	path->indexselectivity = indexSelectivity;
+	path->indexCorrelation = indexCorrelation;
 
 	/* all costs for touching index itself included here */
 	startup_cost += indexStartupCost;
@@ -671,6 +672,7 @@ cost_index(IndexPath *path, PlannerInfo *root, double loop_count,
 		else
 			min_IO_cost = 0;
 	}
+elog(DEBUG4, "index_pages_fetched2 fetched:%d", (int)pages_fetched);
 
 	if (partial_path)
 	{
@@ -711,6 +713,9 @@ cost_index(IndexPath *path, PlannerInfo *root, double loop_count,
 	csquared = indexCorrelation * indexCorrelation;
 
 	run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost);
+	Cost heapCost=max_IO_cost + csquared * (min_IO_cost - max_IO_cost);
+	elog(DEBUG4, "index_cost1 startup:%f heap:%f run:%f idxtotal:%f total:%f",
+		indexStartupCost, heapCost, run_cost, indexTotalCost, path->path.total_cost);
 
 	/*
 	 * Estimate CPU costs per tuple.
@@ -744,6 +749,9 @@ cost_index(IndexPath *path, PlannerInfo *root, double loop_count,
 
 	path->path.startup_cost = startup_cost;
 	path->path.total_cost = startup_cost + run_cost;
+
+	elog(DEBUG4, "index_cost1 startup:%f heap:%f run:%f idxtotal:%f total:%f",
+		indexStartupCost, heapCost, run_cost, indexTotalCost, path->path.total_cost);
 }
 
 /*
@@ -876,6 +884,7 @@ index_pages_fetched(double tuples_fetched, BlockNumber pages,
 		}
 		pages_fetched = ceil(pages_fetched);
 	}
+elog(DEBUG4, "index_pages_fetched T=pages=%d total_pages:%d ECC:%d bECC:%d fetched:%d", (int)T, (int)total_pages, (int)effective_cache_size, (int)b, (int)pages_fetched);
 	return pages_fetched;
 }
 
@@ -888,6 +897,7 @@ index_pages_fetched(double tuples_fetched, BlockNumber pages,
  * not completely clear, and detecting duplicates is difficult, so ignore it
  * for now.
  */
+
 static double
 get_indexpath_pages(Path *bitmapqual)
 {
@@ -925,6 +935,66 @@ get_indexpath_pages(Path *bitmapqual)
 }
 
 /*
+ * get_indexpath_correlation
+ * Recursively determine correlation of a bitmap path by XXX:min/max of correlation of indices scanned.
+ */
+static double
+UNUSED_get_indexpath_correlation(PlannerInfo *root, Path *bitmapqual)
+{
+	double		x=0, result = 0;
+	int			loop = 0;
+	ListCell   *l;
+
+	if (IsA(bitmapqual, IndexPath))
+	{
+		IndexPath  *ipath = (IndexPath *) bitmapqual;
+		amcostestimate_function amcostestimate = ipath->indexinfo->amcostestimate;
+		double junk;
+		Cost junk2;
+		Selectivity junk3;
+		amcostestimate(root, ipath, 1/*loop_count*/,
+				&junk2 /*indexStartupCost*/, &junk2 /*indexTotalCost*/,
+				&junk3 /*indexSelectivity*/, &x,
+				&junk /*index_pages*/);
+		// result = 0.1*x*x;
+		// result = fabs(x);
+		result = (x*x + fabs(x))/2;
+		elog(DEBUG4, "get_indexpath_correlation %f", result);
+	/* The correlation values below here will be between [0,1] and not further squared */
+	} else if (IsA(bitmapqual, BitmapAndPath)) {
+		BitmapAndPath *apath = (BitmapAndPath *) bitmapqual;
+
+		foreach(l, apath->bitmapquals)
+		{
+			x = UNUSED_get_indexpath_correlation(root, (Path *) lfirst(l));
+			// XXX: should use the correlation of the most-selective path..
+			if (loop) result = Max(result, x);
+			else result = x;
+			loop=1;
+		}
+	}
+	else if (IsA(bitmapqual, BitmapOrPath))
+	{
+		BitmapOrPath *opath = (BitmapOrPath *) bitmapqual;
+
+		foreach(l, opath->bitmapquals)
+		{
+			/* Min is probably not right: should use the correlation of the
+			 * least-selective path */
+			x = UNUSED_get_indexpath_correlation(root, (Path *) lfirst(l));
+			if (loop) result = Min(result, x);
+			else result = x;
+			loop=1;
+		}
+	}
+
+	else
+		elog(ERROR, "unrecognized node type: %d", nodeTag(bitmapqual));
+
+	return result;
+}
+
+/*
  * cost_bitmap_heap_scan
  *	  Determines and returns the cost of scanning a relation using a bitmap
  *	  index-then-heap plan.
@@ -954,7 +1024,6 @@ cost_bitmap_heap_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
 	double		pages_fetched;
 	double		spc_seq_page_cost,
 spc_random_page_cost;
-	double		T;
 
 	/* Should only be applied to base relations */
 	Assert(IsA(baserel, RelOptInfo));
@@ -975,7 +1044,6 @@ cost_bitmap_heap_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
		 &tuples_fetched);
 
 	startup_cost += indexTotalCost;
-	T = (baserel->pages > 1) ? (double) baserel->pages : 1.0;
 
 	/* Fetch estimated page costs for tablespace containing table. */
 	get_tablespace_page_costs(baserel->reltablespace,
@@ -988,12 +1056,35 @@ cost_bitmap_heap_scan(Path 

bitmaps and correlation

2019-01-01 Thread Justin Pryzby
It's a new year and I'm getting reflective, so resuming a portion of
conversation we had here:
https://www.postgresql.org/message-id/CAMkU%3D1yVbwEAugaCmKWxjaX15ZduWee45%2B_DqCw--d_3N_O_%3DQ%40mail.gmail.com

Find attached patch which implements use of correlation statistic in costing
for bitmap scans.

An open question in my mind is how to combine the correlation statistic with
the existing cost_per_page:

 . sqrt(a^2+b^2)
 . MIN()
 . multiply existing computation by new correlation component
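
Spelled out (the notation a/b is mine, for discussion): let c be the index
correlation, a the existing interpolated cost_per_page (based on
pages_fetched/T), and
b = random_page_cost + c^2 * (seq_page_cost - random_page_cost), i.e. the same
interpolation cost_index uses.  Then the candidates are roughly:

	cost_per_page = sqrt(a^2 + b^2)
	cost_per_page = Min(a, b)
	cost_per_page = a * (b / random_page_cost)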

On Wed, Dec 20, 2017 at 09:55:40PM -0800, Jeff Janes wrote:
> On Tue, Dec 19, 2017 at 7:25 PM, Justin Pryzby  wrote:
> 
> > I started playing with this weeks ago (probably during Vitaliy's problem
> > report).  Is there any reason cost_bitmap_heap_scan shouldn't interpolate
> > based on correlation from seq_page_cost to rand_page_cost, same as
> > cost_index ?
> 
> I think that doing something like that is a good idea in general, but someone
> has to implement the code, and so far no one seems enthused to do so.  You
> seem pretty interested in the topic, so

I tested patch using CDR data which was causing issues for us years ago:
https://www.postgresql.org/message-id/20160524173914.GA11880%40telsasoft.com

Note: since that original problem report:
 . the SAN is backed by SSD rather than rotational storage;
 . we're using relkind=p partitioned tables;
 . PG12 uses pread() rather than lseek()+read(), so the overhead of lseek()+read()
  is avoided (but probably wasn't a substantial component of the problem);

Unpatched, I'm unable to get bitmap scan without disabling indexscan and 
seqscan.
| Bitmap Heap Scan on cdrs_huawei_pgwrecord_2018_12_25  
(cost=55764.07..1974230.46 rows=2295379 width=1375)
|   Recheck Cond: ((recordopeningtime >= '2018-12-25 05:00:00-06'::timestamp 
with time zone) AND (recordopeningtime <= '2018-12-25 10:00:00-06'::timestamp 
with time zone))
|   ->  Bitmap Index Scan on 
cdrs_huawei_pgwrecord_2018_12_25_recordopeningtime_idx  (cost=0.00..55190.22 
rows=2295379 width=0)
| Index Cond: ((recordopeningtime >= '2018-12-25 
05:00:00-06'::timestamp with time zone) AND (recordopeningtime <= '2018-12-25 
10:00:00-06'::timestamp with time zone))

Patched, I get bitmap scan when random_page_cost is reduced enough that startup
cost (index scan component) is not prohibitive.  But for simplicity, this just
forces bitmap by setting enable_indexscan=off;
| Bitmap Heap Scan on cdrs_huawei_pgwrecord_2018_12_25  
(cost=55764.07..527057.45 rows=2295379 width=1375)
|   Recheck Cond: ((recordopeningtime >= '2018-12-25 05:00:00-06'::timestamp 
with time zone) AND (recordopeningtime <= '2018-12-25 10:00:00-06'::timestamp 
with time zone))
|   ->  Bitmap Index Scan on 
cdrs_huawei_pgwrecord_2018_12_25_recordopeningtime_idx  (cost=0.00..55190.22 
rows=2295379 width=0)
| Index Cond: ((recordopeningtime >= '2018-12-25 
05:00:00-06'::timestamp with time zone) AND (recordopeningtime <= '2018-12-25 
10:00:00-06'::timestamp with time zone))

That addresses the issue we had in part.  A remaining problem is that
correlation fails to distinguish between a "fresh" index and a fragmented index,
and so heap access via a correlated index may look cheaper than it is.  Which
is why I still have to set random_page_cost to get a bitmap scan.

Justin



ALTER INDEX fails on partitioned index

2019-01-05 Thread Justin Pryzby
12dev and 11.1:

postgres=# CREATE TABLE t(i int)PARTITION BY RANGE(i);
postgres=# CREATE INDEX ON t(i) WITH(fillfactor=11);
postgres=# ALTER INDEX t_i_idx SET (fillfactor=12);
ERROR:  42809: "t_i_idx" is not a table, view, materialized view, or index
LOCATION:  ATWrongRelkindError, tablecmds.c:5031

I can't see that this is deliberate, but I found an earlier problem report;
however, discussion regarding the ALTER behavior seems to have been eclipsed
by a 2nd, separate issue with pageinspect.

https://www.postgresql.org/message-id/flat/CAKcux6mb6AZjMVyohnta6M%2BfdkUB720Gq8Wb6KPZ24FPDs7qzg%40mail.gmail.com

Justin



Re: unique, partitioned index fails to distinguish index key from INCLUDEd columns

2019-01-14 Thread Justin Pryzby
On Mon, Jan 14, 2019 at 07:31:07PM -0300, Alvaro Herrera wrote:
> On 2019-Jan-09, Justin Pryzby wrote:
> 
> > -- Fails to error
> > postgres=# CREATE UNIQUE INDEX ON t(j) INCLUDE(i);
> > 
> > -- Fail to enforce uniqueness across partitions due to failure to enforce 
> > inclusion of partition key in index KEY
> > postgres=# INSERT INTO t VALUES(1,1);
> > postgres=# INSERT INTO t VALUES(2,1); 
> 
> Doh.  Fix pushed.  Commit 8224de4f42cc should have changed one
> appearance of ii_NumIndexAttrs to ii_NumIndexKeyAttrs, but because of
> the nature of concurrent development, nobody noticed.

I figured as much - I thought to test this while trying to fall asleep,
without knowing they were developed in parallel.

Should backpatch to v11 ?
0ad41cf537ea5f076273fcffa4c83a184bd9910f
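
For reference, after the fix the INCLUDE form is rejected, and the partition
key has to be part of the index key proper, e.g. (a sketch, assuming t(i, j)
is partitioned by (i) as in the quoted test):

CREATE UNIQUE INDEX ON t (j) INCLUDE (i);  -- now fails, as it should
CREATE UNIQUE INDEX ON t (i, j);           -- OK: partition key is a key column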

Thanks,
Justin



psql exit status with multiple -c or -f

2018-12-17 Thread Justin Pryzby
Our deployment script failed to notice that dozens of commands had failed in a
transaction block, and I only noticed because we keep full logs and monitor
for error_severity>'LOG'.  I would have thought that the exit status would be
nonzero had an error occurred in an earlier script.

The docs since PG9.6 say:
https://www.postgresql.org/docs/9.6/app-psql.html
|Exit Status
|
|psql returns 0 to the shell if it finished normally, 1 if a fatal error of its
|own occurs (e.g. out of memory, file not found), 2 if the connection to the
|server went bad and the session was not interactive, and 3 if an error occurred
|in a script and the variable ON_ERROR_STOP was set.

d5563d7df94488bf0ab52ac0678e8a07e5b8297e
psql: Support multiple -c and -f options, and allow mixing them.

If there's an error, it returns 1 (although that's not "a fatal error of its
own").

|[pryzbyj@database ~]$ psql ts -c foo 2>/dev/null ; echo $?
|1

But the error is "lost" if another script or -c runs without failing:

|[pryzbyj@database ~]$ psql ts -txqc foo -c SELECT 2>/dev/null ; echo $?
|0

Note, this works as expected:

|[pryzbyj@database ~]$ psql ts -v ON_ERROR_STOP=1 -txqc foo -f /dev/null 
2>/dev/null ; echo $?
|1

Justin



Re: psql exit status with multiple -c or -f

2018-12-18 Thread Justin Pryzby
On Tue, Dec 18, 2018 at 05:13:40PM +0900, Kyotaro HORIGUCHI wrote:
> $  psql postgres -v ON_ERROR_STOP=0 -f ~/work/y.txt ; echo $?
> $  psql postgres -v ON_ERROR_STOP=0 < ~/work/y.txt ; echo $?

> c) psql postgres -v ON_ERROR_STOP=0 -c foo -c 'select 1'; echo $?
> d) psql postgres -v ON_ERROR_STOP=0 -c 'select 1' -c foo; echo $?
> 
> (c) returns 0 and (d) returns 1, but both return 1 when
> ON_ERROR_STOP=1.

> The attached second patch lets (c) and (d) behave the same as (a)
> and (b).

I don't think the behavior in the single command case should be changed:

|[pryzbyj@database postgresql]$ ./src/bin/psql/psql ts -c FOO 2>/dev/null ; 
echo $? 
|1
|[pryzbyj@database postgresql]$ patch -p1 
/dev/null
|[pryzbyj@database postgresql]$ ./src/bin/psql/psql ts -c FOO 2>/dev/null ; 
echo $? 
|0

Also, unpatched v11 psql returns the status of the last command
|[pryzbyj@database postgresql]$ psql ts -xtc 'SELECT 1' -c FOO 2>/dev/null ; 
echo $? 
|?column? | 1
|
|1

patched psql returns 0:
|[pryzbyj@database postgresql]$ ./src/bin/psql/psql ts -xtc 'SELECT 1' -c FOO 
2>/dev/null ; echo $? 
|?column? | 1
|
|0

I'd prefer to see the exit status of the -f scripts (your cases a and b)
changed to 3 if the last command failed.  I realized that's not what the
documentation says so possibly not backpatchable.
|3 if an error occurred in a script and the variable ON_ERROR_STOP was set.

Think of:

$ cat /tmp/sql 
begin;
foo;
select 1;

$ psql ts -f /tmp/sql ; echo ret=$?
BEGIN
psql:/tmp/sql:2: ERROR:  syntax error at or near "foo"
LINE 1: foo;
^
psql:/tmp/sql:3: ERROR:  current transaction is aborted, commands ignored until 
end of transaction block
ret=0

I think ON_ERROR_STOP should control whether the script stops, but the exit
status should reflect any error in the last command.  The shell
does that even without set -e.

Justin



jit comments typos (Re: pg11.1 jit segv)

2018-11-27 Thread Justin Pryzby
On Tue, Nov 27, 2018 at 10:24:52AM -0800, Andres Freund wrote:
> And pushed. Justin, thanks again for reporting the bug and then
> narrowing it down to a reproducible test case! Would've been much harder
> to diagnose without that.
> 
> I'll look into your comments patch in a bit.

Thanks for implementing and patching it :)

And thanks for remembering the patch, and reminding me.

Here's an updated copy with additional hunks.

Justin
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index ec4a250..83e4e05 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -1873,7 +1873,7 @@ CheckOpSlotCompatibility(ExprEvalStep *op, TupleTableSlot *slot)
 
 	/*
 	 * Should probably fixed at some point, but for now it's easier to allow
-	 * buffer and heap tuples to be used interchangably.
+	 * buffer and heap tuples to be used interchangeably.
 	 */
	if (slot->tts_ops == &TTSOpsBufferHeapTuple &&
		op->d.fetch.kind == &TTSOpsHeapTuple)
diff --git a/src/backend/jit/llvm/llvmjit_deform.c b/src/backend/jit/llvm/llvmjit_deform.c
index 4111bf0..ba238f1 100644
--- a/src/backend/jit/llvm/llvmjit_deform.c
+++ b/src/backend/jit/llvm/llvmjit_deform.c
@@ -103,7 +103,7 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc,
 	funcname = llvm_expand_funcname(context, "deform");
 
 	/*
-	 * Check which columns do have to exist, so we don't have to check the
+	 * Check which columns have to exist, so we don't have to check the
 	 * rows natts unnecessarily.
 	 */
 	for (attnum = 0; attnum < desc->natts; attnum++)
@@ -292,7 +292,7 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc,
 	}
 
 	/*
-	 * Check if's guaranteed the all the desired attributes are available in
+	 * Check if it's guaranteed that all the desired attributes are available in
 	 * tuple. If so, we can start deforming. If not, need to make sure to
 	 * fetch the missing columns.
 	 */
@@ -377,7 +377,7 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc,
 
 		/*
 		 * If this is the first attribute, slot->tts_nvalid was 0. Therefore
-		 * reset offset to 0 to, it be from a previous execution.
+		 * also reset offset to 0, it may be from a previous execution.
 		 */
 		if (attnum == 0)
 		{
@@ -407,7 +407,7 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc,
 
 		/*
 		 * Check for nulls if necessary. No need to take missing attributes
-		 * into account, because in case they're present the heaptuple's natts
+		 * into account, because if they're present, the heaptuple's natts
 		 * would have indicated that a slot_getmissingattrs() is needed.
 		 */
 		if (!att->attnotnull)
@@ -494,13 +494,13 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc,
 			(known_alignment < 0 || known_alignment != TYPEALIGN(alignto, known_alignment)))
 		{
 			/*
-			 * When accessing a varlena field we have to "peek" to see if we
+			 * When accessing a varlena field, we have to "peek" to see if we
 			 * are looking at a pad byte or the first byte of a 1-byte-header
 			 * datum.  A zero byte must be either a pad byte, or the first
-			 * byte of a correctly aligned 4-byte length word; in either case
+			 * byte of a correctly aligned 4-byte length word; in either case,
 			 * we can align safely.  A non-zero byte must be either a 1-byte
 			 * length word, or the first byte of a correctly aligned 4-byte
-			 * length word; in either case we need not align.
+			 * length word; in either case, we need not align.
 			 */
 			if (att->attlen == -1)
 			{
@@ -594,8 +594,8 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc,
 		else if (att->attnotnull && attguaranteedalign && known_alignment >= 0)
 		{
 			/*
-			 * If the offset to the column was previously known a NOT NULL &
-			 * fixed width column guarantees that alignment is just the
+			 * If the offset to the column was previously known, a NOT NULL &
+			 * fixed-width column guarantees that alignment is just the
 			 * previous alignment plus column width.
 			 */
 			Assert(att->attlen > 0);
@@ -636,8 +636,8 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc,
  	   LLVMBuildGEP(b, v_tts_nulls, &l_attno, 1, ""));
 
 		/*
-		 * Store datum. For byval datums copy the value, extend to Datum's
-		 * width, and store. For byref types, store pointer to data.
+		 * Store datum. For byval datums: copy the value, extend to Datum's
+		 * width, and store. For byref types: store pointer to data.
 		 */
 		if (att->attbyval)
 		{
diff --git a/src/backend/jit/llvm/llvmjit_inline.cpp b/src/backend/jit/llvm/llvmjit_inline.cpp
index b33a321..2ad29be 100644
--- a/src/backend/jit/llvm/llvmjit_inline.cpp
+++ b/src/backend/jit/llvm/llvmjit_inline.cpp
@@ -9,7 +9,7 @@
  * for an external function is found - not guaranteed! - the index will then
  * be used to judge their instruction count / inline worthiness. After doing
  * so for all external functions, all the referenced functions (and
- * prerequisites) will be imorted.
+ * prerequisites) will be imported.

Re: pg11b1 from outside a txn: "VACUUM cannot run inside a transaction block": should be: ...or multi-command string

2018-11-27 Thread Justin Pryzby
I'm resending a mail from June:
https://www.postgresql.org/message-id/flat/87sh5doya9.fsf%40news-spur.riddles.org.uk#83c3d1a183217204939252d56804f247

This is a trivial change to an ERROR string, but one that seems worth making.

On PG10:
|[pryzbyj@database ~]$ psql postgres -c 'DROP DATABASE x; CREATE DATABASE x';
|ERROR:  DROP DATABASE cannot be executed from a function or multi-command string

On PG11.1:
|[pryzbyj@telsasoft2015 server]$ psql postgres -c 'DROP DATABASE x; CREATE DATABASE x';
|ERROR:  DROP DATABASE cannot run inside a transaction block

I spent at least a few minutes trying things like psql -c
'begin;commit;create;drop' before figuring it out, due to the misleading error
message, despite having written the earlier message below.
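
For reference, the order of checks looks roughly like this (paraphrased from
memory and simplified; see PreventInTransactionBlock() in xact.c for the real
thing).  Since v11 apparently runs each statement of a multi-command string
inside an implicit transaction block, the first test now fires before the
multi-command branch is ever reached:

void
PreventInTransactionBlock(bool isTopLevel, const char *stmtType)
{
	/* this now triggers for implicit blocks too */
	if (IsTransactionBlock())
		ereport(ERROR,
				(errcode(ERRCODE_ACTIVE_SQL_TRANSACTION),
				 errmsg("%s cannot run inside a transaction block",
						stmtType)));

	if (IsSubTransaction())
		ereport(ERROR,
				(errcode(ERRCODE_ACTIVE_SQL_TRANSACTION),
				 errmsg("%s cannot run inside a subtransaction",
						stmtType)));

	/* the branch that used to produce the helpful message */
	if (!isTopLevel)
		ereport(ERROR,
				(errcode(ERRCODE_ACTIVE_SQL_TRANSACTION),
				 errmsg("%s cannot be executed from a function or multi-command string",
						stmtType)));
}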

On Sat, Jun 23, 2018 at 04:06:37PM -0500, Justin Pryzby wrote:
> in pg10:
> 
>   ts=# begin;VACUUM FULL pg_toast.pg_toast_2619;
>   BEGIN
>   ERROR:  25001: VACUUM cannot run inside a transaction block
>   LOCATION:  PreventTransactionChain, xact.c:3167
> => sounds fine
> 
>   $ psql postgres -c 'SELECT 1; VACUUM pg_statistic'
>   ERROR:  VACUUM cannot be executed from a function or multi-command string
> => sounds fine
> 
> In pg11b1:
> 
>   pryzbyj=# begin;VACUUM FULL pg_toast.pg_toast_2619;
>   BEGIN
>   ERROR:  25001: VACUUM cannot run inside a transaction block
>   LOCATION:  PreventInTransactionBlock, xact.c:3163
> => sounds fine
> 
>   [pryzbyj@dev ~]$ psql -c 'SELECT 1; VACUUM pg_statistic'
>   ERROR:  VACUUM cannot run inside a transaction block
> => Error message seems off??
> 
> I couldn't figure out how to \set VERBOSITY verbose inside a psql command (??),
> but strace shows for v10:
> SERROR\0VERROR\0C25001\0MVACUUM cannot be executed from a function or multi-command string\0Fxact.c\0L3187\0RPreventTransactionChain
> 
> And for v11:
> SERROR\0VERROR\0C25001\0MVACUUM cannot run inside a transaction block\0Fxact.c\0L3163\0RPreventInTransactionBlock
> 
> Function renamed by commit 04700b685f31508036456bea4d92533e5ceee9d6.
> 
> So behavior change maybe caused by 6eb52da3948dc8bc7c8a61cbacac14823b670c58 ?
> 
> Justin



Re: performance statistics monitoring without spamming logs

2018-11-21 Thread Justin Pryzby
Hi,

I'm replying to an old thread from -performance:
https://www.postgresql.org/message-id/flat/7ffb9dbe-c76f-8ca3-12ee-7914ede872e6%40stormcloud9.net

I was looking at:
https://commitfest.postgresql.org/20/1691/
"New GUC to sample log queries"

On Tue, Jul 10, 2018 at 01:54:12PM -0400, Patrick Hemmer wrote:
> I'm looking for a way of gathering performance stats in a more usable
> way than turning on `log_statement_stats` (or other related modules).
> The problem I have with the log_*_stats family of modules is that they
> log every single query, which makes them unusable in production. Aside
> from consuming space, there's also the problem that the log system
> wouldn't be able to keep up with the rate.
> 
> There are a couple ideas that pop into mind that would make these stats
> more usable:
> 1. Only log when the statement would otherwise already be logged. Such
> as due to the `log_statement` or `log_min_duration_statement` settings.

..but instead came back to this parallel thread and concluded that I'm
interested in exactly that behavior: emit log_statement_stats output only if
log_min_duration_statement is exceeded.

If there's agreement that this is desirable behavior, should I change
log_statement_stats to a three-valued GUC (on/off/logged)?

Or, would it be reasonable to instead change the existing behavior of
log_statement_stats=on to mean what I want: only log statement stats if the
statement is otherwise logged?  That seems defensible if log_statement_stats
is considered a developer option, most likely enabled either in a development
or other segregated, non-production environment, or possibly for only a single
session for diagnostics.
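
Concretely, I'm imagining gating of roughly this shape in exec_simple_query()
(hypothetical sketch, not the actual postgres.c code; "statement_logged"
stands for whatever flag check_log_duration() would set once it decides the
query gets logged):

	/*
	 * Emit statement stats only when the statement itself was logged,
	 * e.g. because it exceeded log_min_duration_statement.
	 */
	if (save_log_statement_stats && statement_logged)
		ShowUsage("QUERY STATISTICS");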

My own use case is that I'd like to know whether a long-running query was doing
lots of analytics (aggregation/sorting), or possibly spinning away in a nested
loop.  But I only care about the longest queries, and would then probably look
at the ratio of user CPU time to elapsed time (a ratio near 1 suggests
CPU-bound work like aggregation; a low ratio suggests waiting on I/O or locks).

Justin



Re: pg11.1 jit segv

2018-11-16 Thread Justin Pryzby
On Fri, Nov 16, 2018 at 08:29:27AM -0800, Andres Freund wrote:
> > On Thu, Nov 15, 2018 at 04:17:51PM -0800, Andres Freund wrote:
> > > I'm about to commit some changes to 12/master that'd possibly make it
> commit 15d8f83128e15de97de61430d0b9569f5ebecc26

I don't think it had to do with your commit, but I recompiled HEAD with debug +
casserts and have this to show.

TRAP: unrecognized TOAST vartag("1", File: "heaptuple.c", Line: 1705)

#0  0x7faf5fac9277 in raise () from /lib64/libc.so.6
#1  0x7faf5faca968 in abort () from /lib64/libc.so.6
#2  0x0088e347 in ExceptionalCondition 
(conditionName=conditionName@entry=0xa8a69e "1", 
errorType=errorType@entry=0x8deda8 "unrecognized TOAST vartag", 
fileName=fileName@entry=0x8dff82 "heaptuple.c", 
lineNumber=lineNumber@entry=1705) at assert.c:54
#3  0x00489830 in varsize_any (p=<optimized out>) at heaptuple.c:1705
#4  0x7faf60c98560 in ?? ()
#5  0x in ?? ()
#6  0x0008c21c in ?? ()
#7  0x004897d0 in ?? () at heaptuple.c:1690
#8  0x0008c21c in ?? ()
#9  0x0008c21c in ?? ()
#10 0x021c in ?? ()
#11 0x004897d0 in ?? () at heaptuple.c:1690
[...]
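
For what it's worth, the trap message is the fallback arm of the VARTAG_SIZE()
macro, which looks roughly like this (going from memory of postgres.h) and
fires when the tag byte of a supposedly external datum is garbage, i.e.
varsize_any() was handed a bad pointer:

#define VARTAG_SIZE(tag) \
	((tag) == VARTAG_INDIRECT ? sizeof(varatt_indirect) : \
	 VARTAG_IS_EXPANDED(tag) ? sizeof(varatt_expanded) : \
	 (tag) == VARTAG_ONDISK ? sizeof(varatt_external) : \
	 TrapMacro(true, "unrecognized TOAST vartag"))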



  1   2   3   4   5   6   7   8   9   10   >