How to auto-restart BGW only on crash or _PG_init

2020-03-24 Thread Jeremy Finzel
Good afternoon!

I would be grateful for some direction on how to use background workers to
have a worker automatically restart *only* in certain cases, i.e. on
postmaster start (_PG_init) or a soft crash.  I run into all sorts of
trouble if I set bgw_restart_time to actually restart on sigterm, because
in most cases I don't want it to restart (e.g. it was launched with invalid
config, or the SQL it runs has become invalid).  But I *do* want it to
auto-restart after any kind of crash.  If I set bgw_restart_time to never
restart, then it doesn't restart after a soft crash either, which is not
what I want.

This is for my extension pglogical_ticker, and specifically within this
main function where a sigterm might happen:
https://github.com/enova/pglogical_ticker/blob/ef9b68fd6b5b99787034520009577f8cfec0049c/pglogical_ticker.c#L85-L201

I have tried several things unsuccessfully (checking the result of SPI_execute
or SPI_connect), usually resulting in a constantly restarting and failing
worker.  So, is there a straightforward way to have the worker auto-restart
only in a very narrow range of cases?
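
For illustration, here is a minimal sketch of one approach I am considering
(hypothetical names, not the actual pglogical_ticker code): keep a non-zero
bgw_restart_time so crashes and unexpected ERRORs get restarted, but make
every *expected* exit path leave with exit code 0, which, as I read the docs,
causes the postmaster to unregister the worker instead of restarting it.

#include "postgres.h"
#include "miscadmin.h"
#include "postmaster/bgworker.h"
#include "storage/ipc.h"

static volatile sig_atomic_t got_sigterm = false;

static void
my_sigterm_handler(SIGNAL_ARGS)
{
    got_sigterm = true;
}

void my_worker_main(Datum main_arg);

void
my_worker_main(Datum main_arg)
{
    pqsignal(SIGTERM, my_sigterm_handler);
    BackgroundWorkerUnblockSignals();

    for (;;)
    {
        if (got_sigterm)
            proc_exit(0);   /* expected shutdown: exit 0 => unregistered, no restart */

        /*
         * ... do the work here (and WaitLatch between iterations).  If the
         * configuration or SQL turns out to be invalid, also proc_exit(0)
         * instead of raising an ERROR, so the worker is not restarted.  An
         * unexpected ERROR or crash exits non-zero, so the bgw_restart_time
         * given at registration still applies.
         */
    }
}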

Many thanks!
Jeremy


Re: PHJ file leak.

2020-03-06 Thread Jeremy Finzel
On Fri, Mar 6, 2020 at 9:43 AM Tom Lane  wrote:

> Jeremy Finzel  writes:
> > We are hitting this leak in production on an 11.6 system for a query that
> > is using a parallel hash join.  Was this fix pushed in 11.7? I can't tell
> > clearly from the release notes for 11.7 or this thread.
>
> It looks like you're asking about this commit:
>
> Author: Thomas Munro 
> Branch: master [76cbfcdf3] 2019-11-16 10:11:30 +1300
> Branch: REL_12_STABLE Release: REL_12_2 [24897e1a1] 2019-11-16 10:18:45 +1300
> Branch: REL_11_STABLE Release: REL_11_7 [bc049d0d4] 2019-11-16 10:19:16 +1300
>
> Always call ExecShutdownNode() if appropriate.
>
> which is documented thus in the 11.7 release notes:
>
>   Ensure parallel plans are always shut down at the correct time
>   (Kyotaro Horiguchi)
>
>   This oversight is known to result in temporary file
>   leak warnings from multi-batch parallel hash joins.
>
>
> regards, tom lane
>

Thank you!  Yep, pretty clear :).


Re: PHJ file leak.

2020-03-06 Thread Jeremy Finzel
On Fri, Nov 15, 2019 at 3:32 PM Thomas Munro  wrote:

> On Wed, Nov 13, 2019 at 9:52 PM Amit Kapila 
> wrote:
> > On Wed, Nov 13, 2019 at 6:13 AM Kyotaro Horiguchi <
> horikyota@gmail.com> wrote:
> > > The patch's shape looks better. Thanks.
> >
> > +1.  LGTM as well.
>
> Thanks.  Pushed.
>
>
>
We are hitting this leak in production on an 11.6 system for a query that
is using a parallel hash join.  Was this fix pushed in 11.7? I can't tell
clearly from the release notes for 11.7 or this thread.

Thanks!
Jeremy


Re: Index corruption / planner issue with one table in my pg 11.6 instance

2019-12-12 Thread Jeremy Finzel
On Tue, Dec 10, 2019 at 8:25 AM Jeremy Finzel  wrote:

> Is there a way for me to test this theory?  I tried the following with no
> change in behavior:
>
>1. Disable write load to table
>2. Vacuum analyze table (not vac full)
>3. Create index
>4. Explain
>
> Still did not pick up the index.
>

Just another follow-up: with no other intervention on our part, after many
hours the planner is now picking up the index.

I still don't know what is causing it, but is this behavior actually
desired?  It's pretty inconvenient to build an index for an immediate query
need and not be able to use it right away, which used to work :).

Thanks,
Jeremy


Re: Index corruption / planner issue with one table in my pg 11.6 instance

2019-12-10 Thread Jeremy Finzel
On Tue, Dec 10, 2019 at 12:09 AM Tom Lane  wrote:

> Yeah.  The reported behavior can mostly be explained if we assume
> that there's some HOT chain in the table that involves an update
> of this particular column, so that if we build an index on that
> column we see a broken HOT chain, but building an index on some
> other column doesn't have a problem.
>

The problem exists, so far as I can tell, on indexing *any column* of *this
particular table*.  I tried the same experiment on another table in the same
replication stream, and I cannot reproduce it.

I am building the index **non-concurrently** every time.


> The thing this doesn't easily explain is that the behavior persists
> across repeated index rebuilds.  A broken HOT chain is only broken
> as long as the older entry is still visible-to-somebody, so that
> such situations ought to be self-healing as time passes.  If it
> fails repeatedly, this theory requires assuming that either
>
> 1. You've got some extremely old open transactions (maybe forgotten
> prepared transactions?), or
>

No prepared_xacts, and no transactions older than a few hours.  Transactions
lasting several hours are common in this reporting system.  I have not yet
seen whether the index starts showing up in plans after several hours.


> 2. Your workload is constantly generating new broken HOT chains of
> the same sort, so that there's usually a live one when you try
> to build an index.
>
> The fact that you even notice the indcheckxmin restriction indicates
> that you do tend to have long-running transactions in the system,
> else the index would come free for use fairly quickly.  So #1 isn't
> as implausible as I might otherwise think.  But #2 seems probably
> more likely on the whole.  OTOH, neither point is exactly within
> the offered evidence.
>

Is there a way for me to test this theory?  I tried the following with no
change in behavior:

   1. Disable write load to table
   2. Vacuum analyze table (not vac full)
   3. Create index
   4. Explain

Still did not pick up the index.
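
For the record, one way I know of to check whether a freshly built index is
still being held back by the broken-HOT-chain restriction is to look at
pg_index directly (the index name below is a placeholder):

SELECT i.indexrelid::regclass AS index_name,
       i.indisvalid,
       i.indcheckxmin,
       i.xmin AS pg_index_row_xmin
FROM pg_index i
WHERE i.indexrelid = 'my_new_index'::regclass;

As I understand it, when indcheckxmin is true the planner ignores the index
until that pg_index row's xmin predates the oldest running transaction's xmin.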

Thanks,
Jeremy


Index corruption / planner issue with one table in my pg 11.6 instance

2019-12-09 Thread Jeremy Finzel
I have a table with about 7 million records.  I had a query for which I
needed 2 indexes added: one for a created-timestamp field and another for an
id field, both of very high cardinality.

First I noticed the query would not use the timestamp index no matter what
session config settings I used.  I finally created a temp table copy of the
table and verified the index is used there.  Then I rebuilt the main table
with VACUUM FULL, and that caused the index to be used.

I repeated this process for an id index on the same table.  I created the
index and it would never be chosen no matter what, until I rebuilt the table
using VAC FULL.

I have run bt_index_check and bt_index_parent_check with heapallindexed on
one of these indexes but nothing comes up.
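
For reference, the checks I ran were along these lines (a sketch with a
placeholder index name):

CREATE EXTENSION IF NOT EXISTS amcheck;

SELECT bt_index_check('some_index'::regclass);
SELECT bt_index_parent_check('some_index'::regclass, heapallindexed => true);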

One other noteworthy thing is that a cluster restart appears to fix the
issue: on a snapshot of this system (which has been restarted), also at
11.6, the planner picks up the index.

We quite recently (~2-3 weeks ago) did pg_upgrade from 9.6 to 11.  This table
in question is fed via pglogical.  I checked for similar behavior on another
table in this stream and could not reproduce it.  So for now, it seems
limited to this one table.

Any suggestions as to how I could verify what is going on here?  Has anyone
experienced the same?

Thanks!
Jeremy


Re: physical slot xmin dependency on logical slot?

2019-11-19 Thread Jeremy Finzel
>
> I expect that you created the replica in a manner that preserved the
> logical replication slot on it. You also had hot_standby_feedback enabled.
>

As per both your and Andres' replies, we wanted the backup to have the
logical slots on it, because we wanted to allow decoding from the slots on
our backup.  However, what we should have done was drop the backup's slot on
the master.


> PostgreSQL standbys send the global xmin and (in Pg10+) catalog_xmin to
> the upstream when hot_standby_feedback is enabled. If there's a slot
> holding the catalog_xmin on the replica down, that'll be passed on via
> hot_standby_feedback to the upstream. On Pg 9.6 or older, or if the replica
> isn't using a physical replication slot, the catalog_xmin is treated as a
> regular xmin since there's nowhere in PGPROC or PGXACT to track the
> separate catalog_xmin. If the standby uses a physical slot, then on pg10+
> the catalog_xmin sent by the replica is stored as the catalog_xmin on the
> physical slot instead.
>
> Either way, if you have hot_standby_feedback enabled on a standby, that
> feedback includes the requirements of any replication slots on the standby.
>

Thank you for the thorough explanation.  As I noted in my reply to Andres,
we routinely and intentionally create snapshots with replication slots
intact (but we normally drop the slot on the master immediately), so our
own use case is rare and it's not surprising that we don't find a thorough
explanation of this scenario in the docs.
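
In case it helps anyone who hits the same thing, the dependency was visible
directly in pg_replication_slots on the master; a generic check (not our
actual slot names) looks like this:

SELECT slot_name,
       slot_type,
       active,
       xmin,
       catalog_xmin,
       restart_lsn
FROM pg_replication_slots
ORDER BY slot_type, slot_name;

A physical slot used by a hot_standby_feedback replica can show an xmin /
catalog_xmin that mirrors the logical slot(s) still present on that replica,
which is exactly what we saw.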

Thanks,
Jeremy


physical slot xmin dependency on logical slot?

2019-11-18 Thread Jeremy Finzel
We had a scenario today that was new to us.  We had a logical replication
slot that was severely far behind.  Before dropping this logical slot, we
made a physical point-in-time-recovery snapshot of the system with this
logical slot.

This logical slot was causing severe catalog bloat.  We proceeded to drop
the logical slot, which was over 12000 WAL segments behind.  The physical
slot was only a few hundred segments behind and still in place.

But then running VACUUM FULL on the catalog tables did not recover any bloat
beyond the now-dropped logical slot's position.  Eventually, to our surprise,
we found that dropping the physical slot allowed us to recover the bloat.

We saw in forensics after the fact that the xmin of the physical slot equaled
the catalog_xmin of the logical slot.  Is there some dependency here whereby a
physical slot made from a system also retains all transactions needed by the
logical slots that system contains?  If so, could someone help us understand
this, and is there documentation around it?  Is this by design?

We had thought that the physical slot would only retain the WAL it needed
for its own restart_lsn, not also the segments needed only by logical slots.
Any explanation would be much appreciated!

Thanks,
Jeremy


Re: UPSERT on view does not find constraint by name

2019-10-18 Thread Jeremy Finzel
On Fri, Oct 18, 2019 at 3:42 AM Tom Lane  wrote:

> Jeremy Finzel  writes:
> > I'm not sure if this can be considered a bug or not, but it is perhaps
> > unexpected.  I found that when using a view that is simply select * from
> > table, then doing INSERT ... ON CONFLICT ON CONSTRAINT constraint_name on
> > that view, it does not find the constraint and errors out.  But it does
> > find the constraint if one lists the columns instead.
>
> I'm confused by this report.  The view wouldn't have any constraints,
> and experimenting shows that the parser won't let you name a
> constraint of the underlying table here.  So would you provide a
> concrete example of what you're talking about?
>
> regards, tom lane
>

Apologies for the lack of clarity.  Here is a simple example of what I mean:

test=# CREATE TEMP TABLE foo (id int primary key);
CREATE TABLE
test=# \d foo
                Table "pg_temp_4.foo"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 id     | integer |           | not null |
Indexes:
    "foo_pkey" PRIMARY KEY, btree (id)

test=# CREATE VIEW bar AS SELECT * FROM foo;
NOTICE:  view "bar" will be a temporary view
CREATE VIEW
test=# INSERT INTO foo (id)
test-# VALUES (1)
test-# ON CONFLICT ON CONSTRAINT foo_pkey
test-# DO NOTHING;
INSERT 0 1
test=# INSERT INTO foo (id)
VALUES (1)
ON CONFLICT ON CONSTRAINT foo_pkey
DO NOTHING;
INSERT 0 0
test=# INSERT INTO foo (id)
VALUES (1)
ON CONFLICT ON CONSTRAINT foo_pkey
DO NOTHING;
INSERT 0 0
test=# INSERT INTO bar (id)
VALUES (1)
ON CONFLICT ON CONSTRAINT foo_pkey
DO NOTHING;
ERROR:  constraint "foo_pkey" for table "bar" does not exist
test=# INSERT INTO bar (id)
VALUES (1)
ON CONFLICT (id)
DO NOTHING;
INSERT 0 0



Of interest are the last 2 statements above.  ON CONFLICT on the constraint
name does not work, but it does work by field name.  I'm not saying it
*should* work both ways, but I'm more wondering if this is
known/expected/desired behavior.

The point of interest for us is that we frequently preserve a table's
"public API" by swapping out a table for a view as above, for instance in
order to rebuild a table behind the scenes without breaking table usage.  The
above case is a rare example where that doesn't work, and in any case I
advise (as do the docs) not to use ON CONFLICT ON CONSTRAINT, but rather to
list the column names instead.
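
For context, the swap pattern itself is roughly this (a simplified sketch
with placeholder names):

BEGIN;
ALTER TABLE foo RENAME TO foo_tbl;
CREATE VIEW foo AS SELECT * FROM foo_tbl;
COMMIT;

-- Callers keep writing to "foo"; a simple view like this is auto-updatable,
-- but as the session above shows, ON CONFLICT ON CONSTRAINT does not resolve
-- the underlying table's constraint name through the view, whereas a column
-- list does:
INSERT INTO foo (id) VALUES (1) ON CONFLICT (id) DO NOTHING;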

Thanks,
Jeremy


UPSERT on view does not find constraint by name

2019-10-17 Thread Jeremy Finzel
I'm not sure if this can be considered a bug or not, but it is perhaps
unexpected.  I found that when using a view that is simply select * from
table, then doing INSERT ... ON CONFLICT ON CONSTRAINT constraint_name on
that view, it does not find the constraint and errors out.  But it does
find the constraint if one lists the columns instead.

I did not find any mention of this specifically in the docs, or any
discussion on this topic after a brief search, and I have already asked my
stakeholder to change to using the column list as better practice anyway.
But in any case, I wanted to know if this is a known issue or not.

Thanks!
Jeremy


Re: BRIN index which is much faster never chosen by planner

2019-10-15 Thread Jeremy Finzel
Thank you for the thorough and thoughtful reply!  Please see below.

On Mon, Oct 14, 2019 at 3:48 PM David Rowley 
wrote:

> Another thing which you might want to look at is the correlation
> column in the pg_stats view for the rec_insert_time column. Previous
> to 7e534adcd, BRIN index were costed based on the selectivity
> estimate. There was no accountability towards the fact that the pages
> for those records might have been spread out over the entire table.
> Post 7e534adcd, we use the correlation estimate to attempt to estimate
> how many pages (more specifically "ranges") we're likely to hit based
> on that and the selectivity estimate. This commit intended to fix the
> issue we had with BRIN indexes being selected far too often.  Of
> course, the correlation is based on the entire table, if there are
> subsets of the table that are perhaps perfectly correlated, then the
> planner is not going to know about that. It's possible that some of
> your older rec_insert_times are spread out far more than the newer
> ones.  As a test, you could try creating a new table and copying the
> records over to it in rec_insert_time order and seeing if the BRIN
> index is selected for that table (after having performed an ANALYZE).
>
> It would be interesting if you could show the pg_stats row for the
> column so that we can see if the correlation is low.
>

So what I said originally (and light bulbs are now going off) is that the
table is insert-only, but it is **pruned (deletes) to the past year of
data**.  I think this is the key to what I've missed.  Once vacuum runs, we
have a bunch of old physical space being re-used by new inserts, thus
botching the good correlation between physical and logical order.  So it
appears the physical order of the data is indeed well correlated in recent
history, but not when you go back a bit further.  Here are the pg_stats:

-[ RECORD 1 ]--+---
schemaname | foo
tablename  | log_table
attname| rec_insert_time
inherited  | f
null_frac  | 0
avg_width  | 8
n_distinct | 1.89204e+06
correlation| 0.193951
most_common_elems  |
most_common_elem_freqs |
elem_count_histogram   |

I have not tried creating a fresh table, but if I modify my OP query to
instead take a window of 10 days from 100 days ago, the BRIN index actually
performs really badly, essentially identically to the seq scan:

Here is with a seq scan:

   QUERY PLAN
-
 WindowAgg  (cost=26822167.70..26822170.60 rows=129 width=120) (actual
time=200730.856..200730.910 rows=62 loops=1)
   ->  Sort  (cost=26822167.70..26822168.02 rows=129 width=104) (actual
time=200730.834..200730.837 rows=62 loops=1)
 Sort Key: unique_cases.source, unique_cases.rec_insert_time
 Sort Method: quicksort  Memory: 33kB
 ->  Subquery Scan on unique_cases  (cost=26822160.60..26822163.18
rows=129 width=104) (actual time=200730.672..200730.763 rows=62 loops=1)
   ->  Unique  (cost=26822160.60..26822161.89 rows=129
width=124) (actual time=200730.670..200730.753 rows=62 loops=1)
 ->  Sort  (cost=26822160.60..26822160.92 rows=129
width=124) (actual time=200730.668..200730.686 rows=395 loops=1)
   Sort Key: l.brand_id, l.last_change, l.log_id,
l.rec_insert_time DESC
   Sort Method: quicksort  Memory: 80kB
   ->  Nested Loop  (cost=0.00..26822156.08
rows=129 width=124) (actual time=200692.321..200730.121 rows=395 loops=1)
 Join Filter: ((l.category)::text =
filter.category)
 Rows Removed by Join Filter: 552210
 ->  Seq Scan on small_join_table filter
 (cost=0.00..26.99 rows=1399 width=8) (actual time=0.013..0.179 rows=1399
loops=1)
 ->  Materialize  (cost=0.00..26818970.84
rows=129 width=99) (actual time=136.942..143.440 rows=395 loops=1399)
   ->  Seq Scan on log_table l
 (cost=0.00..26818970.20 rows=129 width=99) (actual
time=191581.193..200649.013 rows=395 loops=1)
 Filter: ((field1 IS NOT NULL)
AND (category = 'music'::name) AND (rec_insert_time >= (now() - '100
days'::interval)) AND (rec_insert_time <= (now() - '90 days'::interval)))
 Rows Removed by Filter:
315097963
 Planning Time: 1.541 ms
 Execution Time: 200731.273 ms
(19 rows)

Here is with the forced brin index scan:


QUERY PLAN

Re: BRIN index which is much faster never chosen by planner

2019-10-14 Thread Jeremy Finzel
>
> The other issue is that the estimation of pages fetched using bitmap
> heap scan is rather crude - but that's simply hard, and I don't think we
> can fundamentally improve this.
>

I wanted to follow up on this specific issue.  Isn't this the heart of the
matter and a fundamental problem?  If I want to rely on BRIN indexes in a
straightforward case like the one explained in the OP, but I can't tell
whether the planner will be nearly reliable enough, how can I depend on them
in production?  Is this not considered a planner bug, or should this kind of
case be documented as problematic for BRIN?  As another way to look at it:
is there a configuration parameter that could be added, specific to BRIN or
bitmap scans, to provide help for cases like this?

On freshly analyzed tables, I tried my original query again and found that
even with now() - 3 days it does not choose the BRIN index.  In fact it
chose another btree on the table like (id1, id2, rec_insert_time).  With
warm cache, the pg-chosen plan takes 40 seconds to execute, whereas when I
force a BRIN scan it takes only 4 seconds.

I could understand more if the execution times were close, but the actual
BRIN index is orders of magnitude faster than the plan Postgres is
choosing.  I appreciate the feedback on this very much, as I am quite eager
to use BRIN indexes!!!

Thanks,
Jeremy


Re: BRIN index which is much faster never chosen by planner

2019-10-11 Thread Jeremy Finzel
On Thu, Oct 10, 2019 at 6:13 PM Tomas Vondra 
wrote:

>
> So this seems like a combination of multiple issues. Firstly, the bitmap
> index scan on rec_insert_time_brin_1000 estimate seems somewhat poor. It
> might be worth increasing stats target on that column, or something like
> that. Not sure, but it's about the only "fixable" thing here, I think.
>

In the OP I had mentioned that I already increased it to 5000, and it made
no difference.  Ah fine let's go ahead and try 1... still no change:

foo_prod=# ALTER TABLE log_table ALTER COLUMN rec_insert_time SET
STATISTICS 1;
ALTER TABLE
foo_prod=# ANALYZE log_table;
ANALYZE
foo_prod=# EXPLAIN
SELECT
 category, source, MIN(rec_insert_time) OVER (partition by source order by
rec_insert_time) AS first_source_time, MAX(rec_insert_time) OVER (partition
by source order by rec_insert_time) AS last_source_time
FROM (SELECT DISTINCT ON (brand_id, last_change, log_id)
category, source(field1) AS source, rec_insert_time
FROM log_table l
INNER JOIN public.small_join_table filter ON filter.category = l.category
WHERE field1 IS NOT NULL AND l.category = 'music'
AND l.rec_insert_time >= now() - interval '10 days'
ORDER BY brand_id, last_change, log_id, rec_insert_time DESC) unique_cases;

   QUERY PLAN
-
 WindowAgg  (cost=24451299.20..24451313.21 rows=623 width=120)
   ->  Sort  (cost=24451299.20..24451300.75 rows=623 width=104)
 Sort Key: unique_cases.source, unique_cases.rec_insert_time
 ->  Subquery Scan on unique_cases  (cost=24451257.82..24451270.28
rows=623 width=104)
   ->  Unique  (cost=24451257.82..24451264.05 rows=623
width=124)
 ->  Sort  (cost=24451257.82..24451259.38 rows=623
width=124)
   Sort Key: l.brand_id, l.last_change, l.log_id,
l.rec_insert_time DESC
   ->  Nested Loop  (cost=0.00..24451228.90
rows=623 width=124)
 Join Filter: ((l.category)::text =
filter.category)
 ->  Seq Scan on small_join_table filter
 (cost=0.00..26.99 rows=1399 width=8)
 ->  Materialize  (cost=0.00..24435949.31
rows=623 width=99)
   ->  Seq Scan on log_table l
 (cost=0.00..24435946.20 rows=623 width=99)
 Filter: ((field1 IS NOT NULL)
AND (category = 'music'::name) AND (rec_insert_time >= (now() - '10
days'::interval)))
(13 rows)

Thanks,
Jeremy


Re: BRIN index which is much faster never chosen by planner

2019-10-11 Thread Jeremy Finzel
Dear Michael,

On Thu, Oct 10, 2019 at 5:20 PM Michael Lewis  wrote:

> Since the optimizer is choosing a seq scan over index scan when it seems
> like it has good row estimates in both cases, to me that may mean costs of
> scanning index are expected to be high. Is this workload on SSD? Has the
> random_page_cost config been decreased from default 4 (compared with cost
> of 1 unit for sequential scan)?
>

It's 1.5


> Your buffer hits aren't great. What is shared_buffers set to? How much ram
> on this cluster?
>

shared_buffers is 4GB.  The machine has 500G of RAM, but the server hosts
several clusters.


>
> With this table being insert only, one assumes correlation is very high on
> the data in this column as shown in pg_stats, but have your confirmed?
>

Yes, but the issue isn't with the BRIN index performing badly or being
fragmented.  It's that it performs great (7x faster than the seq scan) but
Postgres doesn't pick it.  I have seen this same issue in other attempts I
have made to use BRIN as well.


> To me, distinct ON is often a bad code smell and probably can be
> re-written to be much more efficient with GROUP BY, lateral & order by, or
> some other tool. Same with the window function. It is a powerful tool, but
> sometimes not the right one.
>

I don't really agree, but it's beside the point because the issue is not in
the aggregation; it's pre-aggregation.  Indeed, if I run my query as a simple
select (as I tried), it's the exact same planning issue.  (In my experience,
DISTINCT ON is the fastest for the given example.  Same with window
functions, which prevent inefficient self-joins.)


> Is "source" a function that is called on field1? What is it doing/how is
> it defined?
>

I can't see how that matters either, but the "source" function is a mask
for a built-in pg function that is trivial.  This whole query is masked so
as not to expose our actual prod query, but I hope it's still
understandable enough :).

My question is not how to make this query faster in general.  It's that I
want to use BRIN indexes very much, but I'm not sure I can trust they will
scale with the right query plan like I know BTREE will.

Thanks!
Jeremy


Re: BRIN index which is much faster never chosen by planner

2019-10-11 Thread Jeremy Finzel
On Thu, Oct 10, 2019 at 7:22 PM David Rowley 
wrote:

> The planner might be able to get a better estimate on the number of
> matching rows if the now() - interval '10 days' expression was
> replaced with 'now'::timestamptz - interval '10 days'. However, care
> would need to be taken to ensure the plan is never prepared since
> 'now' is evaluated during parse. The same care must be taken when
> creating views, functions, stored procedures and the like.
>

You are on to something here, I think, with the now() function, even if the
above suggestion is not exactly right, as you said further down.  I am
finding that a hard-coded timestamp gives the right query plan.  I also
tested the same with an even bigger window (the last 16 days) and it still
chooses the BRIN index.

foo_prod=# EXPLAIN
foo_prod-# SELECT
foo_prod-#  category, source, MIN(rec_insert_time) OVER (partition by
source order by rec_insert_time) AS first_source_time, MAX(rec_insert_time)
OVER (partition by source order by rec_insert_time) AS last_source_time
foo_prod-# FROM (SELECT DISTINCT ON (brand_id, last_change, log_id)
foo_prod(# category, source(field1) AS source, rec_insert_time
foo_prod(# FROM log_table l
foo_prod(# INNER JOIN public.small_join_table filter ON filter.category =
l.category
foo_prod(# WHERE field1 IS NOT NULL AND l.category = 'music'
foo_prod(# AND l.rec_insert_time >= now() - interval '10 days'
foo_prod(# ORDER BY brand_id, last_change, log_id, rec_insert_time DESC)
unique_cases;

   QUERY PLAN
-
 WindowAgg  (cost=24436329.10..24436343.56 rows=643 width=120)
   ->  Sort  (cost=24436329.10..24436330.70 rows=643 width=104)
 Sort Key: unique_cases.source, unique_cases.rec_insert_time
 ->  Subquery Scan on unique_cases  (cost=24436286.24..24436299.10
rows=643 width=104)
   ->  Unique  (cost=24436286.24..24436292.67 rows=643
width=124)
 ->  Sort  (cost=24436286.24..24436287.85 rows=643
width=124)
   Sort Key: l.brand_id, l.last_change, l.log_id,
l.rec_insert_time DESC
   ->  Nested Loop  (cost=0.00..24436256.25
rows=643 width=124)
 Join Filter: ((l.category)::text =
filter.category)
 ->  Seq Scan on small_join_table filter
 (cost=0.00..26.99 rows=1399 width=8)
 ->  Materialize  (cost=0.00..24420487.02
rows=643 width=99)
   ->  Seq Scan on log_table l
 (cost=0.00..24420483.80 rows=643 width=99)
 Filter: ((field1 IS NOT NULL)
AND (category = 'music'::name) AND (rec_insert_time >= (now() - '10
days'::interval)))
(13 rows)

foo_prod=# SELECT now() - interval '10 days';
   ?column?
---
 2019-10-01 08:20:38.115471-05
(1 row)

foo_prod=# EXPLAIN
SELECT
 category, source, MIN(rec_insert_time) OVER (partition by source order by
rec_insert_time) AS first_source_time, MAX(rec_insert_time) OVER (partition
by source order by rec_insert_time) AS last_source_time
FROM (SELECT DISTINCT ON (brand_id, last_change, log_id)
category, source(field1) AS source, rec_insert_time
FROM log_table l
INNER JOIN public.small_join_table filter ON filter.category = l.category
WHERE field1 IS NOT NULL AND l.category = 'music'
AND l.rec_insert_time >= '2019-10-01 08:20:38.115471-05'
ORDER BY brand_id, last_change, log_id, rec_insert_time DESC) unique_cases;

  QUERY PLAN
---
 WindowAgg  (cost=19664576.17..19664590.63 rows=643 width=120)
   ->  Sort  (cost=19664576.17..19664577.77 rows=643 width=104)
 Sort Key: unique_cases.source, unique_cases.rec_insert_time
 ->  Subquery Scan on unique_cases  (cost=19664533.31..19664546.17
rows=643 width=104)
   ->  Unique  (cost=19664533.31..19664539.74 rows=643
width=124)
 ->  Sort  (cost=19664533.31..19664534.92 rows=643
width=124)
   Sort Key: l.brand_id, l.last_change, l.log_id,
l.rec_insert_time DESC
   ->  Nested Loop  (cost=3181.19..19664503.32
rows=643 width=124)
 ->  Gather  (cost=3180.91..19662574.92
rows=643 width=99)
   Workers Planned: 3
   ->  Parallel Bitmap Heap Scan on
log_table l  (cost=2180.91..19661510.62 rows=207 width=99)
 Recheck Cond: (rec_insert_time
>= '2019-10-01 08:20:38.115471-05'::timestamp with time zone)
 Filter: ((field1 IS NOT NULL)
AND (category = 'music'::name))
   

BRIN index which is much faster never chosen by planner

2019-10-10 Thread Jeremy Finzel
Good Afternoon,

I posted about this on another thread, but the topic there was not precisely
planner issues, so I wanted to post it here.

I am running Postgres 11.5.  I have a table that is insert-only and has 312
million rows.  It is also pruned continuously to only the past year of data.
The size of the table is 223 GB with indexes, 140 GB without.  One of the
fields is rec_insert_time timestamptz.  Here are all potentially relevant
table stats:

schemaname  | foo
relname | log_table
n_tup_ins   | 86850506
n_tup_upd   | 0
n_tup_del   | 68916115
n_tup_hot_upd   | 0
n_live_tup  | 312810691
n_dead_tup  | 9405132
n_mod_since_analyze | 11452
last_vacuum | 2019-09-20 09:41:43.78383-05
last_autovacuum | 2019-10-04 13:56:16.810355-05
last_analyze| 2019-10-10 09:34:26.807695-05
last_autoanalyze|
vacuum_count| 2
autovacuum_count| 1
analyze_count   | 13
autoanalyze_count   | 0
total_relation_size | 223 GB
relation_size   | 139 GB
table_size  | 140 GB

I have a simple query looking at the past 10 days based on rec_insert_time,
and it will not choose the BRIN index under any of several configurations.
Here are all my relevant indexes (I intentionally do not have a btree on
rec_insert_time because I believe BRIN *should* fit better here):

"log_table_brand_id_product_rec_insert_time_idx" btree (brand_id, product,
rec_insert_time)
"log_table_rec_insert_time_idx" brin (rec_insert_time)
"log_table_rec_insert_time_idx1" brin (rec_insert_time) WITH
(pages_per_range='64')
"rec_insert_time_brin_1000" brin (rec_insert_time) WITH
(pages_per_range='1000')

And here is the SQL:
SELECT
 category, source, MIN(rec_insert_time) OVER (partition by source order by
rec_insert_time) AS first_source_time, MAX(rec_insert_time) OVER (partition
by source order by rec_insert_time) AS last_source_time
FROM (SELECT DISTINCT ON (brand_id, last_change, log_id)
category, source(field1) AS source, rec_insert_time
FROM log_table l
INNER JOIN public.small_join_table filter ON filter.category = l.category
WHERE field1 IS NOT NULL AND l.category = 'music'
AND l.rec_insert_time >= now() - interval '10 days'
ORDER BY brand_id, last_change, log_id, rec_insert_time DESC) unique_cases
;

This query will choose a seq scan on log_table every time in spite of these
BRIN indexes on rec_insert_time.  Michael Lewis had suggested I check
default_statistics_target for this column.  I raised it to 5000 for this
column and it is still choosing a seq scan.

Here is default chosen plan (takes 2 minutes 12 seconds):
 WindowAgg  (cost=24437881.80..24437897.70 rows=707 width=120) (actual
time=132173.173..132173.222 rows=53 loops=1)
   Output: unique_cases.category, unique_cases.source,
min(unique_cases.rec_insert_time) OVER (?),
max(unique_cases.rec_insert_time) OVER (?), unique_cases.rec_insert_time
   Buffers: shared hit=391676 read=17772642 dirtied=4679 written=7
   ->  Sort  (cost=24437881.80..24437883.56 rows=707 width=104) (actual
time=132173.146..132173.149 rows=53 loops=1)
 Output: unique_cases.source, unique_cases.rec_insert_time,
unique_cases.category
 Sort Key: unique_cases.source, unique_cases.rec_insert_time
 Sort Method: quicksort  Memory: 32kB
 Buffers: shared hit=391676 read=17772642 dirtied=4679 written=7
 ->  Subquery Scan on unique_cases  (cost=24437834.20..24437848.34
rows=707 width=104) (actual time=132172.950..132173.062 rows=53 loops=1)
   Output: unique_cases.source, unique_cases.rec_insert_time,
unique_cases.category
   Buffers: shared hit=391676 read=17772642 dirtied=4679
written=7
   ->  Unique  (cost=24437834.20..24437841.27 rows=707
width=124) (actual time=132172.946..132173.048 rows=53 loops=1)
 Output: l.category, (source(l.field1)),
l.rec_insert_time, l.brand_id, l.last_change, l.log_id
 Buffers: shared hit=391676 read=17772642 dirtied=4679
written=7
 ->  Sort  (cost=24437834.20..24437835.96 rows=707
width=124) (actual time=132172.939..132172.962 rows=466 loops=1)
   Output: l.category, (source(l.field1)),
l.rec_insert_time, l.brand_id, l.last_change, l.log_id
   Sort Key: l.brand_id, l.last_change, l.log_id,
l.rec_insert_time DESC
   Sort Method: quicksort  Memory: 90kB
   Buffers: shared hit=391676 read=17772642
dirtied=4679 written=7
   ->  Nested Loop  (cost=0.00..24437800.73
rows=707 width=124) (actual time=4096.253..132171.425 rows=466 loops=1)
 Output: l.category, source(l.field1),
l.rec_insert_time, l.brand_id, l.last_change, l.log_id
 Inner Unique: true
 Join Filter: 

pg_regress restart cluster?

2019-09-06 Thread Jeremy Finzel
Hello!

I am using the basic extension building infrastructure with sql and
expected files, but what I want to test is changing a config setting and
then restarting the cluster with shared_preload_libraries in place. Is
there a canonical way to do this or does anyone have any examples of this?
I appreciate it very much!

Thanks,

Jeremy


Configuring bgw_restart_time

2019-08-13 Thread Jeremy Finzel
I am working on an extension that uses background workers, and interested
in adding code for it to auto-restart after a crash or server restart
(similar to as-coded in worker_spi).

But I'm also interested in being able to configure bgw_restart_time using a
GUC without having to restart the server, using only SIGHUP.  For example,
I want to normally have the worker restart after 10 seconds.  But if I am
doing maintenance on the server (without a db restart), I perhaps want to
change this to -1 (BGW_NEVER_RESTART), kill the worker, do my business,
then restart the worker.  Or another reason would be my background worker
has some bug and I want to disable it without having to restart my db
server.  For us as for many, a small outage for a db restart is expensive.

I have played around with this and done some digging around the codebase
in bgworker.c (with my limited knowledge thus far of the pg codebase), and
so far as I can tell, it isn't possible to change bgw_restart_time without
a server restart.  But I'm not sure if that's just because I don't know how
this code works, or if the current libraries actually don't support
modifying this part of the background worker.  I am setting the GUC in
_PG_init, but I can see that changing it after it has been registered has
no effect unless I restart the server.
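
For reference, here is a minimal sketch of the pattern (hypothetical names,
not my actual extension): the GUC itself is declared PGC_SIGHUP, but
bgw_restart_time is copied into the BackgroundWorker struct at registration
time in _PG_init, which is presumably why a later reload has no effect on the
already-registered worker.

#include "postgres.h"
#include "postmaster/bgworker.h"
#include "utils/guc.h"

static int my_restart_time = 10;    /* seconds; hypothetical GUC */

void _PG_init(void);

void
_PG_init(void)
{
    BackgroundWorker worker;

    DefineCustomIntVariable("my_ext.restart_time",
                            "Restart interval for the my_ext background worker.",
                            NULL,
                            &my_restart_time,
                            10,             /* boot value */
                            -1,             /* -1 == BGW_NEVER_RESTART */
                            3600,
                            PGC_SIGHUP,     /* reloadable via SIGHUP ... */
                            0,
                            NULL, NULL, NULL);

    memset(&worker, 0, sizeof(worker));
    worker.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
    worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
    worker.bgw_restart_time = my_restart_time;  /* ... but only read here */
    snprintf(worker.bgw_library_name, BGW_MAXLEN, "my_ext");
    snprintf(worker.bgw_function_name, BGW_MAXLEN, "my_ext_main");
    snprintf(worker.bgw_name, BGW_MAXLEN, "my_ext worker");
    RegisterBackgroundWorker(&worker);
}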

If indeed this is possible, I'd be very grateful for some insight on how to
do it.  I may even try to add such an example to worker_spi.

Thanks!
Jeremy


Re: proposal - patch: psql - sort_by_size

2019-07-31 Thread Jeremy Finzel
On Fri, Jun 28, 2019 at 10:13 AM Pavel Stehule 
wrote:

> Hi
>
> I returned to the possibility of sorting the output of \d* and \l by size.
> There were more experiments in this area, but without success.  The last
> patch was an example of over-engineering, and now I am trying to implement
> this feature as simply as possible.  I don't think we need too complex a
> solution - if somebody needs a specific report, then it is not hard to run
> psql with the "-E" option, get and modify the query used (and use the power
> of SQL).  But displaying database objects sorted by size is a very common
> case.
>
> This proposal is based on a new psql variable "SORT_BY_SIZE".  This variable
> will be off by default.  The value of this variable is used only in verbose
> mode (when the size is displayed - I don't see any benefit in sorting by
> size without showing the size).  Usage is very simple, and so is the
> implementation:
>
> \dt -- sorted by schema, name
> \dt+ -- still sorted  by schema, name
>
> \set SORT_BY_SIZE on
> \dt -- sorted by schema, name (size is not calculated and is not visible)
> \dt+ -- sorted by size
>
> \dt+ public.* -- sorted by size from schema public
>
> Comments, notes?
>
> Regards
>
> Pavel
>
>
One oddity about pg_relation_size and pg_table_size is that they can be
easily blocked by user activity.  In fact it happens to us often in
reporting environments and we have instead written different versions of
them that avoid the lock contention and still give "close enough" results.

This blocking could result in quite unexpected behavior: someone uses your
proposed command and it never returns.  Has that been considered, at least
as something to document?
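
For what it's worth, one way to get "close enough" numbers without going
through those functions (a rough sketch, not our actual code) is to go off
relpages:

SELECT c.oid::regclass AS relation,
       pg_size_pretty(c.relpages::bigint
                      * current_setting('block_size')::bigint) AS approx_size
FROM pg_class c
WHERE c.relkind IN ('r', 'm')
ORDER BY c.relpages DESC
LIMIT 20;

relpages is only as fresh as the last VACUUM/ANALYZE, but this never waits on
relation-level locks.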

Thanks,
Jeremy


Docs for refresh materialized view concurrently

2019-05-31 Thread Jeremy Finzel
Speaking with Robert today at PGCon, I happily discovered that REFRESH
MATERIALIZED VIEW CONCURRENTLY actually only updates rows that have changed
since the last refresh, rather than rewriting every row.  Out of curiosity, I
went to the docs and found that this detail is not mentioned anywhere.

This is a great feature that is being undersold, and it should be made
clear in the docs.

In my experience, refreshing large materialized views with the normal
REFRESH (without CONCURRENTLY) can generate tons of WAL.  I had assumed the
only benefit of CONCURRENTLY was to allow concurrent access to the table.
But actually the incremental refresh is a much bigger win for us, in that it
reduces WAL overhead drastically.

I've not submitted a patch before, and have a few suggestions I'd like
feedback on before I write one (for the docs only).

1.

First, even this summary looks untrue:

REFRESH MATERIALIZED VIEW — replace the contents of a materialized view.

"replace" is not really accurate with the CONCURRENTLY option, because in
fact it only updates changed rows.

Perhaps instead of "replace":

   - "replace or incrementally update the contents of a materialized view".

Also, the Description part has the same inaccuracy:

"completely replaces the contents of a materialized view. The old contents
are discarded."

That is not true with CONCURRENTLY, correct?  Only the old contents *which
have changed* are discarded.

2.

Lastly, I would suggest adding something like the following to the first
paragraph under CONCURRENTLY:

   - With this option, only actual changed rows are updated in the
   materialized view, which can significantly reduce the amount of write churn
   and WAL traffic from a refresh if only a small number of rows will change
   with each refresh.  It is recommended to have a unique index on the
   materialized view if possible, which will improve the performance of a
   concurrent refresh.

Please correct me if my understanding of this is not right.
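
For a concrete illustration of point 2 (placeholder names; just a sketch, not
proposed doc text):

CREATE MATERIALIZED VIEW order_totals AS
SELECT customer_id, sum(amount) AS total
FROM orders
GROUP BY customer_id;

-- A unique index is required for REFRESH ... CONCURRENTLY:
CREATE UNIQUE INDEX ON order_totals (customer_id);

-- Rewrites the entire view contents (and generates WAL accordingly):
REFRESH MATERIALIZED VIEW order_totals;

-- Allows concurrent reads and, as described above, only applies the delta of
-- changed rows:
REFRESH MATERIALIZED VIEW CONCURRENTLY order_totals;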

3.

On a different note, none of the documentation on materialized views notes
that they can only be LOGGED.  This should be noted, or at least it should
be noted that one cannot create an UNLOGGED materialized view in the same
place it says that one cannot create a temporary one (under Description in
CREATE MATERIALIZED VIEW).


Thanks!
Jeremy Finzel


Question about new pg12 feature no-rewrite timestamp to timestamptz conversion

2019-05-20 Thread Jeremy Finzel
I have a question about this (really exciting) feature coming in pg12:

Allow ALTER TABLE .. SET DATA TYPE timestamp/timestamptz to avoid a table
rewrite when the session time zone is UTC (Noah Misch)

In the UTC time zone, the data types are binary compatible.

We actually want to migrate all of our databases to timestamptz
everywhere.  But some of them have historically saved data in a *local*
time zone with data type timestamp.

I assume there is no similarly easy way to do this alter type without a
table rewrite for a local time zone?  I would assume DST changes would be
an issue here.

But it would be really nice if, for a table with timestamp data saved at
America/Chicago local time, we could set the session to 'America/Chicago',
alter the type to timestamptz, and similarly avoid a table rewrite.  Is this
possible or feasible?
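
To make the two cases concrete (hypothetical table and column names; the
first form is my understanding of the pg12 no-rewrite path, the second is
what I assume we'd need for our Chicago-time data, presumably with a full
rewrite):

-- Case 1: data was stored as UTC.  With the session time zone at UTC, pg12
-- can skip the table rewrite because the representations are binary
-- compatible.
SET timezone = 'UTC';
ALTER TABLE events ALTER COLUMN created_at TYPE timestamptz;

-- Case 2: data was stored as America/Chicago local time.  The values must be
-- reinterpreted, so a USING clause is needed.
SET timezone = 'America/Chicago';
ALTER TABLE events ALTER COLUMN created_at TYPE timestamptz
    USING created_at AT TIME ZONE 'America/Chicago';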

Thank you!
Jeremy


Re: Automated way to find actual COMMIT LSN of subxact LSN

2019-03-21 Thread Jeremy Finzel
>
> I find this to be unactionably vague.  What does it mean to claim "an
> LSN is visible"?  An LSN might not even point to a WAL record, or it
> might point to one that has nontransactional effects.  Moreover, any
> behavior of this sort would destroy what I regard as a bedrock property
> of recover-to-LSN, which is that there's a well defined, guaranteed-finite
> stopping point.  (A property that recover-to-XID lacks, since the
> specified XID might've crashed without recording either commit or abort.)
>

I mentioned that my specific use case is that I am picking out an LSN or
XID within the context of logical decoding.  So I don't think that's vague,
but let me clarify.  When using the peek_changes or get_changes functions,
they only show a particular update to a particular row, with a
corresponding LSN and transaction ID.  That's what I see using both
test_decoding and pglogical_output, both of which I have used in this way.
In that context at least, all LSNs and XIDs point to committed WAL data
only.


> I think what you ought to be doing is digging the xmin out of the inserted
> tuple you're concerned with and then doing recover-to-XID.  There's
> definitely room for us to make it easier if the XID is a subxact rather
> than a main xact.  But I think identifying the target XID is your job
> not the job of the recovery-stop-point mechanism.
>

I'm open to that, but how will it help if I can't guarantee that the tuple
wasn't updated again after the original insert/update of interest?

Thank you,
Jeremy


Re: Automated way to find actual COMMIT LSN of subxact LSN

2019-03-21 Thread Jeremy Finzel
>
> It would seem like what you're asking for is to continue until the commit
> of the parent transaction, not just the next commit after the subcommit.
> Otherwise (if that's an unrelated xact) the subxact would still not be
> committed, so that you might as well have stopped short of it.
>

Right, the parent transaction is what I meant.


> I'd be in favor of that for recovery_target_xid, but I'm not at all
> convinced about changing the behavior for a target LSN.  The fact that
> the target is a subcommit seems irrelevant when you specify by LSN.
>

Perhaps some context will help.  There have been 2 cases in which I have
tried to do this, both of them based on logical decoding, and finding
either a transaction id or an LSN to recover to.  Actually, the only reason
I have ever used transaction id instead of LSN is on <= 9.6 because the
latter isn't supported until pg10.

For this use case, my goal is simply to be able to recover to the point
immediately after a particular decoded log line becomes visible, without
necessarily having to find out the final parent transaction id.

Given this, I am open to different implementations, but I would like to be
able to specify either an LSN or a transaction ID and have a feature that
allows the recovery target to roll forward just until it is visible, even if
the LSN or transaction ID is not the actual commit of the parent
transaction.


> I don't recall this for sure, but doesn't a parent xact's commit record
> include all subxact XIDs?  If so, the implementation would just require
> searching the subxacts as well as the main XID for a match to
> recovery_target_xid.
>

Yes, I believe so.

Thanks,
Jeremy


Re: Automated way to find actual COMMIT LSN of subxact LSN

2019-03-20 Thread Jeremy Finzel
>
> If recovery_target_inclusive were able to take the third value
> "xact", is it exactly what you want?
>
> And is it acceptable?
>

Yes, that would be exactly what I would want.  A third value for
recovery_target_inclusive would work, although perhaps it's debatable whether
this should instead be the default behavior: roll forward any LSN with
recovery_target_inclusive = true until it is actually visible.  If I say I
want to "include" an LSN in my recovery target, it doesn't make much sense
that it won't be visible unless it happens to be an actual commit LSN; in
that case the recovery point does not really include the LSN.

A related problem demonstrates the same odd behavior.  If you set
recovery_target_xid to a subtransaction id, recovery just skips it and
continues, which really seems undesirable.  It would be nice if that case
also rolled forward to the next valid commit of the parent transaction.

Thanks!
Jeremy


Automated way to find actual COMMIT LSN of subxact LSN

2019-03-19 Thread Jeremy Finzel
I want to build automation to recover a database to a specific LSN
*inclusive*, even if that LSN is from a subtransaction.  The problem I am
facing is that I know what specific LSN wrote a row on a remote system, but
if I create a recovery.conf file with:

recovery_target_lsn = '95F/BBA36DF8'

and 95F/BBA36DF8 is actually a subtransaction, then even if I use default
behavior of recovery_target_inclusive = true, that transaction will NOT be
included in the restore point, because it is prior to the actual COMMIT LSN
of which this lsn/subxact is a part.

My hack for now is to simply scan forward manually until I find the COMMIT,
which is the only way I have figured out so far.  I don't want to hack
together some kind of search script based on this if there is already a
better way to get this information... does anyone know of a way?
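
For reference, the sort of manual scan I mean can be done with pg_waldump
(the path and WAL location are illustrative; on 9.6 the tool is pg_xlogdump):

pg_waldump --path "$PGDATA/pg_wal" --start 95F/BBA36DF8 --rmgr Transaction | grep COMMIT

i.e. walk the WAL forward from the known LSN, restricted to Transaction
records, and look for the commit record of the enclosing transaction.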

Thank you,
Jeremy


relcache reference leak with pglogical replication to insert-only partitioned table?

2019-01-26 Thread Jeremy Finzel
I understand it's not fully supported to replicate to a differently
partitioned setup on a subscriber with either pglogical or native logical
replication; however, I also know that INSERT triggers can be fired in
replication mode.  I have an insert-only OLTP table that I want partitioned
only on the subscriber system.  I have this set up using the "old style"
partitioning, as it is a 9.6 system.

Provider is 9.6.6 pglogical 2.1.1
Subscriber is 9.6.10 pglogical 2.1.1

Everything appears good as far as the data.  It is partitioning correctly.
Queries on the data are planning correctly.  However, I am now getting
these WARNING messages constantly.  How concerned should I be?  Is there a
fix for this?  Any insight is much appreciated!

2019-01-27 03:12:34.150 GMT,,,135600,,5c4d1f44.211b0,6794,,2019-01-27
03:02:28 GMT,54/0,1057372660,WARNING,01000,"relcache reference leak:
relation ""foo_pkey"" not closed","apply COMMIT in commit before
14DB/34DB1B78, xid 1476598649 commited at 2019-01-26 21:12:34.071673-06
(action #10) from node replorigin 22""pglogical apply 16420:2094659706"

Thank you!
Jeremy


Implicit timezone conversion replicating from timestamp to timestamptz?

2019-01-24 Thread Jeremy Finzel
We are working to migrate several large tables from the timestamp to the
timestamptz data type by using logical replication (so as to avoid long
downtime for type conversions).  We are using pglogical but curious if what
I share below applies to native logical replication as well.

Both source and destination dbs are at localtime, which is
'America/Chicago' time zone.

The source system has a timestamp stored "at time zone UTC", like this for
2:00pm Chicago time:
2019-01-24 20:00:00.00

I was *very surprised* to find that replicating above timestamp to
timestamptz actually does so correctly, showing this value in my psql
client on the subscriber:
2019-01-24 14:00:00.00-06

How does it know/why does it assume it knows that the time zone of the
timestamp data type is UTC on the provider given that my clusters are at
America/Chicago?  I would have actually expected an incorrect conversion of
the data unless I set the timezone to UTC on the way in on the subscriber
via a trigger.

That is, I was expecting to see this:
2019-01-24 20:00:00.00-06

Which is obviously wrong.  So why does it do this and is there some
assumption being made somewhere in the code base that a timestamp is
actually saved "at time zone UTC"?
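
To illustrate what I expected versus what I'm seeing (simple psql checks,
not the replication path itself):

SET timezone = 'America/Chicago';

-- What I expected: the naive timestamp interpreted in the session time zone:
SELECT '2019-01-24 20:00:00'::timestamp::timestamptz;
-- 2019-01-24 20:00:00-06

-- What actually shows up on the subscriber, i.e. the value treated as UTC:
SELECT '2019-01-24 20:00:00'::timestamp AT TIME ZONE 'UTC';
-- 2019-01-24 14:00:00-06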

Thanks,
Jeremy


Re: No such file or directory in pg_replslot

2018-12-08 Thread Jeremy Finzel
On Sat, Dec 8, 2018 at 1:21 PM Jeremy Finzel  wrote:

> That doesn't indicate an error.  You need to provide more details what
>> made you consider things wedged...
>>
>> Andres
>>
>
> Thank you very much for the reply.  We typically see no visible
> replication delay over 5 minutes ever.  Today we saw a delay of over 3
> hours, and no obvious increase in workload either on the provider or the
> subscriber.  I also did not see the LSN advancing whatsoever in terms of
> applying changes.
>
> I first checked for long-running transactions on the master but there was
> nothing too unusual except an ANALYZE which I promptly killed, but with no
> improvement to the situation.
>
> I found the messages above using strace after canceling the subscription
> and finding that the process was taking extremely long to cancel.  There
> are 2.1 million files in pg_replslot which I don't think is normal?  Any
> ideas as to where I should be looking or what could cause this?
>
> Thanks,
> Jeremy
>

I have very good news: after waiting it out for several hours, it resolved
itself.  Thank you, your input steered us in the right direction!

Jeremy


Re: No such file or directory in pg_replslot

2018-12-08 Thread Jeremy Finzel
>
> That doesn't indicate an error.  You need to provide more details what
> made you consider things wedged...
>
> Andres
>

Thank you very much for the reply.  We typically see no visible replication
delay over 5 minutes ever.  Today we saw a delay of over 3 hours, and no
obvious increase in workload either on the provider or the subscriber.  I
also did not see the LSN advancing whatsoever in terms of applying changes.

I first checked for long-running transactions on the master but there was
nothing too unusual except an ANALYZE which I promptly killed, but with no
improvement to the situation.

I found the messages above using strace after canceling the subscription
and finding that the process was taking extremely long to cancel.  There
are 2.1 million files in pg_replslot which I don't think is normal?  Any
ideas as to where I should be looking or what could cause this?

Thanks,
Jeremy


No such file or directory in pg_replslot

2018-12-08 Thread Jeremy Finzel
I don't know if this applies only to pglogical or to logical decoding in
general.  This is on a 9.6.10 provider running pglogical 2.2.0.  The
subscriber has the same versions.  We had a replication delay situation this
morning, which I think may have been due to a really long transaction, but
I've yet to verify that.

I disabled and re-enabled replication and at one point, this created an
error on start_replication_slot that the pid was already active.

Somehow replication got wedged and now even though replication appears to
be working, strace shows these kinds of errors continually:
open("pg_replslot/pgl_foo_providerb97b25d_foo336ddc1/xid-1248981532-lsn-C940-F400.snap",
O_RDONLY) = -1 ENOENT (No such file or directory)
open("pg_replslot/pgl_foo_providerb97b25d_foo336ddc1/xid-1248981532-lsn-C940-F500.snap",
O_RDONLY) = -1 ENOENT (No such file or directory)
open("pg_replslot/pgl_foo_providerb97b25d_foo336ddc1/xid-1248981532-lsn-C940-F600.snap",
O_RDONLY) = -1 ENOENT (No such file or directory)
open("pg_replslot/pgl_foo_providerb97b25d_foo336ddc1/xid-1248981532-lsn-C940-F700.snap",
O_RDONLY) = -1 ENOENT (No such file or directory)
open("pg_replslot/pgl_foo_providerb97b25d_foo336ddc1/xid-1248981532-lsn-C940-F800.snap",
O_RDONLY) = -1 ENOENT (No such file or directory)
open("pg_replslot/pgl_foo_providerb97b25d_foo336ddc1/xid-1248981532-lsn-C940-F900.snap",
O_RDONLY) = -1 ENOENT (No such file or directory)
open("pg_replslot/pgl_foo_providerb97b25d_foo336ddc1/xid-1248981532-lsn-C940-FA00.snap",
O_RDONLY) = -1 ENOENT (No such file or directory)
open("pg_replslot/pgl_foo_providerb97b25d_foo336ddc1/xid-1248981532-lsn-C940-FB00.snap",
O_RDONLY) = -1 ENOENT (No such file or directory)
open("pg_replslot/pgl_foo_providerb97b25d_foo336ddc1/xid-1248981532-lsn-C940-FC00.snap",
O_RDONLY) = -1 ENOENT (No such file or directory)
open("pg_replslot/pgl_foo_providerb97b25d_foo336ddc1/xid-1248981532-lsn-C940-FD00.snap",
O_RDONLY) = -1 ENOENT (No such file or directory)
open("pg_replslot/pgl_foo_providerb97b25d_foo336ddc1/xid-1248981532-lsn-C940-FE00.snap",
O_RDONLY) = -1 ENOENT (No such file or directory)

Any suggestions?  This is a showstopper for us.

Thank you,
Jeremy


How do I get altered object from GRANT event trigger?

2018-12-06 Thread Jeremy Finzel
I noticed that there is no information for GRANT in
pg_event_trigger_ddl_commands().  I am trying to determine which
table/schema is being altered.  Is there any way to do this, either with
built-in functionality or by extending the current code in event_trigger.c?
Is the reason for this limitation that GRANT does not perform the same kind
of catalog changes that other commands do?

Thanks!
Jeremy


worker_spi shouldn't execute again on sigterm

2018-11-28 Thread Jeremy Finzel
I noticed that the way the test module worker_spi is written, it will
execute the main loop SQL one more time after it gets a sigterm, THEN exit 1.
This was surprising to me, since I used this module as a pattern for my own
background worker, and I would have thought it should bail out immediately
without executing any more SQL.

Shouldn't we add something like this before it enters the phase where it
starts the transaction and executes the SQL?

/*
 * In case of a SIGTERM, exit immediately
 */
if (got_sigterm)
{
    break;
}

Please help me if I'm missing something.

Thanks,
Jeremy


Re: Race condition in WaitForBackgroundWorkerStartup

2018-11-13 Thread Jeremy Finzel
On Tue, Nov 13, 2018 at 6:17 AM Amit Kapila  wrote:

> On Mon, Nov 12, 2018 at 11:55 PM Jeremy Finzel  wrote:
> >
> > I believe I found a race condition in WaitForBackgroundWorkerStartup in
> the case where it encounters an ERROR during startup.  I found that
> depending on the speed of the system, it will unreliably return either
> status BGWH_STOPPED or BGWH_STARTED.  But I can reliably reproduce getting
> BGWH_STOPPED by tweaking the worker_spi.c test module.
> >
>
> Yeah, I think it is possible that you get different values in such
> cases because we consider worker status as started after we have
> launched the worker.  Now, if we get the error in the worker
> initialization, then the user can get any of those values.  I think
> this is what is happening in your case where you are saying "ERROR
> during startup".
> Am I missing something?
>

Perhaps.  What I am saying is that some machines show ERROR during startup,
and some machines don't get an error at all, return successfully, then
immediately error and die in the background, but the client is not shown
this.  The behavior isn't predictable.  However, I can get a predictable
ERROR to happen always if I put a short pause before
WaitForBackgroundWorkerStartup.

I'm unclear on what counts as "worker initialization".  The error is
happening in the worker_spi_main function, not in the worker_spi_launch
function.  So does an immediate error in worker_spi_main count as part of
the worker initialization?

Thanks!
Jeremy


Race condition in WaitForBackgroundWorkerStartup

2018-11-12 Thread Jeremy Finzel
I believe I found a race condition in WaitForBackgroundWorkerStartup in the
case where it encounters an ERROR during startup.  I found that depending
on the speed of the system, it will unreliably return either status
BGWH_STOPPED or BGWH_STARTED.  But I can reliably reproduce getting
BGWH_STOPPED by tweaking the worker_spi.c test module.

On my own system running 11.1 (or any other version of pg, actually), it
returns BGWH_STOPPED and thus a hard error message (ERROR: could not start
background process).  But for other colleagues, it returns BGWH_STARTED and
thus the client sees the pid that was launched.  One then sees an error only
in the server logs, as the process exits.

Here is the relevant section of worker_spi.c:395-398:
if (!RegisterDynamicBackgroundWorker(&worker, &handle))
    PG_RETURN_NULL();

status = WaitForBackgroundWorkerStartup(handle, &pid);

First, I hacked the SQL in the worker_spi_main module to be invalid.  Then I
see one or the other behavior (pid result or ERROR) depending on the user.

Then I added an arbitrary sleep before the WaitForBackgroundWorkerStartup
call, and it then reliably shows an ERROR message every time.

I'm not sure if this is substantial or not, but it's causing me a problem:
I am regression testing an invalid background worker launch and can't get
reliable output.

This was my original post:
https://www.postgresql.org/message-id/cama1xuhfap+aibpahskjrwn4cd9o8kyghwtg99jnofredzs...@mail.gmail.com

Now that I have figured out the issue, and that it's unrelated to my
extension, I thought it warranted starting a separate thread.  I am not sure
how best to solve this issue.

Thanks!
Jeremy


Re: Changes to error handling for background worker initialization?

2018-11-12 Thread Jeremy Finzel
On Mon, Oct 22, 2018 at 9:36 AM Jeremy Finzel  wrote:

> Hello -
>
> I have an extension that uses background workers.  I pass a database oid
> as an argument in order to launch the worker using function
> BackgroundWorkerInitializeConnectionByOid.  In one of my regression tests
> that was written, I intentionally launch the worker with an invalid oid.
> In earlier PG versions the worker would successfully launch but then
> terminate asynchronously, with a message in the server log.  Now, it does
> not even successfully launch but immediately errors (hence failing my
> regression tests).
>
> I have recently installed all later point releases of all versions 9.5-11,
> so I assume this is due to some code change.  The behavior seems reasonable
> but I don't find any obvious release notes indicating a patch that would
> have changed this behavior.  Any thoughts?
>
> Thanks,
> Jeremy
>

I still haven't determined the source of this error, but I have determined
that it must not be related to a difference between point releases in
background worker error handling, because I am seeing different behavior
for the identical postgres version on my machine vs. others'.  I would
appreciate any ideas as to how this could possibly happen, because I'm no
longer sure of the right way to build this regression test.

The test launches the background worker with an invalid database oid.

Here is what I am seeing running pg 11.1 on my system (same behavior I get
on 9.5-10 as well):

 SELECT _launch(999::OID) AS pid;
! ERROR:  could not start background process
! HINT:  More details may be available in the server log.

This is what others are seeing (the worker fails asynchronously and you see
it in the server log):

 SELECT _launch(999::OID) AS pid;
!   pid
! ---
!  18022
! (1 row)

I could share the C code but it's not that interesting.  It just calls
BackgroundWorkerInitializeConnectionByOid.  It is essentially a duplicate
of worker_spi.  Here is the relevant section:

sprintf(worker.bgw_function_name, "worker_spi_main");
snprintf(worker.bgw_name, BGW_MAXLEN, "worker_spi worker %d", i);
snprintf(worker.bgw_type, BGW_MAXLEN, "worker_spi");
worker.bgw_main_arg = Int32GetDatum(i);
/* set bgw_notify_pid so that we can use WaitForBackgroundWorkerStartup */
worker.bgw_notify_pid = MyProcPid;

if (!RegisterDynamicBackgroundWorker(&worker, &handle))
    PG_RETURN_NULL();

status = WaitForBackgroundWorkerStartup(handle, &pid);

if (status == BGWH_STOPPED)
    ereport(ERROR,
            (errcode(ERRCODE_INSUFFICIENT_RESOURCES),
             errmsg("could not start background process"),
             errhint("More details may be available in the server log.")));

So on my machine I am getting status == BGWH_STOPPED, whereas others are
not getting that behavior.

Thanks,
Jeremy


Changes to error handling for background worker initialization?

2018-10-22 Thread Jeremy Finzel
Hello -

I have an extension that uses background workers.  I pass a database oid as
an argument in order to launch the worker using function
BackgroundWorkerInitializeConnectionByOid.  In one of my regression tests
that was written, I intentionally launch the worker with an invalid oid.
In earlier PG versions the worker would successfully launch but then
terminate asynchronously, with a message in the server log.  Now, it does
not even successfully launch but immediately errors (hence failing my
regression tests).

I have recently installed all later point releases of all versions 9.5-11,
so I assume this is due to some code change.  The behavior seems reasonable
but I don't find any obvious release notes indicating a patch that would
have changed this behavior.  Any thoughts?

Thanks,
Jeremy


Re: Proposal for disk quota feature

2018-09-21 Thread Jeremy Finzel
On Fri, Sep 21, 2018 at 9:21 AM Hubert Zhang  wrote:

> just fast reaction - why QUOTA object?
>> Isn't ALTER SET enough?
>> Some like
>> ALTER TABLE a1 SET quote = 1MB;
>> ALTER USER ...
>> ALTER SCHEMA ..
>> New DDL commands look like too hard a hammer.
>
>
> It's an option. Prefer to consider quota setting store together:
> CREATE DISK QUOTA way is more nature to store quota setting in a separate
> pg_diskquota catalog
> While ALTER SET way is more close to store quota setting in pg_class,
> pg_role, pg_namespace. etc in an integrated way.
> (Note that here I mean nature/close is not must, ALTER SET could also
> store in pg_diskquota and vice versa.)
>
> Here are some differences I can think of:
> 1 pg_role is a global catalog, not per database level. It's harder to
> tracker the user's disk usage in the whole clusters(considering 1000+
> databases).  So the semantic of  CREATE DISK QUOTA ON USER is limited: it
> only tracks the user's disk usage inside the current database.
> 2 using separate pg_diskquota could add more field except for quota limit
> without adding too many fields in pg_class, e.g. red zone to give the user
> a warning or the current disk usage of the db objects.
>
> On Fri, Sep 21, 2018 at 8:01 PM Pavel Stehule 
> wrote:
>
>>
>>
>> On Fri, Sep 21, 2018 at 13:32, Hubert Zhang  wrote:
>>
>>>
>>> Hi all,
>>>
>>> We redesign the disk quota feature based on the comments from Pavel
>>> Stehule and Chapman Flack. Here is the new design.
>>>
>>> Overview
>>>
>>> Basically, the disk quota feature is used to support multi-tenancy
>>> environments: different levels of database objects could be set a quota
>>> limit to avoid over-use of disk space. A common case could be as follows:
>>> the DBA could enable disk quota on a specified database list. The DBA
>>> could set disk quota limits for tables/schemas/roles in these databases.
>>> A separate disk quota worker process will monitor the disk usage for
>>> these objects and detect the objects which exceed their quota limit.
>>> Queries loading data into these “out of disk quota” tables/schemas/roles
>>> will be cancelled.
>>>
>>> We are currently working at the init implementation stage. We would like
>>> to propose our idea first and get feedback from the community to do
>>> quick iteration.
>>>
>>> SQL Syntax (How to use disk quota)
>>>
>>> 1. Specify the databases with disk quota enabled in the GUC
>>> “diskquota_databases” in postgresql.conf and restart the database.
>>>
>>> 2. The DBA could set a disk quota limit for a table/schema/role:
>>>
>>> CREATE DISK QUOTA tablea1 ON TABLE a1 with (quota = ‘1MB’);
>>> CREATE DISK QUOTA roleu1 ON USER u1 with (quota = ‘1GB’);
>>> CREATE DISK QUOTA schemas1 ON SCHEMA s1 with (quota = ‘3MB’);
>>>
>>
>> just fast reaction - why QUOTA object?
>>
>> Isn't ALTER SET enough?
>>
>> Some like
>>
>> ALTER TABLE a1 SET quote = 1MB;
>> ALTER USER ...
>> ALTER SCHEMA ..
>>
>> New DDL commands look like too hard a hammer.
>>
>>
>>
>>>
>>> 3 Simulate a schema out of quota limit case: suppose table a1 and table
>>> a2 are both under schema s1.
>>>
>>> INSERT INTO a1 SELECT generate_series(1,1000);
>>> INSERT INTO a2 SELECT generate_series(1,300);
>>> SELECT pg_sleep(5)
>>> INSERT INTO a1 SELECT generate_series(1,1000);
>>> ERROR:  schema's disk space quota exceeded
>>> DROP TABLE a2;
>>> SELECT pg_sleep(5)
>>> INSERT INTO a1 SELECT generate_series(1,1000);
>>> INSERT 0 1000
>>>
>>> Architecture
>>>
>>> Disk quota has the following components.
>>>
>>> 1. Quota Setting Store is where the disk quota setting to be stored and
>>> accessed. We plan to use catalog table pg_diskquota to store these
>>> information. pg_diskquota is like:
>>>
>>> CATALOG(pg_diskquota,6122,DiskQuotaRelationId)
>>> {
>>>     NameData quotaname;     /* diskquota name */
>>>     int16 quotatype;        /* diskquota type name */
>>>     Oid quotatargetoid;     /* diskquota target db object oid */
>>>     int32 quotalimit;       /* diskquota size limit in MB */
>>>     int32 quotaredzone;     /* diskquota redzone in MB */
>>> } FormData_pg_diskquota;
>>>
>>> 2. Quota Change Detector is the monitor of size change of database
>>> objects. We plan to use stat collector to detect the ‘active’ table list
>>> at initial stage. But stat collector has some limitation on finding the
>>> active table which is in a running transaction. Details see TODO section.
>>>
>>> 3. Quota Size Checker is where to calculate the size and compare with
>>> quota limit for database objects. According to Pavel’s comment,
>>> autovacuum launcher and worker process could be a good reference to disk
>>> quota. So we plan to use a disk quota launcher daemon process and several
>>> disk quota worker process to finish this work. Launcher process is
>>> responsible for starting worker process based on a user defined database
>>> list from GUC. Worker process will connect to its target database and
>>> monitor the disk usage for objects in this database. In init stage of
>>> worker process, it will call calculate_total_relation_size() to calculate
>>> the size for each user table. After init stage, worker process will

Re: Delta Materialized View Refreshes?

2018-09-14 Thread Jeremy Finzel
>
>
> Could you apply something similar using triggers?
> One question would be how PG would identify changes to existing rows -
> using the replication facilities to essentially replicate into the view?
> This would be quite tricky I reckon. Otherwise a change to the underlying
> table may not propagate correctly to the MV.
>

That's not what I had in mind.  I only mean that when REFRESH MATERIALIZED
VIEW is run, it would gather the results of the view in memory and then,
instead of essentially "wiping and reloading" the table, write only the
differences.  So if 90% of the rows would be the same as before the
refresh, we only update 10% of the rows.

This would also mean materialized views could get bloated just like tables.

Thanks,
Jeremy


Logical replication to partitioned subscriber

2018-09-14 Thread Jeremy Finzel
Are there any plans to support logical replication to a table which is only
partitioned on the subscriber?  We are very interested in such a feature
for a couple use cases.  We are using logical replication for archiving
data and we only want it partitioned on the archive server, and we want it
to just work transparently with logical replication.

Another use case is to convert a huge OLTP database table to be
partitioned, and as opposed to writing a script, use logical replication to
replicate to a destination replica which has the partitioning work done
online, then swap nodes.

We are using pglogical, but I see from the docs that the same limitations
apply even in pg11's built-in logical replication.

Thanks!
Jeremy


Delta Materialized View Refreshes?

2018-09-14 Thread Jeremy Finzel
To me the biggest disadvantage of using a materialized view instead of a
table delta load for an ETL process, for example, is that the materialized
view is very inefficient in terms of write overhead, since it rewrites the
whole set of data in the table: heap, indexes, WAL, and everything that
goes with it.

Couldn't this be coded to only update changed records?  As an example, we
have a materialized view definition that holds summarized data on customer
records, and it only changes when new customers are added.  It would be so
nice to only write that change.

I recognize that it may take longer, but for our environment as I'm sure is
true of many others, we are often much more excited about reducing write
churn than read churn.

If in fact it could be an option, I am curious whether we could leverage the
work being done on the merge functionality to do a "merge" from the
materialized view definition into the actual persistent object.

If people really like the current functionality, then at least it could be
an option like REFRESH MATERIALIZED VIEW DELTA or something - I dunno I am
more interested in just having the feature!

Many thanks for any thoughts on this subject.
Jeremy


Re: Add --include-table-data-where option to pg_dump, to export only a subset of table data

2018-09-06 Thread Jeremy Finzel
>
> Just skimming the thread but I'd have to say being able to leverage
> pg_dump's dependency resolution is a major reason for adding features to it
> instead sticking to writing psql scripts.  This feature in a multi-tenant
> situation is something with, I suspect, reasonably wide appeal.
>

That I would agree with if in fact it's true that people want that, but that
wasn't how the problem being solved was communicated.  From what I read in
the initial examples given, just using psql is more than sufficient in those
cases.

I will grant that copying the structure and data at the same time would be
much easier, however.  With psql, you need pg_dump to create your structure
and then a separate psql script to copy the data.

But again, with the --data-only examples given, it's so easy to do that with
psql copy that I just don't understand the value of the feature unless you
really are saying you require the dependency resolution.

I agree with some of the hesitation of complicating the syntax and allowing
too much customization for what pg_dump is designed for.  Really, if you
need more customization, copy gives you that.  So I don't personally
consider it a missing feature because both tools have different uses and I
haven't found that any of this disrupts my workflow.  FWIW...

Thanks,
Jeremy


Re: Add --include-table-data-where option to pg_dump, to export only a subset of table data

2018-09-06 Thread Jeremy Finzel
On Mon, May 21, 2018 at 6:34 AM Carter Thaxton 
wrote:

> Many times I've wanted to export a subset of a database, using some sort
> of row filter condition on some of the large tables.  E.g. copying a
> production database to a staging environment, but with some time series
> data only from the past month.
>
> We have the existing options:
>   --include-table=table    (and its -t synonym)
>   --exclude-table=table
>   --exclude-table-data=table
>
> I propose a new option:
>   --include-table-data-where=table:filter_clause
>
> One would use this option as follows:
>
>   pg_dump --include-table-data-where=largetable:"created_at >=
> '2018-05-01'" database_name
>
> The filter_clause is used as the contents of a WHERE clause when querying
> the data to generate the COPY statement produced by pg_dump.
>
> I've prepared a proposed patch for this, which is attached.  The code
> changes are rather straightforward.  I did have to add the ability to carry
> around an extra pointer-sized object to the simple_list implementation, in
> order to allow the filter clause to be associated to the matching oids of
> the table pattern.  It seemed the best way to augment the existing
> simple_list implementation, but change as little as possible elsewhere in
> the codebase.  (Note that SimpleOidList is actually only used by pg_dump).
>
> Feel free to review and propose any amendments.
>
>
Why not simply use \copy (select * from largetable where created_at >=
'2018-05-01') to stdout? That is what I’ve always done when I need
something like this and have not found it particularly bothersome but
rather quite powerful. And here you have tons of flexibility because you
can do joins and whatever else.

FWIW. Thanks,
Jeremy


Re: Some pgq table rewrite incompatibility with logical decoding?

2018-09-01 Thread Jeremy Finzel
>
>
Interesting.
>
> So you were running 9.6.9 before, it triggered the issue (and was not
> able to recover). You took a filesystem snapshot, started a 9.6.10 on
> the snapshot, and it recovered without hitting the issue?
>

I am reposting this to the list and not only to Tomas. Tomas, I can’t
promise just yet to delve into this because, given that the patch fixes the
issue, it’s obviously much lower priority for our team. Are you hoping for
me to confirm the exact scenario in which the 9.6.10 patch fixes the bug?

Actually, there were more things changed than that so I'm not positive it
was the last patch:

BEFORE:
Provider - 9.6.8-1.pgdg16.04+1, pglogical 2.1.1-1.xenial+1
Subscriber - 9.6.9-2.pgdg16.04+1, 2.1.1-1.xenial+1

AFTER:
Provider - 9.6.10-1.pgdg16.04+1, pglogical 2.2.0-3.xenial+1
Subscriber - 9.6.10-1.pgdg16.04+1, pglogical 2.2.0-3.xenial+1


> I quickly went through the commits in 9.6 branch between 9.6.9 and
> 9.6.10, looking for stuff that might be related, and these three commits
> seem possibly related (usually because of invalidations, vacuum, ...):
>
>   6a46aba1cd6dd7c5af5d52111a8157808cbc5e10
>   Fix bugs in vacuum of shared rels, by keeping their relcache entries
>   current.
>
>   da10d6a8a94eec016fa072d007bced9159a28d39
>   Fix "base" snapshot handling in logical decoding
>
>   0a60a291c9a5b8ecdf44cbbfecc4504e3c21ef49
>   Add table relcache invalidation to index builds.
>
> But it's hard to say if/which of those commits did the trick, without
> more information.
>

Let me know if that info gives you any more insight - actually 2 point
version jumps for provider, 1 for subscriber.

Thanks,
Jeremy


Re: Some pgq table rewrite incompatibility with logical decoding?

2018-08-28 Thread Jeremy Finzel
>
> We have hit this error again, and we plan to snapshot the database as to
> be able to do whatever troubleshooting we can.
>

I am happy to report that we were able to get replication working again by
running snapshots of the systems in question on servers running the latest
point release 9.6.10, and replication simply works and skips over these
previously erroring relfilenodes.  So whatever fixes were made to logical
decoding in this point release seem to have fixed the issue.

Thanks,
Jeremy


Re: Some pgq table rewrite incompatibility with logical decoding?

2018-08-28 Thread Jeremy Finzel
>
> Jeremy, are you able to reproduce the issue locally, using pgq? That would
>> be very valuable.
>
>
Tomas et al:

We have hit this error again, and we plan to snapshot the database so as to
be able to do whatever troubleshooting we can.  If someone could provide me
guidance as to what exactly you would like me to do, please let me know.  I
am able to provide an xlog dump and also debugging information upon request.

This is actually a different database system that also uses skytools, and
the exact same table (pgq.event_58_1) is again the cause of the relfilenode
error.  I did a point-in-time recovery to a point after this relfilenode
appears using pg_xlogdump, and verified this was the table that appeared,
then disappeared.

Here is the exact issue again I am having:

Provider logs:
2018-08-28 08:29:30.334
CDT,"foo_logical","foo_prod",70236,"0.0.0.0:48314",5b854e3a.1125c,2,"authentication",2018-08-28
08:29:30 CDT,289/78900643,0,LOG,0,"replication connection authorized:
user=foo_logical SSL enabled (protocol=TLSv1.2,
cipher=ECDHE-RSA-AES256-GCM-SHA384, compression=off)",""
2018-08-28 08:29:30.366
CDT,"foo_logical","foo_prod",70236,"0.0.0.0:48314",5b854e3a.1125c,3,"idle",2018-08-28
08:29:30 CDT,289/0,0,LOG,0,"starting logical decoding for slot
""pgl_foo_prod_providerb97b25d_foo336ddc1""","streaming transactions
committing after C0A5/EEBBD4E8, reading WAL from
C0A5/EC723AC8""bar_foo_foo_a"
2018-08-28 08:29:30.366
CDT,"foo_logical","foo_prod",70236,"0.0.0.0:48314",5b854e3a.1125c,4,"idle",2018-08-28
08:29:30 CDT,289/0,0,LOG,0,"logical decoding found consistent point at
C0A5/EC723AC8","There are no running transactions.""bar_foo_foo_a"
2018-08-28 08:29:30.448
CDT,"foo_logical","foo_prod",70236,"0.0.0.0:48314",5b854e3a.1125c,5,"idle",2018-08-28
08:29:30 CDT,289/0,0,ERROR,XX000,"could not map filenode
""base/16418/2800559918"" to relation OID","bar_foo_foo_a"
2018-08-28 08:29:30.463
CDT,"foo_logical","foo_prod",70236,"0.0.0.0:48314",5b854e3a.1125c,6,"idle",2018-08-28
08:29:30 CDT,289/0,0,LOG,08006,"could not receive data from client:
Connection reset by peer","bar_foo_foo_a"
2018-08-28 08:29:30.463
CDT,"foo_logical","foo_prod",70236,"0.0.0.0:48314",5b854e3a.1125c,7,"idle",2018-08-28
08:29:30 CDT,,0,LOG,0,"disconnection: session time: 0:00:00.135
user=foo_logical database=foo_prod host=0.0.0.0
port=48314","bar_foo_foo_a"


Subscriber logs:
2018-08-28 08:32:10.040 CDT,,,71810,,5b854eda.11882,1,,2018-08-28 08:32:10
CDT,7/0,0,LOG,0,"starting apply for subscription
bar_foo_foo_a","pglogical apply 16418:862837778"
2018-08-28 08:32:10.175 CDT,,,71810,,5b854eda.11882,2,,2018-08-28 08:32:10
CDT,7/242099,123886525,ERROR,XX000,"data stream ended","pglogical
apply 16418:862837778"
2018-08-28 08:32:10.175 CDT,,,71810,,5b854eda.11882,3,,2018-08-28 08:32:10
CDT,7/0,0,LOG,0,"apply worker [71810] at slot 1 generation 366 exiting
with error","pglogical apply 16418:862837778"
2018-08-28 08:32:10.179 CDT,,,27173,,5b61d336.6a25,373,,2018-08-01 10:35:18
CDT,,0,LOG,0,"worker process: pglogical apply 16418:862837778 (PID
71810) exited with exit code 1",""


Thanks,
Jeremy


Re: Repeatable Read Isolation in SQL running via background worker

2018-08-16 Thread Jeremy Finzel
On Tue, Aug 14, 2018 at 11:18 AM, Robert Haas  wrote:

> On Mon, Aug 13, 2018 at 10:52 AM, Jeremy Finzel  wrote:
> > On Thu, Aug 9, 2018 at 4:34 PM, Jeremy Finzel  wrote:
> >> I am using worker_spi as a model to run a SQL statement inside a
> >> background worker.  From my browsing of the Postgres library, I believe
> that
> >> if I want repeatable read isolation level, the proper way for me to
> attain
> >> this is to add this line after StartTransactionCommand() in
> worker_spi_main:
> >>
> >> XactIsoLevel = XACT_REPEATABLE_READ;
>
> It's usually a good idea to only change GUCs through the GUC machinery
> i.e. use SetConfigOption().
>
> Are you using StartTransactionCommand() and CommitTransactionCommand()
> to manage transaction boundaries?  If not, maybe you should.
>

Many thanks for the reply.  Yes, I am using StartTransactionCommand and
Commit just like in worker_spi.c.  Here is the relevant section of code:

SetCurrentStatementStartTimestamp();
StartTransactionCommand();
XactIsoLevel = XACT_REPEATABLE_READ;
SPI_connect();
PushActiveSnapshot(GetTransactionSnapshot());
pgstat_report_activity(STATE_RUNNING, buf.data);

/* We can now execute queries via SPI */
SPI_execute(buf.data, false, 0);

/*
* And finish our transaction.
*/
SPI_finish();
PopActiveSnapshot();
CommitTransactionCommand();
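
(If using the GUC machinery is the better route, I am guessing it would look
roughly like the sketch below: set default_transaction_isolation once via
SetConfigOption() before the worker's main loop, so each
StartTransactionCommand() starts in repeatable read, instead of assigning
XactIsoLevel directly.  Untested; needs utils/guc.h.)

/* Once, before the main loop */
SetConfigOption("default_transaction_isolation", "repeatable read",
                PGC_USERSET, PGC_S_SESSION);

/* Then each iteration stays as above */
SetCurrentStatementStartTimestamp();
StartTransactionCommand();      /* picks up repeatable read by default */
SPI_connect();
PushActiveSnapshot(GetTransactionSnapshot());
pgstat_report_activity(STATE_RUNNING, buf.data);

SPI_execute(buf.data, false, 0);

SPI_finish();
PopActiveSnapshot();
CommitTransactionCommand();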

So if you are saying it would be better to use SetConfigOption() there I
will look into that.  Thanks!
Jeremy


Re: Some pgq table rewrite incompatibility with logical decoding?

2018-08-13 Thread Jeremy Finzel
>
> I don't think that's true, for two reasons.
>
> Firstly, I don't think pgq updates catalogs directly, it simply truncates
> the queue tables when rotating them (which updates the relfilenode in
> pg_class, of course).
>
> Secondly, we're occasionally seeing this on systems that do not use pgq,
> but that do VACUUM FULL on custom "queue" tables. The symptoms are exactly
> the same ("ERROR: could not map filenode"). It's pretty damn rare and we
> don't have direct access to the systems, so investigation is difficult :-(
> Our current hypothesis is that it's somewhat related to subtransactions
> (because of triggers with exception blocks).
>
> Jeremy, are you able to reproduce the issue locally, using pgq? That would
> be very valuable.


We have tried but have been unable to reproduce it.  If we do encounter it
again, we will plan on reporting back and seeing if we can do some deep
debugging.

Thanks,
Jeremy


Re: Repeatable Read Isolation in SQL running via background worker

2018-08-13 Thread Jeremy Finzel
On Thu, Aug 9, 2018 at 4:34 PM, Jeremy Finzel  wrote:

> I am using worker_spi as a model to run a SQL statement inside a
> background worker.  From my browsing of the Postgres library, I believe
> that if I want repeatable read isolation level, the proper way for me to
> attain this is to add this line after StartTransactionCommand()
> in worker_spi_main:
>
> XactIsoLevel = XACT_REPEATABLE_READ;
>
> Or - am I mistaken?  Does PushActiveSnapshot already ensure I will get the
> same snapshot of the data within this transaction?
>
> Can anyone help me if this is accurate or if there are any other gotchas I
> should be aware of?
>
> The SQL statement will be run every minute for example, and each time with
> this isolation level.  At least, that is my goal.
>
> Any help is much appreciated.
>
> Thanks,
> Jeremy
>

It seems to be working.  If anyone could provide any feedback though I
would be very appreciative.


Repeatable Read Isolation in SQL running via background worker

2018-08-09 Thread Jeremy Finzel
I am using worker_spi as a model to run a SQL statement inside a background
worker.  From my browsing of the Postgres library, I believe that if I want
repeatable read isolation level, the proper way for me to attain this is to
add this line after StartTransactionCommand() in worker_spi_main:

XactIsoLevel = XACT_REPEATABLE_READ;

Or - am I mistaken?  Does PushActiveSnapshot already ensure I will get the
same snapshot of the data within this transaction?

Can anyone help me if this is accurate or if there are any other gotchas I
should be aware of?

The SQL statement will be run every minute for example, and each time with
this isolation level.  At least, that is my goal.

Any help is much appreciated.

Thanks,
Jeremy


Re: Auditing via logical decoding

2018-07-27 Thread Jeremy Finzel
On Fri, Jul 27, 2018 at 5:41 AM Philip Scott 
wrote:

> Hi Postgres Hackers,
>
> We have been using our own trigger-based audit system at my firm
> successfully for some years, but the performance penalty is starting to
> grate a bit and so I have been tasked with seeing if we can make use of
> the new logical decoding functions to achieve the same thing. I thought
> that someone must already have written something that would satisfy our
> use-case but my internet searches have come up short so far so I am
> considering writing a logical decoding plugin to do what we want.
>
> I thought I would run the idea past you all here just in case my plan is
> crazy; I’ve browsed around the postgres source code a bit before but
> I’ve never really gotten my hands dirty and am a little bit nervous
> about putting my own C code into the heart of our DBMS so if this comes
> to anything I would like to offer my code up for review and/or possible
> inclusion as a contributed module.
>
> A quick summary of requirements:
>
> We want to log (to a separate, remote database)
>- One row for every transaction that changes the state of the
> database.
>  We call this table ‘audit_entry’ and contains the xid, transaction
> timestamp, username, client hostname, and application name of the
> session that caused the change.
>- One row for each change made by each transaction which records the
> state of the tuple before the change.
>  We call this table ‘audit_detail’ and contains xid, statement
> timestamp, table name & schema, event_type, primary_key (hstore),
> old_row (hstore), and the text of the query that was responsible for the
> change.
>
> A lot of that information is available already by listening to the
> pgoutput decoding, and my first thought was that I could just write a
> receiver for that. However, application name, username, client hostname
> and current_query() are not available. This is understandable as they
> aren’t useful for logical replication.
>
> I was about to give up, when I discovered pg_logical_emit_message.
>
> My current thoughts are to:
>- Write this extra data into a logical message while the transaction
> is still in progess
>  Either with a deferred trigger per table or, perhaps better
> Find some global commit-time (or xid-assigment time) hook emit it
> there
>
>- Then get the information out of the database:
>  Either modify the existing pgoutput plugin & protocol to forward
> such messages in its stream,
>  Or write a dedicated ‘audit’ decoding plugin with its own protocol
>
>- Then get the information into the ‘auditing’ database:
>  Either with some standalone process that connects to both, consumes
> the output created above, translates it to SQL to run in the auditing
> DB.
>  Figure out how to create a proper postgres background process to do
> it, in a similar fashion to the logical replication worker
>
> Any input you folks have would be very much appreciated.
>
> Kind Regards,
>
> Philip
>
> PS: If there is someone out there who is willing & able to build this
> for less than my company will have to pay me to do it, please drop me a
> line ☺


All I can say is +1; this would be an awesome feature to have and I hope to
see it someday.


Using test_ddl_deparse as an extension

2018-07-25 Thread Jeremy Finzel
I am interested in using the exact functionality in test_ddl_deparse which
provides the sub command types for alter table statements.  I would prefer
to leverage this code which has already been vetted by the community, and
also not just duplicate this code myself, or package it up myself.

It would be nice to just use the extension.  But I'm interested to hear
what recommendations could be given for my use case (for that matter - what
is the recommendation to anyone wanting to use the exact functionality
provided in a postgres test module).

My specific intention is for sending DDL via logical replication, and
ignoring, for example, enable/disable trigger statements when I am using
selective replication.

Many thanks!
Jeremy


Re: Early WIP/PoC for inlining CTEs

2018-07-24 Thread Jeremy Finzel
On Tue, Jul 24, 2018 at 5:28 PM Andrew Gierth 
wrote:

> About a year ago I was briefly in discussion/collaboration with Adam Sah
> regarding the topic of inlining CTEs into the query rather than treating
> them as optimization barriers. We didn't take it very far (he sent me
> some stuff, I wrote some stuff and sent it back, things kind of got
> dropped at that point); but there's been some recent discussion of this
> and some people have expressed an interest in seeing the code.
>
> So I'm posting the parts that I wrote for the benefit of anyone wanting
> to pick up the issue again. The assumption of this code is that some
> form of syntax would exist to mark materialized CTEs and set the
> "ctematerialized" flag.
>
> I haven't rebased this or tested it since last year; this patch is
> against b81eba6a65.
>
> Posted for discussion, further development, criticism, whatever; feel
> free to include this (with credit) in any relevant patch. Consider this
> released under the PG license.
>
> --
> Andrew (irc:RhodiumToad)
>
In our environment we often want this to be a fence.  For example, it can
be used to keep the number of joins in each CTE small and not hit the join
collapse limit, or when we really know more about the subquery than the
optimizer does and have something really specific there.  So in general I
would not want the default functionality to change for all of the queries
we have already written with this in mind.  I do, however, like the idea of
this feature being an option, but I would question whether it should
perhaps work the other way around, where you have to mark a CTE as not
being a fence.

Curious what other RDBMSs do here?

Thanks,
Jeremy


Re: Background worker/idle sessions and caching

2018-07-18 Thread Jeremy Finzel
On Wed, Jul 18, 2018 at 3:05 PM, Tom Lane  wrote:

> Jeremy Finzel  writes:
> > I have a background worker running SQL functions, and I believe I have
> > noticed that when I do things like change function definitions, or even
> add
> > tables, the background worker does not pick up the schema changes until I
> > restart the worker.
>
> Maybe you need some AcceptInvalidationMessages() at appropriate points
> in the worker?  ProcessCatchupInterrupts() might be relevant as well,
> though if you're worried about this, you probably don't want to ever
> be so far behind as to get triggered by that.
>

My module is based squarely on worker_spi.c with some very minor
modifications.  I definitely don't see any AcceptInvalidationMessages() or
ProcessCatchupInterrupts() calls that would run between successive
SPI_execute executions of the SQL that does the delta load.
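
If I understand the suggestion, "appropriate points" would look roughly like
the sketch below in the per-iteration section (untested; I am guessing at
where the call belongs, and AcceptInvalidationMessages() comes from
utils/inval.h):

SetCurrentStatementStartTimestamp();
StartTransactionCommand();

/* Guess: drain pending sinval traffic before planning/executing this run */
AcceptInvalidationMessages();

SPI_connect();
PushActiveSnapshot(GetTransactionSnapshot());
SPI_execute(buf.data, false, 0);
SPI_finish();
PopActiveSnapshot();
CommitTransactionCommand();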


>
> There might well be a system structural bug here: I'm not sure whether
> bg workers participate in shared-inval signaling at all, or whether they
> can opt in or out of that.  But if they do or can, then a bg worker that
> isn't holding up its end of things by processing catchup interrupts can
> break the entire system's processing of catchups, because of the
> daisy-chain behavior that we put in awhile back to prevent all backends
> from firing catchup processing at the same time.  There's an assumption
> that all processes that are eligible to receive catchup signals will
> play nice and pass the signal on reasonably quickly.
>
> regards, tom lane


My use case is similar to the worker_spi example.  A plpgsql function runs
every minute and processes records in audit tables in order to update fact
tables with records that have changed.  I noticed, for example, that
renaming a column in the function was not picked up, and I had to restart
the workers to reset the cache.

Thanks,
Jeremy


Background worker/idle sessions and caching

2018-07-18 Thread Jeremy Finzel
I have a background worker running SQL functions, and I believe I have
noticed that when I do things like change function definitions, or even add
tables, the background worker does not pick up the schema changes until I
restart the worker.

Is this expected behavior?  If I use background workers in this way, can I
send a signal to it to reload things like function defs, or do I need to
restart the workers?

Thanks!
Jeremy


Re: Logical decoding from promoted standby with same replication slot

2018-07-16 Thread Jeremy Finzel
On Fri, Jul 13, 2018 at 2:30 PM, Jeremy Finzel  wrote:

> Hello -
>
> We are working on several DR scenarios with logical decoding.  Although we
> are using pglogical the question we have I think is generally applicable to
> logical replication.
>
> Say we have need to drop a logical replication slot for some emergency
> reason on the master, but we don't want to lose the data permanently.  We
> can make a point-in-time-recovery snapshot of the master to use in order to
> recover the lost data in the slot we are about to drop.  Then we drop the
> slot on master.
>
> We can then point our logical subscription to pull from the snapshot to
> get the lost data, once we promote it.
>
> The question is that after promotion, logical decoding is looking for a
> timeline 2 file whereas the file is still at timeline 1.
>
> The WAL file is 000108FD003C, for example.  After promotion,
> it is still 000108FD003C in pg_wal.  But logical decoding says
> ERROR: segment 000208FD003C has already been removed (it is
> looking for a timeline 2 WAL file).  Simply renaming the file actually
> allows us to stream from the replication slot accurately and recover the
> data.
>
> But all of this begs the question of an easier way to do this - why
> doesn't logical decoding know to look for a timeline 1 file?  It is really
> helpful to have this ability to easily recover logical replicated data from
> a snapshot of a replication slot, in case of disaster.
>
> All thoughts very welcome!
>
> Thanks,
> Jeremy
>

I'd like to bump this thread with some elaboration on my original question:
is it possible to do a *controlled* failover reliably with logical
decoding, assuming there are unconsumed changes in the replication slot
that the client still needs?

It is rather easy to do a controlled failover if we can verify there are no
unconsumed changes in the slot before failover.  Then, we just recreate the
slot on the promoted standby while clients are locked out, and we have not
missed any data changes.

I am trying to figure out if the problem of following timelines, as per
this wiki for example: https://wiki.postgresql.org/wiki/Failover_slots, can
be worked around in a controlled scenario.  One additional part of this is
that after failover I have 2 WAL files with the same walfile name but on
differing timelines, and the promoted standby is only going to decode from
the latter.  Does that mean I am likely to lose data?

Part of the reason I ask is that in testing, I have NOT lost data when
doing a controlled failover as described above (i.e. with unconsumed
changes in the slot that I need to replay on the promoted standby).  I am
trying to figure out whether I've gotten lucky or whether this method,
renaming the WAL files to bump the timeline, is actually reliable, since
these WAL files are simply identical to the ones that were played on the
master and thus ought to yield the same logical decoding information to be
consumed.


Thank you!
Jeremy


Logical decoding from promoted standby with same replication slot

2018-07-13 Thread Jeremy Finzel
Hello -

We are working on several DR scenarios with logical decoding.  Although we
are using pglogical the question we have I think is generally applicable to
logical replication.

Say we have need to drop a logical replication slot for some emergency
reason on the master, but we don't want to lose the data permanently.  We
can make a point-in-time-recovery snapshot of the master to use in order to
recover the lost data in the slot we are about to drop.  Then we drop the
slot on master.

We can then point our logical subscription to pull from the snapshot to get
the lost data, once we promote it.

The question is that after promotion, logical decoding is looking for a
timeline 2 file whereas the file is still at timeline 1.

The WAL file is 000108FD003C, for example.  After promotion, it
is still 000108FD003C in pg_wal.  But logical decoding says
ERROR: segment 000208FD003C has already been removed (it is
looking for a timeline 2 WAL file).  Simply renaming the file actually
allows us to stream from the replication slot accurately and recover the
data.

But all of this raises the question of an easier way to do this - why doesn't
logical decoding know to look for a timeline 1 file?  It is really helpful
to have this ability to easily recover logical replicated data from a
snapshot of a replication slot, in case of disaster.

All thoughts very welcome!

Thanks,
Jeremy


Some pgq table rewrite incompatibility with logical decoding?

2018-06-25 Thread Jeremy Finzel
I am hoping someone here can shed some light on this issue - I apologize if
this isn't the right place to ask, but I'm almost certain some of you were
involved in pgq's development and might be able to answer this.

We are actually running 2 replication technologies on a few of our dbs,
skytools and pglogical.  Although we are moving towards only using logical
decoding-based replication, right now we have both for different purposes.

There seems to be a table rewrite happening on table pgq.event_58_1 that
has happened twice, and it ends up in the decoding stream, resulting in the
following error:

ERROR,XX000,"could not map filenode ""base/16418/1173394526"" to relation
OID"

In retracing what happened, we discovered that this relfilenode was
rewritten.  But somehow it is ending up in the logical decoding stream and
is "undecodable".  This is pretty disastrous because the only real way to
fix it is to advance the replication slot and lose data.

The only obvious table rewrite I can find in the pgq codebase is a truncate
in pgq.maint_rotate_tables.sql.  But there isn't anything surprising
there.  If anyone has any ideas as to what might cause this so that we
could somehow mitigate the possibility of this happening again until we
move off pgq, that would be much appreciated.

Thanks,
Jeremy


Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-06-19 Thread Jeremy Finzel
On Tue, Jun 19, 2018 at 8:26 AM Matheus de Oliveira <
matioli.math...@gmail.com> wrote:

> Hello friends.
>
> On Tue, Jun 12, 2018 at 3:31 PM, Andres Freund  wrote:
>
>>
>> On 2018-06-11 17:39:14 -0700, Andres Freund wrote:
>> > I plan to go over the change again tomorrow, and then push. Unless
>> > somebody has comments before then, obviously.
>>
>> Done.
>>
>>
> Sorry to bother about this, but do you have any plan to do the minor
> release before planned due to this bug?
>
> There seem to have too many users affected by this. And worst is that many
> users may not have even noticed they have the problem, which can cause
> `age(datfrozenxid)` to keep increasing until reachs 2.1 billions and the
> system goes down.
>
> In my case, I have a server that its `age(datfrozenxid)` is already at 1.9
> billions, and I expect it to reach 2.1 billions in about 14 days.
> Fortunately, I have monitoring system over `age(datfrozenxid)`, that is why
> I found the issue in one of my servers.
>
> I'm pondering what is the best option to avoid a forced shutdown of this
> server:
> - should I just wait for a release (if it is soon, I would be fine)?
> - build PG from the git version by myself?
> - or is there a safer workaround to the problem? (not clear to me if
> deleting the `global/pg_internal.init` file is really the way to go, and
> the details, is it safe? Should I stop the server, delete, start?)
>
> Best regards,
> --
> Matheus de Oliveira
>
>
Restarting the database has fixed the error on these pg_catalog tables,
allowing us to vacuum them and avoid wraparound.

We first noticed that a restart fixed the issue because SAN snapshots did
not have the error; the only real difference there is shared memory,
nothing at the disk level.

Jeremy


Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-06-19 Thread Jeremy Finzel
On Fri, May 25, 2018 at 3:37 PM, Andres Freund  wrote:

> Hi,
>
> Moving discussion to -hackers.  Tom, I think you worked most with this
> code, your input would be appreciated.
>
> Original discussion is around:
> http://archives.postgresql.org/message-id/20180524211311.
> tnswfnjwnii54htx%40alvherre.pgsql
>
> On 2018-05-24 17:13:11 -0400, Alvaro Herrera wrote:
> > On 2018-May-24, Andres Freund wrote:
> > > Then there's also:
> > > http://archives.postgresql.org/message-id/1527193504642.
> 36340%40amazon.com
> >
> > ah, so deleting the relcache file makes the problem to go away?  That's
> > definitely pretty strange.  I see no reason for the value in relcache to
> > become out of step with the catalogued value in the same database ... I
> > don't think we transmit in any way values of one database to another.
>
> I can reproduce the issue. As far as I can tell we just don't ever
> actually update nailed relcache entries in the normal course, leaving
> the "physical address" aside.  VACUUM will, via
> vac_update_relstats() -> heap_inplace_update() ->
> CacheInvalidateHeapTuple(),
> send out an invalidation. But invalidation, in my case another session,
> will essentially ignore most of that due to:
>
> static void
> RelationClearRelation(Relation relation, bool rebuild)
> ...
> /*
>  * Never, never ever blow away a nailed-in system relation,
> because we'd
>  * be unable to recover.  However, we must redo
> RelationInitPhysicalAddr
>  * in case it is a mapped relation whose mapping changed.
>  *
>  * If it's a nailed-but-not-mapped index, then we need to re-read
> the
>  * pg_class row to see if its relfilenode changed. We do that
> immediately
>  * if we're inside a valid transaction and the relation is open
> (not
>  * counting the nailed refcount).  Otherwise just mark the entry as
>  * possibly invalid, and it'll be fixed when next opened.
>  */
> if (relation->rd_isnailed)
> {
> RelationInitPhysicalAddr(relation);
>
> if (relation->rd_rel->relkind == RELKIND_INDEX ||
> relation->rd_rel->relkind ==
> RELKIND_PARTITIONED_INDEX)
> {
> relation->rd_isvalid = false;   /* needs to be
> revalidated */
> if (relation->rd_refcnt > 1 &&
> IsTransactionState())
> RelationReloadIndexInfo(relation);
> }
> return;
> }
>
> Which basically means that once running we'll never update the relcache
> data for nailed entries.  That's unproblematic for most relcache fields,
> but not for things like RelationData->rd_rel->relfrozenxid / relminmxid.
>
> This'll e.g. lead to lazy_vacuum_rel() wrongly not using aggressive
> vacuums despite being required. And it'll lead, triggering this thread,
> to wrong errors being raised during vacuum because relfrozenxid just is
> some random value from the past.  I suspect this might also be
> co-responsible for a bunch of planning issues for queries involving the
> catalog, because the planner will use wrong relcache data until the next
> time the init file is thrown away?
>
> This looks like a very longstanding bug to me.  I'm not yet quite sure
> what the best way to deal with this is.  I suspect we might get away
> with just looking up a new version of the pg_class tuple and copying
> rd_rel over?
>
> Greetings,
>
> Andres Freund
>

I have a question related to this - and specifically, preventing the error
until we have a patch :).  We are encountering this error every few weeks
on one very high transaction db, and have to restart to fix it.

If I read you correctly, the cache may never be invalidated for these
catalogs even if I manually VACUUM them?  I was thinking if I routinely run
VACUUM FREEZE on these tables in every database I might avoid the issue.
But given the cause of the issue, would that just make no difference and I
will still hit the error eventually?

Thanks,
Jeremy


Including SQL files in extension scripts

2018-04-10 Thread Jeremy Finzel
In writing extension update scripts, I find it to be really difficult to
grok diffs for example in changed view or function definitions when a new
extension script has to include the whole definition in a new file.  I want
to rather use separate files for these objects, then use something like
psql's \i to include the definitions so that version control (and
management of these files) becomes easier, but this does not appear to be
supported.  Does anyone have any recommendations here?

Currently, I often simply opt to create a script that builds out the
extension SQL files from separated sql files so that I can more easily
maintain them.  But I was hoping there is a better way.  Any ideas would be
much appreciated.

Thanks,
Jeremy


Re: Passing current_database to BackgroundWorkerInitializeConnection

2018-04-02 Thread Jeremy Finzel
On Mon, Apr 2, 2018 at 2:27 PM, Andres Freund <and...@anarazel.de> wrote:

> Hi,
>
> On 2018-04-02 14:24:53 -0500, Jeremy Finzel wrote:
> > Thank you, this makes sense.  However, how can this be done since I can
> > only pass one argument to bgw_main?  Is there any way to do this without
> > having to store the value in shared memory?
>
> No (I mean you can store it in the filesystem or such as well, but
> ...). Pretty fundamentally sharing data between concurrently running
> processes needs a medium to share the data over. The bgw infrastructure
> allocates just enough so you can put an index to it into
> shmem. Allocating more would be wasteful and/or not enough for some
> users.
>
> Greetings,
>
> Andres Freund
>

Hmmm... not sure if I follow.  My goal is to run a SQL statement every 10
seconds (or whatever value is chosen) in a particular database, using a
background worker.  Those are the two arguments.  Am I missing some way to
implement this apart from passing those 2 arguments into the launcher
function?  Is the proper way to do this, then, to allocate shared memory
for it, as opposed to trying to pass args into the main function?
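
For the record, the closest I can see to doing this without extra shared
memory is packing both values into the fixed-size bgw_extra field.  An
untested sketch (WorkerArgs is just a hypothetical struct name):

/* Hypothetical container for the two launch arguments */
typedef struct WorkerArgs
{
    Oid     dboid;      /* database to connect to */
    int32   naptime;    /* seconds between runs */
} WorkerArgs;

/* In the SQL-callable launcher function: */
WorkerArgs  args;

args.dboid = MyDatabaseId;
args.naptime = naptime;

memset(worker.bgw_extra, 0, BGW_EXTRALEN);
memcpy(worker.bgw_extra, &args, sizeof(args));  /* fits well within BGW_EXTRALEN */

/* In the worker's main function: */
WorkerArgs  args;

memcpy(&args, MyBgworkerEntry->bgw_extra, sizeof(args));
BackgroundWorkerInitializeConnectionByOid(args.dboid, InvalidOid);  /* pre-v11 two-argument form */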

Thanks,
Jeremy


Re: Passing current_database to BackgroundWorkerInitializeConnection

2018-04-02 Thread Jeremy Finzel
On Fri, Mar 30, 2018 at 5:37 PM, Andres Freund <and...@anarazel.de> wrote:

>
>
> On March 30, 2018 3:16:31 PM PDT, Jeremy Finzel <finz...@gmail.com> wrote:
> >> What do you mean with "current database"? Before you
> >> BackgroundWorkerInitializeConnection() there is no such thing?
> >
> >
> >My module is based directly off the worker_spi example. The worker is
> >dynamically launched via SQL command. But in the worker_spi example,
> >the
> >database postgres is just hardcoded as the database in which to start
> >the
> >background worker process. Instead, I want to start it in the database
> >in
> >which I run the SQL command.
>
> The started worker isn't associated with the original database. You can
> pass the database oid as an argument to the launched bgworker.
>
>
Thank you, this makes sense.  However, how can this be done since I can
only pass one argument to bgw_main?  Is there any way to do this without
having to store the value in shared memory?  I was going to try passing an
array instead of an int, but I'm not liking that much.  I am trying to pass
naptime and database_name (or oid).

Thanks,
Jeremy


Re: Passing current_database to BackgroundWorkerInitializeConnection

2018-03-30 Thread Jeremy Finzel
> What do you mean with "current database"? Before you
> BackgroundWorkerInitializeConnection() there is no such thing?


My module is based directly off the worker_spi example. The worker is
dynamically launched via SQL command. But in the worker_spi example, the
database postgres is just hardcoded as the database in which to start the
background worker process. Instead, I want to start it in the database in
which I run the SQL command.

Thank you!

Jeremy

>


Passing current_database to BackgroundWorkerInitializeConnection

2018-03-30 Thread Jeremy Finzel
I am having trouble figuring out the right way to do this, clearly missing
something obvious.  I am simply trying to pass the current database
to BackgroundWorkerInitializeConnection, but MyDatabaseId is showing as 0,
and I am getting this error when running get_database_name (the call at the
top of the snippet below):

ERROR:  invalid cache ID: 21

char   *database_name = get_database_name(MyDatabaseId);

/*
...
*/

/* Connect to our database */
BackgroundWorkerInitializeConnection(database_name, NULL);


Any direction would be much appreciated.  I have tried to look at example
uses of MyDatabaseId and am not sure what I'm missing here.

Thanks,
Jeremy


Re: Feature Request - DDL deployment with logical replication

2018-03-30 Thread Jeremy Finzel
On Fri, Mar 30, 2018 at 2:26 AM, Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:

>
> Our team in PostgresPro is also very interested in this discussion,
> because we are using logical decoding in multimaster.
> Right now in multimaster DDLs are replicated in this same way as in
> pglogical, using "logical messages" with correspondent DDL statements.
> There are a couple of problems related with this approach, for example mix
> of DDL with DML (create table as...) which we have addressed in multimaster.
>

IMO, I wouldn't even allow CREATE TABLE AS, especially as an initial
feature.  In our environment, I have rarely if ever seen an application
migration that uses CREATE TABLE AS, and it's an acceptable compromise for
us to simply disallow it.  The way I have written pgl_ddl_deploy is to
disallow CREATE TABLE AS from being replicated.  IMO that's a very small
price to pay.

My whole point is that in most architectures, DBAs decide to deploy the
same SQL on providers and subscribers.  Yes, it isn't perfect, but IMO it
is very helpful to try to automate that idea, as opposed to trying to
actually replicate DDL at the low level.  The latter is better, yes, but
seems to have proven extremely difficult.  Hence the advent of functions to
pipe DDL through the replication stream.

Thanks,
Jeremy


Re: Feature Request - DDL deployment with logical replication

2018-03-30 Thread Jeremy Finzel
On Fri, Mar 30, 2018 at 10:16 AM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 3/29/18 13:21, Jeremy Finzel wrote:
> > Although we are thrilled with some of the features already in logical
> > replication, this missing feature is the #1 reason that we don't plan to
> > take a serious look at built-in logical replication even for pg11,
> > because we have been able to use pglogical with our own extension
> > pgl_ddl_deploy in order to broadly deploy logical replication without
> > serious overhauls to our SDLC process, having schema changes managed
> > well.  We really want a mechanism to put through DDL changes at the same
> > transactional point on the subscribers as we do on the publishers, which
> > also answers any complexities around deploying master-first or
> > slave-first in some interesting cases.
> >
> > Is there any particular vision for how the community might address this
> > need in the future?
>
> I think nobody has completely figured this out yet.  Whatever is in
> pglogical and bdr and similar external projects are the best current
> compromises.  But they have lots of problems, so I don't know if anyone
> is ready to propose something for in core yet.
>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

IMO, it would be an acceptable and good first step to provide a function
that will replicate a SQL command through the replication stream at the
right point, even if there is still no automation around it.


Feature Request - DDL deployment with logical replication

2018-03-29 Thread Jeremy Finzel
Hello!

I have not seen much discussion about what the plans are for being able to
manage schema changes when using logical replication.  In our own
infrastructure, mechanisms that have been provided to manage DDL statements
at the same transactional point as they happen on the master have been
immensely useful to us, such as replicate_ddl_command from pglogical.

Although we are thrilled with some of the features already in logical
replication, this missing feature is the #1 reason that we don't plan to
take a serious look at built-in logical replication even for pg11, because
we have been able to use pglogical with our own extension pgl_ddl_deploy in
order to broadly deploy logical replication without serious overhauls to
our SDLC process, having schema changes managed well.  We really want a
mechanism to put through DDL changes at the same transactional point on the
subscribers as we do on the publishers, which also answers any complexities
around deploying master-first or slave-first in some interesting cases.

Is there any particular vision for how the community might address this
need in the future?

Thank you!
Jeremy


Re: worker_spi.naptime in worker_spi example

2018-03-15 Thread Jeremy Finzel
On Thu, Mar 15, 2018 at 3:54 PM, Chapman Flack <c...@anastigmatix.net>
wrote:

> On 03/15/2018 04:19 PM, Jeremy Finzel wrote:
>
> > Thank you.  I did see worker_spi_naptime definition, but I wanted to
> > pass worker_spi_naptime
> > as an argument instead into worker_spi_main so that I can control via SQL
> > interface how frequently the worker SQL executes.  Is this possible?
>
> Are you writing a background worker that is started by the postmaster
> at system startup, or one that you dynamically launch from a regular
> session later?
>
> In the first case, your background worker is not a child of any
> session you directly control with SQL. It is a child of the postmaster.
>
> It has to get the initial value of worker_spi.naptime from the
> configuration sources that are available at the time the postmaster
> starts it (most commonly, the configuration files).
>
> Once it is running, it will only notice a change to worker_spi.naptime
> if it responds to a HUP signal by doing a new ProcessConfigFile(). The
> example code does exactly this, so you can see how it's done.
>
> That means you do have a way to control it "via SQL" if:
>
> - you use ALTER SYSTEM to set the value of worker_spi.naptime, and
> - if you change the value with ALTER SYSTEM and want the new value
>   to take effect while the worker is already running, you do
>   SELECT pg_reload_conf()
>
> In the second, dynamic-launch case (which is also illustrated in
> worker_spi.c, as it contains code for both ways of launching), you can
> see that you end up writing an SQL-callable function (worker_spi_launch
> in the example), which you can define to take whatever arguments you
> want it to.
>
> How to get those arguments to the background worker is the next
> question. You can stash one value in bgw_main_arg (as a Datum),
> and have it passed as the only argument to your main function
> in the worker.
>
> Or, if you want to pass it more information, you can pass up to
> BGW_EXTRALEN bytes of stuff of your choice in bgw_extra, and
> have the worker retrieve it via MyBgworkerEntry.
>
> -Chap
>

This was extremely helpful.  I hadn't seen how the example supports both
the dynamic and the shared_preload_libraries ways of loading bgworkers.  I
just tried both ways and can see how this works.  Before, I couldn't see
the GUC settings in my database, but when I added the module to
shared_preload_libraries I can now.  Awesome.
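
Concretely, if I follow, the reload path in the worker's main loop looks
roughly like this (sketch modeled on the worker_spi pattern; got_sighup is
the flag set by the module's SIGHUP handler):

int     rc;

rc = WaitLatch(MyLatch,
               WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
               worker_spi_naptime * 1000L,
               PG_WAIT_EXTENSION);
ResetLatch(MyLatch);

if (rc & WL_POSTMASTER_DEATH)
    proc_exit(1);

/* ALTER SYSTEM ... + SELECT pg_reload_conf() ends up here via SIGHUP */
if (got_sighup)
{
    got_sighup = false;
    ProcessConfigFile(PGC_SIGHUP);  /* re-reads worker_spi.naptime */
}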

To answer your question, I was more looking at dynamic workers.  In our
environment, cluster restarts are expensive, so the dynamic loading of
workers is a super awesome feature of Postgres that we are hoping to use.

Will let you know if I have more questions, but I will look more into what
you suggest.

Thanks,


Re: worker_spi.naptime in worker_spi example

2018-03-15 Thread Jeremy Finzel
On Thu, Mar 15, 2018 at 12:21 PM, Chapman Flack <c...@anastigmatix.net>
wrote:

> On 03/15/2018 10:59 AM, Jeremy Finzel wrote:
>
> > WaitLatch function the same variable worker_spi_naptime without
> referencing
> > the configuration created in _PG_init.  Am I missing something here?
>
> You can see the declaration of that variable up on line 56.
> It's just an int variable static to worker_spi.c.
>
> The code you saw in _PG_init registers that variable to the
> Grand Unified Configuration system (you can see where the address
> of the variable, _spi_naptime, is passed in that registration),
> so the GUC system can handle setting or querying the value from SQL
> (and config files, etc.).
>
> You can read more about the GUC system in
> src/backend/utils/misc/README, src/backend/utils/misc/guc.c, and
> src/include/utils/guc.h.
>
> -Chap
>

Thank you.  I did see the worker_spi_naptime definition, but I wanted to
instead pass worker_spi_naptime as an argument into worker_spi_main so that
I can control via the SQL interface how frequently the worker SQL executes.
Is this possible?
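
(What I have in mind is roughly the following, if it is workable: have the
SQL-callable launch function take naptime as its argument and pass it
through bgw_main_arg.  Untested sketch; my_worker_main is a made-up name.)

/* In the launch function: */
int32   naptime = PG_GETARG_INT32(0);

worker.bgw_main_arg = Int32GetDatum(naptime);

/* In the worker's main function: */
void
my_worker_main(Datum main_arg)
{
    int     naptime = DatumGetInt32(main_arg);

    /* ... use naptime * 1000L as the WaitLatch timeout ... */
}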

Thanks,
Jeremy


worker_spi.naptime in worker_spi example

2018-03-15 Thread Jeremy Finzel
Hello!

I am trying to create a background worker where I can pass in the naptime
via the SQL interface.  I'm not sure how to do this.  In the worker_spi
example, _PG_init registers a configuration variable using the static
variable worker_spi_naptime, but then worker_spi_main appears to pass that
same variable worker_spi_naptime to WaitLatch without referencing the
configuration created in _PG_init.  Am I missing something here?

I could see how to pass in an int for naptime via the SQL interface, but
I'm not sure what the purpose of this configuration is.

Thanks!
Jeremy


Re: using worker_spi as pattern

2018-03-09 Thread Jeremy Finzel
On Fri, Mar 9, 2018 at 12:34 AM, Michael Paquier <mich...@paquier.xyz>
wrote:

> On Thu, Mar 08, 2018 at 11:04:20PM -0600, Jeremy Finzel wrote:
> > Since you mention, can anyone elaborate further on the memory leak danger
> > here?
> >
> > Line 193 in src/test/modules/worker_spi/worker_spi.c read:
> > # Note some memory might be leaked here.
> >
> > Is this any reason *not *to use this pattern in production?
>
> quote_identifier may palloc the result, so the first pstrdup on the top
> to save "schema" and "table" refer to a pointer which may perhaps get
> lost.  Those are just a couple of bytes, so the code complication is not
> worth the cleanup IMO.
> --
> Michael
>

Makes sense, thank you.


using worker_spi as pattern

2018-03-08 Thread Jeremy Finzel
Hello - I have compiled and installed the extension worker_spi.  I also
launched the process via SELECT worker_spi_launch(1);

I see this in pg_stat_activity:
WITH deleted AS (DELETE FROM schema1.counted WHERE type = 'delta' RETURNING
value), total AS (SELECT coalesce(sum(value), 0) as sum FROM deleted)
UPDATE schema1.counted SET value = counted.value + total.sum FROM total
WHERE type = 'total' RETURNING counted.value

However, I'm not sure what I am supposed to do next.  The docs at the top
of the module say:

To see it working, insert an initial value with "total" type and some
initial value; then insert some other rows with "delta" type.  Delta rows
will be deleted by this worker and their values aggregated into the total.

However, this raises many questions for me:

   - Insert a value into what table?  I see the process referring to an
   object that doesn't exist in my database - schema1.counted
   - What is "total" type?  I don't see any type with this name in the
   database
   - Same question for "delta" type
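
My best guess, based only on the query I see in pg_stat_activity above, is
that the intended usage is something like this (assuming schema1.counted is
the table the worker manages, with the type and value columns from that
query):

INSERT INTO schema1.counted (type, value) VALUES ('total', 0);
INSERT INTO schema1.counted (type, value) VALUES ('delta', 10), ('delta', 5);
-- after the worker's next naptime, the deltas should be folded into the total
SELECT * FROM schema1.counted;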

I am trying to use this extension as a pattern for my own background
worker, but just trying to understand it.

Thanks!
Jeremy


Identifying ALTER TABLE "sub-command"

2018-01-11 Thread Jeremy Finzel
Hello -

I have found that in leveraging the parser code to decode DDL SQL, it is
very easy to get which general type of command is being issued with
CreateCommandTag(parsetree).  However, is there a way (or a starting point)
to identify the sub-command as well, e.g. ENABLE TRIGGER, ADD FOREIGN KEY,
etc.?

Any direction is much appreciated.

Thanks,
Jeremy


Re: Deadlock in multiple CIC.

2017-12-27 Thread Jeremy Finzel
>
> Since the purpose of CIC is to build an index with minimal impact on other
> users, I think wanting to use it in concurrent cases might be rather rare.
> In a maintenance window, I wouldn't want to use CIC because it is slower
> and I'd rather just hold the stronger lock and do it fast, and as a hot-fix
> outside a maintenance window I usually wouldn't want to hog the CPU with
> concurrent builds when I could do them sequentially instead.  Also, since
> deadlocks are "expected" errors rather than "should never happen" errors,
> and since the docs don't promise that you can do parallel CIC without
> deadlocks, many people would probably shrug it off (which I initially did)
> rather than report it as a bug.  I was looking into it as an enhancement
> rather than a bug until I discovered that it was already enhanced and then
> undone.
>
>
FWIW, yes, I agree it is a rather rare use case.  For me, it's for doing
concurrent index builds on logical replica tables, especially after an
initial copy of non-indexed tables, where the replicated tables may have
traffic coming in constantly.  That means the DBA doesn't have to wait for
hours for the indexes to build one by one, and also doesn't have to worry
about long locks.
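
For context, the scenario I mean is simply two sessions doing something
like this at the same time on different tables (hypothetical table and
column names):

-- session 1
CREATE INDEX CONCURRENTLY idx_a ON table_a (col_a);
-- session 2, started while session 1 is still running
CREATE INDEX CONCURRENTLY idx_b ON table_b (col_b);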

However, IIRC we have also run into deadlocks before when trying to build
multiple indexes in an OLTP system, which may have been due to this issue
rather than only to concurrent attempts to index the same table.


Re: Deadlock in multiple CIC.

2017-12-27 Thread Jeremy Finzel
On Tue, Dec 26, 2017 at 11:23 AM, Jeff Janes  wrote:

> On Tue, Dec 26, 2017 at 8:31 AM, Alvaro Herrera 
> wrote:
>
>> Jeff Janes wrote:
>> > c3d09b3bd23f5f6 fixed it so concurrent CIC would not deadlock (or at
>> least
>> > not as reliably as before) by dropping its own snapshot before waiting
>> for
>> > all the other ones to go away.
>> >
>> > With commit 8aa3e47510b969354ea02a, concurrent CREATE INDEX
>> CONCURRENTLY on
>> > different tables in the same database started deadlocking against each
>> > other again quite reliably.
>> >
>> > I think the solution is simply to drop the catalog snapshot as well, as
>> in
>> > the attached.
>>
>> Thanks for the analysis -- it sounds reasonable to me.  However, I'm
>> wondering why you used the *Conditionally() version instead of plain
>> InvalidateCatalogSnapshot().
>
>
> My thinking was that if there was for some reason another snapshot hanging
> around, that dropping the catalog snapshot unconditionally would be a
> correctness bug, while doing it conditionally would just fail to avoid a
> theoretically avoidable deadlock.  So it seemed safer.
>
>
>>   I think they must have the same effect in
>> practice (the assumption being that you can't run CIC in a transaction
>> that may have other snapshots) but the theory seems simpler when calling
>> the other routine: just do away with the snapshot always, period.
>>
>
> That is probably true.  But I never even knew that catalog snapshots
> existed until yesterday, so didn't want to make make assumptions about what
> else might exist, to avoid introducing new bugs similar to the one that
> 8aa3e47510b969354ea02a fixed.
>
>
>>
>> This is back-patchable to 9.4, first branch which has MVCC catalog
>> scans.  It's strange that this has gone undetected for so long.
>
>
> Since the purpose of CIC is to build an index with minimal impact on other
> users, I think wanting to use it in concurrent cases might be rather rare.
> In a maintenance window, I wouldn't want to use CIC because it is slower
> and I'd rather just hold the stronger lock and do it fast, and as a hot-fix
> outside a maintenance window I usually wouldn't want to hog the CPU with
> concurrent builds when I could do them sequentially instead.  Also, since
> deadlocks are "expected" errors rather than "should never happen" errors,
> and since the docs don't promise that you can do parallel CIC without
> deadlocks, many people would probably shrug it off (which I initially did)
> rather than report it as a bug.  I was looking into it as an enhancement
> rather than a bug until I discovered that it was already enhanced and then
> undone.
>
> Cheers,
>
> Jeff
>

I was able to get this compiled and ran the test first on stock 9.6.6,
then on this patched version.  I reproduced the deadlock on 9.6.6, and the
patched version indeed fixes my issue.

Let me know if I can be of further help.

Thanks,
Jeremy


Re: Backfill bgworker Extension?

2017-12-17 Thread Jeremy Finzel
On Sat, Dec 16, 2017 at 8:31 AM Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 12/15/17 23:50, Jeremy Finzel wrote:
> > The common ground is some column in some table needs to be bulk updated.
> > I may not be explaining well, but in our environment we have done
> > hundreds of these using a generic framework to build a backfill. So I’m
> > not sure what you are questioning about the need? We have had to build a
> > worker to accomplish this because it can’t be done as a sql script alone.
>
> I'm trying to identify the independently useful pieces in your use case.
>  A background worker to backfill large tables is a very specific use
> case.  If instead we had a job/scheduler mechanism and a way to have
> server-side scripts that can control transactions, then that might
> satisfy your requirements as well (I'm not sure), but it would also
> potentially address many other uses.


I follow you now.  Yes, I think it probably would.  It would at least
provide a framework on which to build the tool I want.  It would be great
to have a "worker-capable" tool inside postgres rather than always having
to write external logic to do things like this.

> I’m not sure what you mean by a stored procedure in the background.
> > Since it would not be a single transaction, it doesn’t fit as a stored
> > procedure at least in Postgres when a function is 1 transaction.
>
> In progress: https://commitfest.postgresql.org/16/1360/


Looking forward to this.  I think this will help.  A stored procedure with
subtransactions could hold the logic for the backfill, but it would still
need an external worker that could retry it in case of failure, especially
when something like a server restart happens.
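
If the proposed transaction control in procedures lands, I imagine the
in-database half of such a backfill looking something like the rough
sketch below (made-up table and column names); the external worker would
then just CALL it, and retry or resume it after a failure or restart:

CREATE PROCEDURE backfill_new_col()
LANGUAGE plpgsql
AS $$
DECLARE
    rows_done bigint;
BEGIN
    LOOP
        -- one small batch per iteration
        UPDATE big_table
           SET new_col = 'backfilled'
         WHERE id IN (SELECT id
                        FROM big_table
                       WHERE new_col IS NULL
                       LIMIT 1000);
        GET DIAGNOSTICS rows_done = ROW_COUNT;
        EXIT WHEN rows_done = 0;
        COMMIT;                 -- each batch in its own transaction
        PERFORM pg_sleep(1);    -- give replication a chance to catch up
    END LOOP;
END;
$$;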

Thanks,
Jeremy


Re: Backfill bgworker Extension?

2017-12-15 Thread Jeremy Finzel
On Tue, Dec 12, 2017 at 2:26 PM Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 12/12/17 13:03, Jeremy Finzel wrote:
> > To be clear, what I mean is batch updating a large set of data in small
> > pieces so as to avoid things like lock contention and replication lags.
> > Sometimes these have a driving table that has the source data to update
> > in a destination table based on a key column, but sometimes it is
> > something like setting just a single specific value for a huge table.
> >
> > I would love instead to have a Postgres extension that uses postgres
> > background workers to accomplish this, especially if it were part of
> > core.  Before I venture into exploring writing something like this as an
> > extension, would this ever be considered something appropriate as an
> > extension in Postgres core?  Would that be appropriate?
>
> I don't see what the common ground between different variants of this
> use case would be.  Aren't you basically just looking to execute a
> use-case-specific stored procedure in the background?
>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


The common ground is that some column in some table needs to be bulk
updated. I may not be explaining it well, but in our environment we have
done hundreds of these using a generic framework for building a backfill.
So I’m not sure what you are questioning about the need. We have had to
build a worker to accomplish this because it can’t be done as a SQL script
alone.

I’m not sure what you mean by a stored procedure in the background. Since
it would not be a single transaction, it doesn’t fit as a stored procedure,
at least in Postgres where a function is one transaction.

Sorry if I’m misunderstanding.

Thanks,
Jeremy


Backfill bgworker Extension?

2017-12-12 Thread Jeremy Finzel
One of the challenges we have is that our engineers have written frameworks
to backfill data in several different DSLs, and every time they adopt a new
language, they may need to write another one.

To be clear, what I mean is batch updating a large set of data in small
pieces so as to avoid things like lock contention and replication lags.
Sometimes these have a driving table that has the source data to update in
a destination table based on a key column, but sometimes it is something
like setting just a single specific value for a huge table.
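
As a concrete illustration (with purely hypothetical table, column, and key
values), a single batch in one of these frameworks boils down to something
like the following, run in its own transaction with a pause before the next
batch:

-- one batch of a keyed backfill; the framework advances the key range and
-- commits between batches to keep locks short and replication lag manageable
UPDATE destination_table AS d
   SET some_column = s.some_column
  FROM driving_table AS s
 WHERE s.id = d.id
   AND d.id >  100000   -- last key processed
   AND d.id <= 110000;  -- last key processed + batch size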

I would love instead to have a Postgres extension that uses postgres
background workers to accomplish this, especially if it were part of core.
Before I venture into writing something like this as an extension, would
this ever be considered appropriate for Postgres core?

Thanks,
Jeremy


Re: proposal: alternative psql commands quit and exit

2017-12-12 Thread Jeremy Finzel
Someone mentioned needing to read the documentation of vim to learn how to
exit the program.  I personally think exactly the same applies here, and am
a bit surprised at the depth of discussion over this.

When I was first new to CLI programs, the only "standard" way to exit a
program I found across the board was CTRL+D.  I have never even thought
that psql, even though different, was odd in its choice of how to exit the
program.  And the choice of \q for a SQL CLI program has very good reasons
behind it, as has been discussed.

So what if that Stack Overflow question gets the most hits among psql
questions?  How many of you have learned what you need within 5 seconds on
Google?  I don't think this is a significant stumbling block for users
adopting postgres.  psql is amazing, and anyone who likes CLI programs will
be fine.

FWIW, I am +1 in favor of not overcomplicating the special psql commands
and keeping to the "\" standard.

On Tue, Dec 12, 2017 at 8:35 AM, Gasper Zejn  wrote:

> On 12. 12. 2017 06:58, Tom Lane wrote:
> > Craig Ringer  writes:
> >> I doubt I've ever written just "exit" or "quit" without indentation. I
> >> think if it requires them to be a bareword with no indentation, strictly
> >> ^(exit|quit)\n when isatty, then that's probably a safe and helpful
> choice.
> > FWIW, I think that the special behavior (whatever it winds up being
> > exactly) ought to trigger on an input line matching
> >
> >   [whitespace]*help[whitespace]*(;[whitespace]*)?
> >
> > and similarly for exit/quit.  I think that novices might have
> internalized
> > enough SQL syntax to think that they need to terminate the command with a
> > semicolon --- in fact, we regularly see examples in which seasoned users
> > think they need to terminate backslash commands with a semicolon, so
> > that's hardly far-fetched.  And we might as well allow as much whitespace
> > as we can, because nobody but Guido van Rossum thinks that having whitespace
> > be semantically significant is a good idea.
> >
> >   regards, tom lane
> >
> If tabs are considered whitespace, psql can sometimes treat it as
> semantically significant, since
> "create materializedtest11 as select 1;" will be autocompleted to
> correct syntax if you paste the line into interactive psql session.
>
> I have seen psql error out with invalid syntax when a similar query was
> pasted and psql autocompleted it.
>
> kind regards, Gasper
>
>
>


Re: Testing Extension Upgrade Paths

2017-12-12 Thread Jeremy Finzel
On Mon, Dec 11, 2017 at 7:55 PM, Craig Ringer <cr...@2ndquadrant.com> wrote:

> On 12 December 2017 at 07:25, Michael Paquier <michael.paqu...@gmail.com>
> wrote:
>
>> On Tue, Dec 12, 2017 at 2:55 AM, Jeremy Finzel <finz...@gmail.com> wrote:
>> > I understand how to setup a regression suite for a postgres extension,
>> but
>> > what I'm not clear on from the docs is if there is a pattern that
>> exists for
>> > testing not only the latest version of an extension, but also an
>> upgraded
>> > previous version.
>> >
>> > Is there any straightforward way to do this that doesn't involve
>> manually
>> > copying tests?
>>
>> It is perfectly possible to do tests on a specific version and test
>> upgrade paths using CREATE EXTENSION and ALTER EXTENSION which support
>> versioning in a SQL regression script, say:
>> CREATE EXTENSION foo VERSION "0.1";
>> -- Do stuff
>> ALTER EXTENSION foo UPDATE TO "0.2";
>> -- Do other stuff
>>
>
> This works well when you want to test 1.1 binaries with a 1.0 SQL
> extension definition.
>
> It's not effective if you need to test 1.0 binaries+extension, then an
> upgrade to 1.1 binaries and extension. You have no way to install and run
> the 1.0 binaries without going outside pg_regress / TAP and using an
> external test harness/framework/script. I've been bitten by this multiple
> times before. For example, if 1.1 always assigns a value to some column
> that was nullable in 1.0, you're never going to exercise 1.1's handling of
> nulls in that column.
>
> It also doesn't lend its self to complex multi-path upgrades, where for
> example you could upgrade 1.0.0 -> 1.0.1 -> 1.1.0 or  upgrade directly from
> 1.0.0 -> 1.1.0. This rapidly becomes an issue when you release 1.0.0,
> release 1.1.0, then release a maintenance 1.0.1  version. Now you have
> toadd the 1.0.1 -> 1.1.0 upgrade path, but you still have the 1.0.0 ->
> 1.1.0 path.
>
> You can handle that with TAP if you're willing to write something to do
> the various combinations of steps and exercise them. It's not practical
> with pg_regress.
>
> More thorough testing benefits from an external harness.
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>

Thanks both of you for the feedback.

I understood that I can do CREATE then ALTER EXTENSION UPDATE, but my goal
was to actually be able to run this workflow multiple ways without having
to hardcode that or duplicate steps.  To clarify, my intention was to only
test the binary for 1.1, but to then run the same test steps (without
having to duplicate things) under 2 separate scenarios:

   1. Create the extension right away at version 1.1, run the suite
   2. Create the extension at 1.0, populate extension config tables as if I
   had actually been using it, then upgrade to 1.1, and run the suite.

Here is how I ended up implementing this, and I am very open to feedback:

The first file is like this:

-- Allow running regression suite with upgrade paths
\set v `echo ${FROMVERSION:-1.1}`
SET client_min_messages = warning;
CREATE EXTENSION pglogical;
CREATE EXTENSION pgl_ddl_deploy VERSION :'v';

So if you run the suite without any environment variable, it will start at
1.1.  But if you add FROMVERSION=1.0, it will start at 1.0.

Then in test steps 2 and 3, I assume test results are identical between 1.0
and 1.1.  But in step 4, I run ALTER EXTENSION UPDATE, which is either a
no-op, or actually upgrades from 1.0.  The remaining tests in the whole
suite are designed to yield identical results in either path.
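
For concreteness, step 4 is just something along these lines; it is a no-op
in the direct-1.1 path and a real upgrade when starting from 1.0:

ALTER EXTENSION pgl_ddl_deploy UPDATE TO '1.1';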

I am fairly happy with this because what I really wanted to test is
upgrading from 1.0 to 1.1 as opposed to a bare 1.1 install.

With later versions, I would of course need to modify this as necessary,
and I would then want to test more of the upgrade paths where feasible.

Thanks,
Jeremy


Testing Extension Upgrade Paths

2017-12-11 Thread Jeremy Finzel
Hello -

I understand how to set up a regression suite for a postgres extension, but
what I'm not clear on from the docs is if there is a pattern that exists
for testing not only the latest version of an extension, but also an
upgraded previous version.

For example, I am developing version 1.1, and the test suite runs fine for
1.1.

But I want to install the extension at 1.0, do a few things, then upgrade
to 1.1 and run N tests over again from this path.

I have in mind something like an environment variable where you could run
the suite twice, with the variation being a direct install at the highest
version or an upgrade from a previous version.  Please excuse my ignorance!

Is there any straightforward way to do this that doesn't involve manually
copying tests?

Thank you!
Jeremy

P.S. is this the right list for extension dev questions ???


How to use set/reset role in contrib_regression test?

2017-12-07 Thread Jeremy Finzel
Hello!  I hope this is the right list for extension dev questions?

I am finding odd behavior running make installcheck for a postgres
extension.  The user running the suite is a superuser ("jfinzel").  Once I
create any object as a test role, RESET ROLE does not work, although SET
ROLE jfinzel does work.  See:

SELECT CURRENT_ROLE;
 current_user
--------------
 jfinzel
(1 row)

SET ROLE test_pgl_ddl_deploy;
SELECT CURRENT_ROLE;
    current_user
---------------------
 test_pgl_ddl_deploy
(1 row)

RESET ROLE;
SELECT CURRENT_ROLE;
 current_user
--------------
 jfinzel
(1 row)

SET ROLE test_pgl_ddl_deploy;
CREATE SCHEMA special;
CREATE TABLE special.foo (id serial primary key, foo text, bar text);
CREATE TABLE special.bar (id serial primary key, super text, man text);
SELECT CURRENT_ROLE;
    current_user
---------------------
 test_pgl_ddl_deploy
(1 row)

RESET ROLE;
SELECT CURRENT_ROLE;
    current_user
---------------------
 test_pgl_ddl_deploy
(1 row)

SET SESSION AUTHORIZATION DEFAULT;
RESET ROLE;
SELECT CURRENT_ROLE;
    current_user
---------------------
 test_pgl_ddl_deploy
(1 row)


What am I missing here?  Any comments much appreciated.

Thanks,
Jeremy Finzel