Re: error context for vacuum to include block number

2020-03-19 Thread Amit Kapila
On Fri, Mar 20, 2020 at 5:59 AM Justin Pryzby  wrote:
>
> On Thu, Mar 19, 2020 at 03:29:31PM -0500, Justin Pryzby wrote:
> > I was going to suggest that we could do that by passing in a pointer to a 
> > local
> > variable "LVRelStats olderrcbarg", like:
> > |update_vacuum_error_cbarg(vacrelstats, 
> > VACUUM_ERRCB_PHASE_SCAN_HEAP,
> > |  blkno, NULL, &olderrcbarg);
> >
> > and then later call:
> > |update_vacuum_error_cbarg(vacrelstats, olderrcbarg.phase,
> > |   olderrcbarg.blkno,
> > |   olderrcbarg.indname,
> > |   NULL);
> >
> > I implemented it in a separate patch, but it may be a bad idea, due to 
> > freeing
> > indname.  To exercise it, I tried to cause a crash by changing "else if
> > (errcbarg->indname)" to "if" without else, but wasn't able to cause a crash,
> > probably just due to having a narrow timing window.
>
> I realized it was better for the caller to just assign the struct on its own.
>

That makes sense.  I have a few more comments:

1.
+ VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
+} errcb_phase;

Why do you need a comma after the last element in the above enum?

2.
+ /* Setup error traceback support for ereport() */
+ update_vacuum_error_cbarg(vacrelstats, VACUUM_ERRCB_PHASE_SCAN_HEAP,
+ InvalidBlockNumber, NULL);
+ errcallback.callback = vacuum_error_callback;
+ errcallback.arg = vacrelstats;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;

Why do we need to call update_vacuum_error_cbarg at the above place
after we have added a new one inside the for loop?

3.
+ * free_oldindex is true if the previous "indname" should be freed.  It must be
+ * false if the caller has copied the old LVRelSTats,

/LVRelSTats/LVRelStats

4.
/* Clear the error traceback phase */
  update_vacuum_error_cbarg(vacrelstats,
-   VACUUM_ERRCB_PHASE_UNKNOWN, InvalidBlockNumber,
-   NULL);
+   olderrcbarg.phase,
+   olderrcbarg.blkno,
+   olderrcbarg.indname,
+   true);

At this and similar places, change the comment to something like:
"Reset the old phase information for error traceback".

5.
Subject: [PATCH v28 3/5] Drop reltuples

---
 src/backend/access/heap/vacuumlazy.c | 24 +++-
 1 file changed, 11 insertions(+), 13 deletions(-)

Is this patch directly related to the main patch (vacuum errcontext to
show block being processed) or is it an independent improvement of
code?

6.
[PATCH v28 4/5] add callback for truncation

+ VACUUM_ERRCB_PHASE_TRUNCATE,
+ VACUUM_ERRCB_PHASE_TRUNCATE_PREFETCH,

Do we really need separate phases for truncate and truncate_prefetch?
We have only one phase for a progress update; similarly, I think
having one phase for error reporting should be sufficient.  It will
also reduce the number of places where we need to call
update_vacuum_error_cbarg.  I think we can set
VACUUM_ERRCB_PHASE_TRUNCATE before count_nondeletable_pages and reset
it at the place you are doing right now in the patch (see the sketch
below, after point 8).

7. Is there a reason to keep the truncate phase patch separate from
the main patch? If not, let's merge them.

8. Can we think of some easy way to add tests for this patch?
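
For point 6, a rough sketch of what that single phase could look like (the
helper's argument list is taken from the quoted v28 patch and is illustrative
only, as is the surrounding code):

	/* set the phase once; it also covers the prefetching done inside
	 * count_nondeletable_pages() */
	update_vacuum_error_cbarg(vacrelstats, VACUUM_ERRCB_PHASE_TRUNCATE,
							  vacrelstats->nonempty_pages, NULL);
	new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);
	/* ... truncate, then restore the previous phase as the patch already does */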

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Memory-Bounded Hash Aggregation

2020-03-19 Thread Pengzhou Tang
On Fri, Mar 20, 2020 at 1:20 PM Pengzhou Tang  wrote:

> Hi,
>
> I happened to notice that "set enable_sort to false" cannot guarantee that
> the planner uses hashagg in test groupingsets.sql, so the following
> comparison of sortagg and hashagg results seems meaningless.
>
>
Please forget my comment; I should set enable_groupingsets_hash_disk too.


Re: Improving connection scalability: GetSnapshotData()

2020-03-19 Thread Thomas Munro
On Sun, Mar 1, 2020 at 9:36 PM Andres Freund  wrote:
> conns   tps master       tps pgxact-split
>
> 1       26842.492845      26524.194821
> 10      246923.158682     249224.782661
> 50      695956.539704     709833.746374
> 100     1054727.043139    1903616.306028
> 200     964795.282957     1949200.338012
> 300     906029.377539     1927881.231478
> 400     845696.690912     1911065.369776
> 500     812295.222497     1926237.255856
> 600     888030.104213     1903047.236273
> 700     866896.532490     1886537.202142
> 800     863407.341506     1883768.592610
> 900     871386.608563     1874638.012128
> 1000    887668.277133     1876402.391502
> 1500    860051.361395     1815103.564241
> 2000    890900.098657     1775435.271018
> 3000    874184.980039     1653953.817997
> 4000    845023.080703     1582582.316043
> 5000    817100.195728     1512260.802371

This will clearly be really big news for lots of PostgreSQL users.

> One further cool recognition of the fact that GetSnapshotData()'s
> results can be made to only depend on the set of xids in progress, is
> that caching the results of GetSnapshotData() is almost trivial at that
> point: We only need to recompute snapshots when a toplevel transaction
> commits/aborts.
>
> So we can avoid rebuilding snapshots when no commit has happened since it
> was last built. Which amounts to assigning a current 'commit sequence
> number' to the snapshot, and checking that against the current number
> at the time of the next GetSnapshotData() call. Well, turns out there's
> this "LSN" thing we assign to commits (there are some small issues with
> that though).  I've experimented with that, and it considerably further
> improves the numbers above. Both with a higher peak throughput, but more
> importantly it almost entirely removes the throughput regression from
> 2000 connections onwards.
>
> I'm still working on cleaning that part of the patch up, I'll post it in
> a bit.

I looked at that part on your public pgxact-split branch.  In that
version you used "CSN" rather than something based on LSNs, which I
assume avoids complications relating to WAL locking or something like
that.  We should probably be careful to avoid confusion with the
pre-existing use of the term "commit sequence number" (CommitSeqNo,
CSN) that appears in predicate.c.  This also calls to mind the
2013-2016 work by Ants Aasma and others[1] on CSN-based snapshots,
which is obviously a much more radical change, but really means what
it says (commits).  The CSN in your patch set is used purely as a
level-change for snapshot cache invalidation IIUC, and it advances
also for aborts -- so maybe it should be called something like
completed_xact_count, using existing terminology from procarray.c.

+   if (snapshot->csn != 0 && MyProc->xidCopy == InvalidTransactionId &&
+   UINT64_ACCESS_ONCE(ShmemVariableCache->csn) == snapshot->csn)

Why is it OK to read ShmemVariableCache->csn without at least a read
barrier?  I suppose this allows a cached snapshot to be used very soon
after a transaction commits and should be visible to you, but ...
hmmmrkwjherkjhg... I guess it must be really hard to observe any
anomaly.  Let's see... maybe it's possible on a relaxed memory system
like POWER or ARM, if you use a shm flag to say "hey I just committed
a transaction", and the other guy sees the flag but can't yet see the
new CSN, so an SPI query can't see the transaction?

Another theoretical problem is the non-atomic read of a uint64 on some
32 bit platforms.
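
For what it's worth, a minimal sketch of the shape a fix for both points could
take, assuming the counter were kept in a pg_atomic_uint64 (which the posted
branch does not do); the exact barrier placement would still need the analysis
above:

	/* hypothetical sketch, not from the pgxact-split branch */
	uint64		current_csn;

	current_csn = pg_atomic_read_u64(&ShmemVariableCache->csn); /* atomic even on 32-bit */
	pg_read_barrier();	/* keep later shared-memory reads from moving before the csn read */

	if (snapshot->csn != 0 &&
		MyProc->xidCopy == InvalidTransactionId &&
		current_csn == snapshot->csn)
		return snapshot;	/* cached snapshot is still usable */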

> 0007: WIP: Introduce abstraction layer for "is tuple invisible" tests.
>
>   This is the most crucial piece. Instead of code directly using
>   RecentOldestXmin, there's a new set of functions for testing
>   whether an xid is visible (InvisibleToEveryoneTestXid() et al).
>
>   Those function use new horizon boundaries computed as part of
>   GetSnapshotData(), and recompute an accurate boundary when the
> tested xid falls in between.
>
>   There's a bit more infrastructure needed - we need to limit how
>   often an accurate snapshot is computed. Probably to once per
>   snapshot? Or once per transaction?
>
>
>   To avoid issues with the lower boundary getting too old and
>   presenting a wraparound danger, I made all the xids be
>   FullTransactionIds. That imo is a good thing?

+1, as long as we don't just move the wraparound danger to the places
where we convert xids to fxids!

+/*
+ * Be very careful about when to use this function. It can only safely be used
+ * when there is a guarantee that, at the time of the call, xid is within 2
+ * billion xids of rel. That e.g. can be guaranteed if the caller assures
+ * a snapshot is held by the backend, and xid is from a table (where
+ * vacuum/freezing ensures the xid has to be within that range).
+ */
+static inline 
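
For reference, a rough sketch of the kind of helper such a comment might sit
above (the name, the FullTransactionId helper macros, and the body are all
illustrative here, not quoted from the branch):

	static inline FullTransactionId
	FullXidRelativeTo(FullTransactionId rel, TransactionId xid)
	{
		Assert(TransactionIdIsNormal(xid));

		/*
		 * Only safe because the caller guarantees xid is within 2 billion
		 * xids of rel, so the signed 32-bit difference cannot wrap.
		 */
		return FullTransactionIdFromU64(U64FromFullTransactionId(rel) +
										(int32) (xid - XidFromFullTransactionId(rel)));
	}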

Re: Memory-Bounded Hash Aggregation

2020-03-19 Thread Pengzhou Tang
Hi,

I happened to notice that "set enable_sort to false" cannot guarantee that
the planner uses hashagg in test groupingsets.sql, so the following
comparison of sortagg and hashagg results seems meaningless.

Thanks,
Pengzhou

On Thu, Mar 19, 2020 at 7:36 AM Jeff Davis  wrote:

>
> Committed.
>
> There's some future work that would be nice (some of these are just
> ideas and may not be worth it):
>
> * Refactor MemoryContextMemAllocated() to be a part of
> MemoryContextStats(), but allow it to avoid walking through the blocks
> and freelists.
>
> * Improve the choice of the initial number of buckets in the hash
> table. For this patch, I tried to preserve the existing behavior of
> estimating the number of groups and trying to initialize with that many
> buckets. But my performance tests seem to indicate this is not the best
> approach. More work is needed to find what we should really do here.
>
> * For workloads that are not in work_mem *or* system memory, and need
> to actually go to storage, I see poor CPU utilization because it's not
> effectively overlapping CPU and IO work. Perhaps buffering or readahead
> changes can improve this, or async IO (even better).
>
> * Project unnecessary attributes away before spilling tuples to disk.
>
> * Improve logtape.c API so that the caller doesn't need to manage a
> bunch of tape numbers.
>
> * Improve estimate of the hash entry size. This patch doesn't change
> the way the planner estimates it, but I observe that actual size as
> seen at runtime is significantly different. This is connected to the
> initial number of buckets for the hash table.
>
> * In recursive steps, I don't have a good estimate for the number of
> groups, so I just estimate it as the number of tuples in that spill
> tape (which is pessimistic). That could be improved by doing a real
> cardinality estimate as the tuples are spilling (perhaps with HLL?).
>
> * Many aggregates with pass-by-ref transition states don't provide a
> great aggtransspace. We should consider doing something smarter, like
> having negative numbers represent a number that should be multiplied by
> the size of the group (e.g. ARRAY_AGG would have a size dependent on
> the group size, not a constant).
>
> * If we want to handle ARRAY_AGG (and the like) well, we can consider
> spilling the partial states in the hash table when the memory is full.
> That would add a fair amount of complexity because there would be two
> types of spilled data (tuples and partial states), but it could be
> useful in some cases.
>
> Regards,
> Jeff Davis
>
>
>
>
>


Re: Missing errcode() in ereport

2020-03-19 Thread Andres Freund
Hi,

On 2020-03-19 22:32:30 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > I wonder if it'd become a relevant backpatch pain if we started to have
> > some ereports() without the additional parens in 13+.
> 
> Yeah, it would be a nasty backpatch hazard.
> 
> > Would it perhaps
> > make sense to backpatch just the part that removes the need for the
> > parens, but not the return type changes?
> 
> I was just looking into that.  It would be pretty painless to do it
> in v12, but before that we weren't requiring C99.  Having said that,
> trolling the buildfarm database shows exactly zero active members
> that don't report having __VA_ARGS__, in the branches where we test
> that.  (And pg_config.h.win32 was assuming that MSVC had that, too.)
> 
> Could we get away with moving the compiler goalposts for the back
> branches?  I dunno, but it's a fact that we aren't testing anymore
> with any compilers that would complain about unconditional use of
> __VA_ARGS__.  So it might be broken already and we wouldn't know it.

FWIW, I did grep for unprotected uses, and  didn't find anything.


> (I suspect the last buildfarm animal that would've complained about
> this was pademelon, which I retired more than a year ago IIRC.)

I guess a query that searches the logs backwards for animals without
__VA_ARGS__ would be a good idea?


> > We can also remove elog() support code now, because with __VA_ARGS__
> > support it's really just a wrapper for ereport(elevel,
> > errmsg(__VA_ARGS__)).  This is patch 0002.
> 
> Yeah, something similar had occurred to me but I didn't write the patch
> yet.  Note it should be errmsg_internal();

Oh, right.


> also, does the default for errcode come out the same?

I think so - there's no distinct code setting sqlerrcode in
elog_start/finish. That already relied on errstart().


> > I wonder if its worth additionally ensuring that errcode, errmsg,
> > ... are only called within errstart/errfinish?
> 
> Meh.  That's wrong at least for errcontext(), and I'm not sure it's
> really worth anything to enforce it for the others.

Yea, I'm not convinced either. Especially after changing the err* return
types to void, as that presumably will cause errors with a lot of
incorrect parens, e.g. about a function with void return type as an
argument to errmsg().


> I think the key decision we'd have to make to move forward on this
> is to decide whether it's still project style to prefer the extra
> parens, or whether we want new code to do without them going
> forward.  If we don't want to risk requiring __VA_ARGS__ for the
> old branches then I'd vote in favor of keeping the parens as
> preferred style, at least till v11 is out of support.

Agreed.


> If we do change that in the back branches then there'd be reason to
> prefer to go without parens.  New coders might still be confused about
> why there are all these calls with the useless parens, though.

That seems to be an acceptable pain, from my POV.

Greetings,

Andres Freund




Re: Missing errcode() in ereport

2020-03-19 Thread Tom Lane
Andres Freund  writes:
> I wonder if it'd become a relevant backpatch pain if we started to have
> some ereports() without the additional parens in 13+.

Yeah, it would be a nasty backpatch hazard.

> Would it perhaps
> make sense to backpatch just the part that removes the need for the
> parens, but not the return type changes?

I was just looking into that.  It would be pretty painless to do it
in v12, but before that we weren't requiring C99.  Having said that,
trolling the buildfarm database shows exactly zero active members
that don't report having __VA_ARGS__, in the branches where we test
that.  (And pg_config.h.win32 was assuming that MSVC had that, too.)

Could we get away with moving the compiler goalposts for the back
branches?  I dunno, but it's a fact that we aren't testing anymore
with any compilers that would complain about unconditional use of
__VA_ARGS__.  So it might be broken already and we wouldn't know it.
(I suspect the last buildfarm animal that would've complained about
this was pademelon, which I retired more than a year ago IIRC.)

> We can also remove elog() support code now, because with __VA_ARGS__
> support it's really just a wrapper for ereport(elevel,
> errmsg(__VA_ARGS__)).  This is patch 0002.

Yeah, something similar had occurred to me but I didn't write the patch
yet.  Note it should be errmsg_internal(); also, does the default
for errcode come out the same?

> I think it might also be a good idea to move __FILE__, __LINE__,
> PG_FUNCNAME_MACRO, domain from being parameters to errstart to
> errfinish. For elevel < ERROR its sad that we currently pass them even
> if we don't emit the message.  This is patch 0003.

Oh, that's a good idea that hadn't occurred to me.

> I wonder if its worth additionally ensuring that errcode, errmsg,
> ... are only called within errstart/errfinish?

Meh.  That's wrong at least for errcontext(), and I'm not sure it's
really worth anything to enforce it for the others.

I think the key decision we'd have to make to move forward on this
is to decide whether it's still project style to prefer the extra
parens, or whether we want new code to do without them going
forward.  If we don't want to risk requiring __VA_ARGS__ for the
old branches then I'd vote in favor of keeping the parens as
preferred style, at least till v11 is out of support.  If we do
change that in the back branches then there'd be reason to prefer
to go without parens.  New coders might still be confused about
why there are all these calls with the useless parens, though.

regards, tom lane




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-19 Thread Andres Freund
Hi,

On 2020-03-20 15:05:03 +1300, David Rowley wrote:
> On Fri, 20 Mar 2020 at 11:17, Andres Freund  wrote:
> > I think there's too much "reinventing" autovacuum scheduling in a
> > "local" insert-only manner happening in this thread. And as far as I can
> > tell additionally only looking at a somewhat narrow slice of insert only
> > workloads.
> 
> I understand your concern and you might be right. However, I think the
> main reason that the default settings for the new threshold and scale
> factor have deviated this far from the existing settings is the
> example of a large insert-only table that receives inserts of 1
> row per xact.  If we were to copy the existing settings then when that
> table gets to 1 billion rows, it would be eligible for an
> insert-vacuum after 200 million tuples/xacts, which does not help the
> situation since an anti-wraparound vacuum would be triggering then
> anyway.

Sure, that'd happen for inserts that happen after that threshold. I'm
just not convinced that this is as huge a problem as presented in this
thread. And I'm fairly convinced the proposed solution is the wrong
direction to go into.

It's not like that's not an issue for updates? If you update one row per
transaction, then you run into exactly the same issue for a table of the
same size?  You maybe could argue that it's more common to insert 1
billion tuples in individual transactions than it is to update 1 billion
tuples in individual transactions, but I don't think it's a huge
difference if it even exists.

In fact the problem is worse for the update case, because that tends to
generate a lot more random looking IO during vacuum (both because only
parts of the table are updated causing small block reads/writes, and
because it will need [multiple] index scans/vacuum, and because the
vacuum is a lot more expensive CPU time wise).

Imo this line of reasoning is about adding autovacuum scheduling based
on xid age, not about insert only workloads.

Greetings,

Andres Freund




Re: color by default

2020-03-19 Thread Bruce Momjian
On Tue, Mar  3, 2020 at 02:31:01PM +0900, Michael Paquier wrote:
> On Mon, Mar 02, 2020 at 01:00:44PM +0100, Juan José Santamaría Flecha wrote:
> > - The new entry in the documentation, especially as the PG_COLORS parameter
> > seems to be currently undocumented. The programs that can use PG_COLOR
> > would benefit from getting a link to it.
> 
> The actual problem here is that we don't have an actual centralized
> place where we could put that stuff.  And anything able to use this
> option is basically anything using src/common/logging.c.
> 
> Regarding PG_COLORS, the commit message of cc8d415 mentions it, but we
> have no actual example of how to use it, and the original thread has
> zero reference to it:
> https://www.postgresql.org/message-id/6a609b43-4f57-7348-6480-bd022f924...@2ndquadrant.com
> 
> And in fact, it took me a while to figure out that using it is a mix
> of three keywords ("error", "warning" or "locus"), each assigned a color
> with an equal sign and separated by colons.  Here is for
> example how to make the locus show up in yellow with errors in blue:
> export PG_COLORS='error=01;34:locus=01;33'
> 
> Having to dig into the code to find out that stuff is not a good user
> experience.  And I found out about that only because I worked on a
> patch touching this area yesterday.

I can confirm there is still no mention of PG_COLORS in our
documentation.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Missing errcode() in ereport

2020-03-19 Thread Andres Freund
Hi,

On 2020-03-19 21:03:17 -0400, Tom Lane wrote:
> I wrote:
> > I think that at least some compilers will complain about side-effect-free
> > subexpressions of a comma expression.  Could we restructure things so
> > that the errcode/errmsg/etc calls form a standalone comma expression
> > rather than appearing to be arguments of a varargs function?
>
> Yeah, the attached quick-hack patch seems to work really well for
> this.

Heh, you're too fast. I just sent something similar, and a few followup
patches.

What is your thinking about the backpatching pain it might introduce?
We can't easily backpatch support for ereport-without-extra-parens, I
think, because it needs __VA_ARGS__?


> As a nice side benefit, the backend gets a couple KB smaller from
> removal of errfinish's useless dummy argument.

I don't think it's the removal of the dummy parameter itself that
constitutes most of the savings, but instead it's not having to push the
return value of errmsg(), errcode(), et al as vararg arguments.


> I think that we could now also change all the helper functions
> (errmsg et al) to return void instead of a dummy value, but that
> would make the patch a lot longer and probably not move the
> goalposts much in terms of error detection.

I did include that in my prototype patch. Agree that it's not necessary
for the error detection capability, but it seems misleading to leave the
return values around if they're not actually needed.


> It also looks like we could use the same trick to get plain elog()
> to have the behavior of not evaluating its arguments when it's
> not going to print anything.  I've not poked at that yet either.

I've included a patch for that. I think it's now sufficient to
#define elog(elevel, ...)  \
ereport(elevel, errmsg(__VA_ARGS__))

which imo is quite nice, as it allows us to get rid of a lot of
duplication.
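
(Per Tom's note elsewhere in the thread that it should be errmsg_internal() --
elog() messages aren't translated -- the final form would presumably be more
like:)

#define elog(elevel, ...)  \
	ereport(elevel, errmsg_internal(__VA_ARGS__))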

Greetings,

Andres Freund




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-19 Thread David Rowley
On Fri, 20 Mar 2020 at 11:17, Andres Freund  wrote:
> I think there's too much "reinventing" autovacuum scheduling in a
> "local" insert-only manner happening in this thread. And as far as I can
> tell additionally only looking at a somewhat narrow slice of insert only
> workloads.

I understand your concern and you might be right. However, I think the
main reason that the default settings for the new threshold and scale
factor have deviated this far from the existing settings is the
example of a large insert-only table that receives inserts of 1
row per xact.  If we were to copy the existing settings then when that
table gets to 1 billion rows, it would be eligible for an
insert-vacuum after 200 million tuples/xacts, which does not help the
situation since an anti-wraparound vacuum would be triggering then
anyway.

I'm unsure if it will help with the discussion, but I put together a
quick and dirty C program to show when a table will be eligible for an
auto-vacuum with the given scale_factor and threshold

$ gcc -O2 vacuum.c -o vacuum
$ ./vacuum
Syntax ./vacuum <scale_factor> <threshold> <max_tuples>
$ ./vacuum 0.01 10000000 100000000000 | tail -n 1
Vacuum 463 at 99183465731 reltuples, 991915456 inserts
$ ./vacuum 0.2 50 100000000000 | tail -n 1
Vacuum 108 at 90395206733 reltuples, 15065868288 inserts

So, yeah, certainly, there are more than four times as many vacuums
with an insert-only table of 100 billion rows using the proposed
settings vs the defaults for the existing scale_factor and threshold.
However, at the tail end of the first run there, we were close to a
billion rows (991,915,456) between vacuums. Is that too excessive?

I'm sharing this in the hope that it'll make it easy to experiment
with settings which we can all agree on.

For a 1 billion row table, the proposed settings give us 69 vacuums
and the standard settings 83.
#include <stdio.h>
#include <stdlib.h>

unsigned long long
atoull(char *a)
{
	unsigned long long val = 0;
	/* crude, no wraparound checks */
	while (*a >= '0' && *a <= '9')
	{
		val = val * 10 + *a - '0';
		a++;
	}
	return val;
}

int
main(int argc, char **argv)
{
	unsigned long long reltuples;
	float scale_factor;
	unsigned long long threshold;
	unsigned long long n_vacuums;
	unsigned long long n_inserts;
	unsigned long long max_tuples;

	if (argc < 4)
	{
		fprintf(stderr, "Syntax %s <scale_factor> <threshold> <max_tuples>\n", argv[0]);
		return -1;
	}


	scale_factor = atof(argv[1]);
	threshold = atoull(argv[2]);
	max_tuples = atoull(argv[3]);
	reltuples = 1;
	n_vacuums = 0;

	printf("scale_factor = %g, threshold = %llu\n", scale_factor, threshold);

	for (;;)
	{
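		/* inserts needed before the next vacuum triggers at the current table size */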
		n_inserts = threshold + reltuples * scale_factor + 1;

		/* do "vacuum" */
		n_vacuums++;
		reltuples += n_inserts;

		if (reltuples > max_tuples)
			break;
		printf("Vacuum %llu at %llu reltuples, %llu inserts\n", n_vacuums,
			   reltuples, n_inserts);
	}

	return 0;
}


Re: Missing errcode() in ereport

2020-03-19 Thread Andres Freund
Hi,

On 2020-03-19 19:32:55 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2020-03-19 14:07:04 -0400, Tom Lane wrote:
> >> Now that we can rely on having varargs macros, I think we could
> >> stop requiring the extra level of parentheses,
>
> > I think that'd be an improvement, because:
> > ane of the ones I saw confuse people is just:
> > /home/andres/src/postgresql/src/backend/tcop/postgres.c:3727:4: error: 
> > ‘ereport’ undeclared (first use in this function); did you mean ‘rresvport’?
> >  3727 |ereport(FATAL,
> >   |^~~
> >   |rresvport
> > /home/andres/src/postgresql/src/backend/tcop/postgres.c:3727:4: note: each 
> > undeclared identifier is reported only once for each function it appears in
> > because the extra parens haven't been added.
>
> Ah, so the macro isn't expanded at all if it appears to have more than
> two parameters?  That seems odd, but I suppose some compilers might
> work that way.  Switching to varargs would improve that for sure.

Yea. Newer gcc versions do warn about a parameter count mismatch, which
helps.  I'm not sure what the preprocessor / compiler should do instead?

FWIW, clang also doesn't expand the macro.


> > Another one I've both seen and committed myself is converting an elog to
> > an ereport, and not adding an errcode() around the error code - which
> > silently looks like it works.
>
> You mean not adding errmsg() around the error string?  Yeah, I can
> believe that one easily.  It's sort of the same problem as forgetting
> to wrap errcode() around the ERRCODE_ constant.

Both of these, I think.


> Could we restructure things so that the errcode/errmsg/etc calls form
> a standalone comma expression rather than appearing to be arguments of
> a varargs function?  The compiler's got no hope of realizing there's
> anything wrong when it thinks it's passing an integer or string
> constant to a function it knows nothing about.  But if it could see
> that nothing at all is being done with the constant, maybe we'd get
> somewhere.

Worth a try. Not doing a pointless varargs call could also end up
reducing elog overhead a bit (right now we push a lot of 0 as vararg
arguments for no reason).

A quick hack suggests that it works:

/home/andres/src/postgresql/src/backend/tcop/postgres.c: In function 
‘process_postgres_switches’:
/home/andres/src/postgresql/src/backend/tcop/postgres.c:3728:27: warning: 
left-hand operand of comma expression has no effect [-Wunused-value]
 3728 |  (ERRCODE_SYNTAX_ERROR,
  |   ^
/home/andres/src/postgresql/src/include/utils/elog.h:126:4: note: in definition 
of macro ‘ereport_domain’
  126 |rest; \
  |^~~~
/home/andres/src/postgresql/src/backend/tcop/postgres.c:3727:4: note: in 
expansion of macro ‘ereport’
 3727 |ereport(FATAL,
  |^~~


and in an optimized build the resulting code indeed looks a bit better:

before:
   text    data     bss      dec      hex  filename
8462795  176128  204464 8843387  86f07b src/backend/postgres

Looking at FloatExceptionHandler as an example:

2860			/* We're not returning, so no need to save errno */
2861			ereport(ERROR,
   0x00458bc0 <+0>:	push   %rbp
   0x00458bc1 <+1>:	mov    $0xb2d,%edx
   0x00458bc6 <+6>:	lea    0x214c9b(%rip),%rsi        # 0x66d868
   0x00458bcd <+13>:	mov    %rsp,%rbp
   0x00458bd0 <+16>:	push   %r13
   0x00458bd2 <+18>:	xor    %r8d,%r8d
   0x00458bd5 <+21>:	lea    0x2d9dc4(%rip),%rcx        # 0x7329a0 <__func__.41598>
   0x00458bdc <+28>:	push   %r12
   0x00458bde <+30>:	mov    $0x14,%edi
   0x00458be3 <+35>:	callq  0x5a8c00
   0x00458be8 <+40>:	lea    0x214e59(%rip),%rdi        # 0x66da48
   0x00458bef <+47>:	xor    %eax,%eax
   0x00458bf1 <+49>:	callq  0x5acb00
   0x00458bf6 <+54>:	mov    %eax,%r13d
   0x00458bf9 <+57>:	lea    0x1bf7fb(%rip),%rdi        # 0x6183fb
   0x00458c00 <+64>:	xor    %eax,%eax
   0x00458c02 <+66>:	callq  0x5ac710
   0x00458c07 <+71>:	mov    $0x1020082,%edi
   0x00458c0c <+76>:	mov    %eax,%r12d
   0x00458c0f <+79>:	callq  0x5ac5b0
   0x00458c14 <+84>:	mov    %eax,%edi
   0x00458c16 <+86>:	mov    %r13d,%edx
   0x00458c19 <+89>:	mov    %r12d,%esi
   0x00458c1c <+92>:	xor    %eax,%eax
   0x00458c1e <+94>:	callq  0x5ac020

vs after:
   text    data     bss      dec      hex  filename
8395731  176128  204464 8776323  85ea83 src/backend/postgres
2861			ereport(ERROR,
   0x00449dd0 <+0>:	push   %rbp
   0x00449dd1 <+1>:	xor    %r8d,%r8d
   0x00449dd4 <+4>:	lea    0x2d8a65(%rip),%rcx        # 0x722840 <__func__.41598>
   0x00449ddb <+11>:	mov    %rsp,%rbp
   0x00449dde <+14>:	mov    $0xb2d,%edx
   0x00449de3 

Re: Add A Glossary

2020-03-19 Thread Corey Huinker
On Thu, Mar 19, 2020 at 8:11 PM Alvaro Herrera 
wrote:

> I gave this a look.  I first reformatted it so I could read it; that's
> 0001.  Second I changed all the long  items into s, which
>

Thanks! I didn't know about xrefs, that is a big improvement.


> are shorter and don't have to repeat the title of the refered to page.
> (Of course, this changes the link to be in the same style as every other
> link in our documentation; some people don't like it. But it's our
> style.)
>
> There are some mistakes.  "Tupple" is most glaring one -- not just the
> typo but also the fact that it goes to sql-revoke.  A few definitions
> we'll want to modify.  Nothing too big.  In general I like this work and
> I think we should have it in pg13.
>
> Please bikeshed the definition of your favorite term, and suggest what
> other terms to add.  No pointing out of mere typos yet, please.
>

Jürgen mentioned off-list that the man page doesn't build. I was going to
look into that, but if anyone has more familiarity with that, I'm listening.


> I think we should have the terms Consistency, Isolation, Durability.
>

+1


Re: improve transparency of bitmap-only heap scans

2020-03-19 Thread James Coleman
On Thu, Mar 19, 2020 at 9:26 PM Justin Pryzby  wrote:
>
> On Mon, Mar 16, 2020 at 09:08:36AM -0400, James Coleman wrote:
> > Does the original optimization cover parallel bitmap heap scans like this?
>
> It works for parallel bitmap only scans.
>
> template1=# explain analyze select count(*) from exp where a between 25 and 
> 35 and d between 5 and 10;
>  Finalize Aggregate  (cost=78391.68..78391.69 rows=1 width=8) (actual 
> time=525.972..525.972 rows=1 loops=1)
>->  Gather  (cost=78391.47..78391.68 rows=2 width=8) (actual 
> time=525.416..533.133 rows=3 loops=1)
>  Workers Planned: 2
>  Workers Launched: 2
>  ->  Partial Aggregate  (cost=77391.47..77391.48 rows=1 width=8) 
> (actual time=518.406..518.406 rows=1 loops=3)
>->  Parallel Bitmap Heap Scan on exp  (cost=31825.37..77245.01 
> rows=58582 width=0) (actual time=296.309..508.440 rows=43887 loops=3)
>  Recheck Cond: ((a >= 25) AND (a <= 35) AND (d >= 5) AND 
> (d <= 10))
>  Heap Blocks: unfetched=4701 exact=9650
>  ->  BitmapAnd  (cost=31825.37..31825.37 rows=140597 
> width=0) (actual time=282.590..282.590 rows=0 loops=1)
>->  Bitmap Index Scan on index_exp_a  
> (cost=0.00..15616.99 rows=1166456 width=0) (actual time=147.036..147.036 
> rows=1099872 loops=1)
>  Index Cond: ((a >= 25) AND (a <= 35))
>->  Bitmap Index Scan on index_exp_d  
> (cost=0.00..16137.82 rows=1205339 width=0) (actual time=130.366..130.366 
> rows=120 loops=1)
>  Index Cond: ((d >= 5) AND (d <= 10))
>
>
> > +++ b/src/backend/commands/explain.c
> > @@ -2777,6 +2777,8 @@ show_tidbitmap_info(BitmapHeapScanState *planstate, 
> > ExplainState *es)
> >  {
> >   if (es->format != EXPLAIN_FORMAT_TEXT)
> >   {
> > + ExplainPropertyInteger("Unfetched Heap Blocks", NULL,
> > +
> > planstate->unfetched_pages, es);
> >   ExplainPropertyInteger("Exact Heap Blocks", NULL,
> >  
> > planstate->exact_pages, es);
> >   ExplainPropertyInteger("Lossy Heap Blocks", NULL,
> > @@ -2784,10 +2786,14 @@ show_tidbitmap_info(BitmapHeapScanState *planstate, 
> > ExplainState *es)
> >   }
> >   else
> >   {
> > - if (planstate->exact_pages > 0 || planstate->lossy_pages > 0)
> > + if (planstate->exact_pages > 0 || planstate->lossy_pages > 0
> > + || planstate->unfetched_pages > 0)
> >   {
> >   ExplainIndentText(es);
> >   appendStringInfoString(es->str, "Heap Blocks:");
> > + if (planstate->unfetched_pages > 0)
> > + appendStringInfo(es->str, " unfetched=%ld",
> > +  
> > planstate->unfetched_pages);
> >   if (planstate->exact_pages > 0)
> >   appendStringInfo(es->str, " exact=%ld", 
> > planstate->exact_pages);
> >   if (planstate->lossy_pages > 0)

Awesome, thanks for confirming with an actual plan.

> I don't think it matters in nontext mode, but at least in text mode, I think
> maybe the Unfetched blocks should be output after the exact and lossy blocks,
> in case someone is parsing it, and because bitmap-only is a relatively new
> feature.  Its output is probably less common than exact/lossy.

I tweaked that (and a comment that didn't reference the change); see attached.

James
From 3d52fc04cab36819a93638c104ae1a81208c54df Mon Sep 17 00:00:00 2001
From: James Coleman 
Date: Sun, 15 Mar 2020 20:27:19 -0400
Subject: [PATCH v4] Show bitmap only unfetched page count to EXPLAIN

7c70996ebf0949b142a99 added an optimization to bitmap heap scans to
avoid fetching the heap page where the visibility map makes that
possible, much like index only scans.

However that commit didn't add this output to EXPLAIN, so it's not
obvious when the optimization is kicking in and when it's not. So, track
the number of skipped pages and report it in EXPLAIN output.

Author: Jeff Janes
Reviewers: Emre Hasegeli, Alexey Bashtanov, James Coleman
---
 src/backend/commands/explain.c| 10 --
 src/backend/executor/nodeBitmapHeapscan.c |  5 +++--
 src/include/nodes/execnodes.h |  2 ++
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index d901dc4a50..52c35e640d 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -2770,7 +2770,7 @@ show_hash_info(HashState *hashstate, ExplainState *es)
 }
 
 /*
- * If it's EXPLAIN ANALYZE, show exact/lossy pages for a BitmapHeapScan node
+ * If it's EXPLAIN ANALYZE, show exact/lossy/unfetched pages for a 

Re: improve transparency of bitmap-only heap scans

2020-03-19 Thread Justin Pryzby
On Mon, Mar 16, 2020 at 09:08:36AM -0400, James Coleman wrote:
> Does the original optimization cover parallel bitmap heap scans like this?

It works for parallel bitmap only scans.

template1=# explain analyze select count(*) from exp where a between 25 and 35 
and d between 5 and 10;
 Finalize Aggregate  (cost=78391.68..78391.69 rows=1 width=8) (actual 
time=525.972..525.972 rows=1 loops=1)
   ->  Gather  (cost=78391.47..78391.68 rows=2 width=8) (actual 
time=525.416..533.133 rows=3 loops=1)
 Workers Planned: 2
 Workers Launched: 2
 ->  Partial Aggregate  (cost=77391.47..77391.48 rows=1 width=8) 
(actual time=518.406..518.406 rows=1 loops=3)
   ->  Parallel Bitmap Heap Scan on exp  (cost=31825.37..77245.01 
rows=58582 width=0) (actual time=296.309..508.440 rows=43887 loops=3)
 Recheck Cond: ((a >= 25) AND (a <= 35) AND (d >= 5) AND (d 
<= 10))
 Heap Blocks: unfetched=4701 exact=9650
 ->  BitmapAnd  (cost=31825.37..31825.37 rows=140597 
width=0) (actual time=282.590..282.590 rows=0 loops=1)
   ->  Bitmap Index Scan on index_exp_a  
(cost=0.00..15616.99 rows=1166456 width=0) (actual time=147.036..147.036 
rows=1099872 loops=1)
 Index Cond: ((a >= 25) AND (a <= 35))
   ->  Bitmap Index Scan on index_exp_d  
(cost=0.00..16137.82 rows=1205339 width=0) (actual time=130.366..130.366 
rows=120 loops=1)
 Index Cond: ((d >= 5) AND (d <= 10))


> +++ b/src/backend/commands/explain.c
> @@ -2777,6 +2777,8 @@ show_tidbitmap_info(BitmapHeapScanState *planstate, 
> ExplainState *es)
>  {
>   if (es->format != EXPLAIN_FORMAT_TEXT)
>   {
> + ExplainPropertyInteger("Unfetched Heap Blocks", NULL,
> +
> planstate->unfetched_pages, es);
>   ExplainPropertyInteger("Exact Heap Blocks", NULL,
>  
> planstate->exact_pages, es);
>   ExplainPropertyInteger("Lossy Heap Blocks", NULL,
> @@ -2784,10 +2786,14 @@ show_tidbitmap_info(BitmapHeapScanState *planstate, 
> ExplainState *es)
>   }
>   else
>   {
> - if (planstate->exact_pages > 0 || planstate->lossy_pages > 0)
> + if (planstate->exact_pages > 0 || planstate->lossy_pages > 0
> + || planstate->unfetched_pages > 0)
>   {
>   ExplainIndentText(es);
>   appendStringInfoString(es->str, "Heap Blocks:");
> + if (planstate->unfetched_pages > 0)
> + appendStringInfo(es->str, " unfetched=%ld",
> +  
> planstate->unfetched_pages);
>   if (planstate->exact_pages > 0)
>   appendStringInfo(es->str, " exact=%ld", 
> planstate->exact_pages);
>   if (planstate->lossy_pages > 0)


I don't think it matters in nontext mode, but at least in text mode, I think
maybe the Unfetched blocks should be output after the exact and lossy blocks,
in case someone is parsing it, and because bitmap-only is a relatively new
feature.  Its output is probably less common than exact/lossy.
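
For illustration, the text-format branch with that ordering would look roughly
like this (a sketch of the suggestion, not the attached patch):

			if (planstate->exact_pages > 0)
				appendStringInfo(es->str, " exact=%ld", planstate->exact_pages);
			if (planstate->lossy_pages > 0)
				appendStringInfo(es->str, " lossy=%ld", planstate->lossy_pages);
			if (planstate->unfetched_pages > 0)
				appendStringInfo(es->str, " unfetched=%ld",
								 planstate->unfetched_pages);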

-- 
Justin




Re: Missing errcode() in ereport

2020-03-19 Thread Tom Lane
I wrote:
> I think that at least some compilers will complain about side-effect-free
> subexpressions of a comma expression.  Could we restructure things so
> that the errcode/errmsg/etc calls form a standalone comma expression
> rather than appearing to be arguments of a varargs function?

Yeah, the attached quick-hack patch seems to work really well for this.
I find that this now works:

ereport(ERROR,
errcode(ERRCODE_DIVISION_BY_ZERO),
errmsg("foo"));

where before it gave the weird error you quoted.  Also, these both
draw warnings about side-effect-free expressions:

ereport(ERROR,
ERRCODE_DIVISION_BY_ZERO,
errmsg("foo"));

ereport(ERROR,
"%d", 42);

With gcc you just need -Wall to get such warnings, and clang
seems to give them by default.

As a nice side benefit, the backend gets a couple KB smaller from
removal of errfinish's useless dummy argument.

I think that we could now also change all the helper functions
(errmsg et al) to return void instead of a dummy value, but that
would make the patch a lot longer and probably not move the
goalposts much in terms of error detection.

It also looks like we could use the same trick to get plain elog()
to have the behavior of not evaluating its arguments when it's
not going to print anything.  I've not poked at that yet either.

regards, tom lane

diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 62eef7b..a29ccd9 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -438,7 +438,7 @@ matches_backtrace_functions(const char *funcname)
  * See elog.h for the error level definitions.
  */
 void
-errfinish(int dummy,...)
+errfinish(void)
 {
 	ErrorData  *edata = [errordata_stack_depth];
 	int			elevel;
@@ -1463,7 +1463,7 @@ elog_finish(int elevel, const char *fmt,...)
 	/*
 	 * And let errfinish() finish up.
 	 */
-	errfinish(0);
+	errfinish();
 }
 
 
@@ -1749,7 +1749,7 @@ ThrowErrorData(ErrorData *edata)
 	recursion_depth--;
 
 	/* Process the error. */
-	errfinish(0);
+	errfinish();
 }
 
 /*
@@ -1863,7 +1863,7 @@ pg_re_throw(void)
 		 */
 		error_context_stack = NULL;
 
-		errfinish(0);
+		errfinish();
 	}
 
 	/* Doesn't return ... */
diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index 0a4ef02..e6fa85f 100644
--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -118,34 +118,34 @@
  *--
  */
 #ifdef HAVE__BUILTIN_CONSTANT_P
-#define ereport_domain(elevel, domain, rest)	\
+#define ereport_domain(elevel, domain, ...)	\
 	do { \
 		pg_prevent_errno_in_scope(); \
 		if (errstart(elevel, __FILE__, __LINE__, PG_FUNCNAME_MACRO, domain)) \
-			errfinish rest; \
+			__VA_ARGS__, errfinish(); \
 		if (__builtin_constant_p(elevel) && (elevel) >= ERROR) \
 			pg_unreachable(); \
 	} while(0)
 #else			/* !HAVE__BUILTIN_CONSTANT_P */
-#define ereport_domain(elevel, domain, rest)	\
+#define ereport_domain(elevel, domain, ...)	\
 	do { \
 		const int elevel_ = (elevel); \
 		pg_prevent_errno_in_scope(); \
 		if (errstart(elevel_, __FILE__, __LINE__, PG_FUNCNAME_MACRO, domain)) \
-			errfinish rest; \
+			__VA_ARGS__, errfinish(); \
 		if (elevel_ >= ERROR) \
 			pg_unreachable(); \
 	} while(0)
 #endif			/* HAVE__BUILTIN_CONSTANT_P */
 
-#define ereport(elevel, rest)	\
-	ereport_domain(elevel, TEXTDOMAIN, rest)
+#define ereport(elevel, ...)	\
+	ereport_domain(elevel, TEXTDOMAIN, __VA_ARGS__)
 
 #define TEXTDOMAIN NULL
 
 extern bool errstart(int elevel, const char *filename, int lineno,
 	 const char *funcname, const char *domain);
-extern void errfinish(int dummy,...);
+extern void errfinish(void);
 
 extern int	errcode(int sqlerrcode);
 


Re: nbtree: assertion failure in _bt_killitems() for posting tuple

2020-03-19 Thread Peter Geoghegan
On Thu, Mar 19, 2020 at 9:34 AM Anastasia Lubennikova
 wrote:
> During tests, we catched an assertion failure in _bt_killitems() for
> posting tuple in unique index:
>
> /* kitem must have matching offnum when heap TIDs match */
> Assert(kitem->indexOffset == offnum);
>
> https://github.com/postgres/postgres/blob/master/src/backend/access/nbtree/nbtutils.c#L1809
>
> I struggle to understand the meaning of this assertion.
> Don't we allow the chance that the posting tuple moved right on the page, as
> the comment says?

I think you're right. However, it still seems like we should check
that "kitem->indexOffset" is consistent among all of the BTScanPosItem
entries that we have for each TID that we believe to be from the same
posting list tuple.

(Thinks some more...)

Even if the offnum changes when the buffer lock is released, due to
somebody inserting on to the same page, I guess that we still expect
to observe all of the heap TIDs together in the posting list. Though
maybe not. Maybe it's possible for a deduplication pass to occur when
the buffer lock is dropped, in which case we should arguably behave in
the same way when we see the same heap TIDs (i.e. delete the entire
posting list without regard for whether or not the TIDs happened to
appear in a posting list initially). I'm not sure, though.

It will make no difference most of the time, since the
kill_prior_tuple stuff is generally not applied when the page is
changed at all -- the LSN is checked by the logic added by commit
2ed5b87f. That's why I asked about unlogged indexes (we don't do the
LSN thing there). But I still think that we need to take a firm
position on it.

-- 
Peter Geoghegan




Re: error context for vacuum to include block number

2020-03-19 Thread Justin Pryzby
On Thu, Mar 19, 2020 at 03:29:31PM -0500, Justin Pryzby wrote:
> I was going to suggest that we could do that by passing in a pointer to a 
> local
> variable "LVRelStats olderrcbarg", like:
> |update_vacuum_error_cbarg(vacrelstats, VACUUM_ERRCB_PHASE_SCAN_HEAP,
> |  blkno, NULL, &olderrcbarg);
> 
> and then later call:
> |update_vacuum_error_cbarg(vacrelstats, olderrcbarg.phase,
> |   olderrcbarg.blkno,
> |   olderrcbarg.indname,
> |   NULL);
> 
> I implemented it in a separate patch, but it may be a bad idea, due to freeing
> indname.  To exercise it, I tried to cause a crash by changing "else if
> (errcbarg->indname)" to "if" without else, but wasn't able to cause a crash,
> probably just due to having a narrow timing window.

I realized it was better for the caller to just assign the struct on its own.
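
For illustration, the caller side then looks roughly like this (a sketch; the
helper's argument list follows the quoted snippets and may differ in later
patch versions):

	LVRelStats	olderrcbarg;

	/* save the error-callback fields before switching phase */
	olderrcbarg = *vacrelstats;
	update_vacuum_error_cbarg(vacrelstats, VACUUM_ERRCB_PHASE_SCAN_HEAP,
							  blkno, NULL);
	/* ... do the work ... */
	/* restore the previous phase information */
	update_vacuum_error_cbarg(vacrelstats, olderrcbarg.phase,
							  olderrcbarg.blkno, olderrcbarg.indname);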

Which gives me an excuse for resending the patch, which is needed since I spent
so much time testing this that I failed to update the tip commit for the new
argument.

> It's probably a good idea to pass the indname rather than the relation in any
> case.

I included that with 0001.
I also fixed the argument name in the prototype (Relation rel vs indrel).

And removed these, which were the whole motivation behind saving the values.
|Set the error context while continuing heap scan

-- 
Justin
>From a77a85638404f061d43f06bf3a566f5aa1c5d5f4 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v28 1/5] vacuum errcontext to show block being processed

Discussion:
https://www.postgresql.org/message-id/20191120210600.gc30...@telsasoft.com
---
 src/backend/access/heap/vacuumlazy.c | 209 +++
 src/tools/pgindent/typedefs.list |   1 +
 2 files changed, 185 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 03c43efc32..768a69120b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -268,8 +268,19 @@ typedef struct LVParallelState
 	int			nindexes_parallel_condcleanup;
 } LVParallelState;
 
+typedef enum
+{
+	VACUUM_ERRCB_PHASE_UNKNOWN,
+	VACUUM_ERRCB_PHASE_SCAN_HEAP,
+	VACUUM_ERRCB_PHASE_VACUUM_INDEX,
+	VACUUM_ERRCB_PHASE_VACUUM_HEAP,
+	VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
+} errcb_phase;
+
 typedef struct LVRelStats
 {
+	char	   *relnamespace;
+	char	   *relname;
 	/* useindex = true means two-pass strategy; false means one-pass */
 	bool		useindex;
 	/* Overall statistics about rel */
@@ -290,8 +301,12 @@ typedef struct LVRelStats
 	int			num_index_scans;
 	TransactionId latestRemovedXid;
 	bool		lock_waiter_detected;
-} LVRelStats;
 
+	/* Used for error callback: */
+	char	   *indname;
+	BlockNumber blkno;			/* used only for heap operations */
+	errcb_phase phase;
+} LVRelStats;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -314,10 +329,10 @@ static void lazy_vacuum_all_indexes(Relation onerel, Relation *Irel,
 	LVRelStats *vacrelstats, LVParallelState *lps,
 	int nindexes);
 static void lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats,
-			  LVDeadTuples *dead_tuples, double reltuples);
+			  LVDeadTuples *dead_tuples, double reltuples, LVRelStats *vacrelstats);
 static void lazy_cleanup_index(Relation indrel,
 			   IndexBulkDeleteResult **stats,
-			   double reltuples, bool estimated_count);
+			   double reltuples, bool estimated_count, LVRelStats *vacrelstats);
 static int	lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
 			 int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer);
 static bool should_attempt_truncation(VacuumParams *params,
@@ -337,13 +352,13 @@ static void lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult *
 		 int nindexes);
 static void parallel_vacuum_index(Relation *Irel, IndexBulkDeleteResult **stats,
   LVShared *lvshared, LVDeadTuples *dead_tuples,
-  int nindexes);
+  int nindexes, LVRelStats *vacrelstats);
 static void vacuum_indexes_leader(Relation *Irel, IndexBulkDeleteResult **stats,
   LVRelStats *vacrelstats, LVParallelState *lps,
   int nindexes);
 static void vacuum_one_index(Relation indrel, IndexBulkDeleteResult **stats,
 			 LVShared *lvshared, LVSharedIndStats *shared_indstats,
-			 LVDeadTuples *dead_tuples);
+			 LVDeadTuples *dead_tuples, LVRelStats *vacrelstats);
 static void lazy_cleanup_all_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
 	 LVRelStats *vacrelstats, LVParallelState *lps,
 	 int nindexes);
@@ -361,6 +376,9 @@ static void end_parallel_vacuum(Relation *Irel, IndexBulkDeleteResult **stats,
 LVParallelState *lps, int nindexes);
 static LVSharedIndStats *get_indstats(LVShared *lvshared, int n);
 

Re: Add A Glossary

2020-03-19 Thread Alvaro Herrera
I gave this a look.  I first reformatted it so I could read it; that's
0001.  Second I changed all the long  items into s, which
are shorter and don't have to repeat the title of the refered to page.
(Of course, this changes the link to be in the same style as every other
link in our documentation; some people don't like it. But it's our
style.)

There are some mistakes.  "Tupple" is most glaring one -- not just the
typo but also the fact that it goes to sql-revoke.  A few definitions
we'll want to modify.  Nothing too big.  In general I like this work and
I think we should have it in pg13.

Please bikeshed the definition of your favorite term, and suggest what
other terms to add.  No pointing out of mere typos yet, please.

I think we should have the terms Consistency, Isolation, Durability.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 72f7a425fcd9a803010294e6974ecd7680a9aee6 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 19 Mar 2020 18:29:12 -0300
Subject: [PATCH 1/2] glossary

---
 doc/src/sgml/filelist.sgml |1 +
 doc/src/sgml/glossary.sgml | 1441 
 doc/src/sgml/postgres.sgml |1 +
 3 files changed, 1443 insertions(+)
 create mode 100644 doc/src/sgml/glossary.sgml

diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 3da2365ea9..504c8a6326 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -170,6 +170,7 @@
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
new file mode 100644
index 00..cf20fb759c
--- /dev/null
+++ b/doc/src/sgml/glossary.sgml
@@ -0,0 +1,1441 @@
+
+ Glossary
+ 
+  This is a list of terms and their meaning in the context of
+  PostgreSQL and relational database
+  systems in general.
+ 
+  
+   
+Aggregate
+
+ 
+  To combine a collection of data values into a single value, whose
+  value may not be of the same type as the original values.
+  Aggregate Functions
+  combine multiple Rows that share a common set
+  of values into one Row, which means that the
+  only data visible is the values in common, and the aggregates of the
+  non-common data.
+ 
+ 
+  For more information, see
+  Aggregate Functions.
+ 
+
+   
+
+   
+Analytic
+
+ 
+  A Function whose computed value can reference
+  values found in nearby Rows of the same
+  Result Set.
+ 
+ 
+  For more information, see
+  Window Functions.
+ 
+
+   
+
+   
+Archiver
+
+ 
+  A process that backs up WAL Files in order to
+  reclaim space on the file system.
+ 
+ 
+  For more information, see
+  Backup and Restore: Continuous Archiving and Point-in-Time Recovery (PITR).
+ 
+
+   
+
+   
+Atomic
+
+ 
+  In reference to the value of an Attribute or
+  Datum: cannot be broken down into smaller
+  components.
+ 
+ 
+  In reference to an operation: An event that cannot be completed in
+  part: it must either entirely succeed or entirely fail. A series of
+  SQL statements can be combined into a
+  Transaction, and that
+  transaction is said to be
+  Atomic.
+ 
+
+   
+
+   
+Attribute
+
+ 
+  An element with a certain name and data type found within a
+  Tuple or Table.
+ 
+
+   
+
+   
+Autovacuum
+
+ 
+  Processes that remove outdated MVCC
+  Records of the Heap and
+  Index.
+ 
+ 
+  For more information, see
+  Routine Database Maintenance Tasks: Routine Vacuuming.
+ 
+
+   
+
+   
+Backend Process
+
+ 
+  Processes of an Instance which act on behalf of
+  client Connections and handle their requests.
+ 
+ 
+  (Don't confuse this term with the similar terms Background
+  Worker or Background Writer).
+ 
+
+   
+
+   
+Backend Server
+
+ 
+  See Instance.
+ 
+
+   
+
+   
+Background Worker
+
+ 
+  Individual processes within an Instance, which
+  run system- or user-supplied code. Typical use cases are processes
+  which handle parts of an SQL query to take
+  advantage of parallel execution on servers with multiple
+  CPUs.
+
+
+ For more information, see
+ Background Worker Processes.
+
+
+   
+
+   
+Background Writer
+
+ 
+  Continuously writes dirty pages from Shared
+  Memory to the file system. It starts periodically, but
+  works only for a short period in order to distribute expensive
+  I/O activity over time instead of generating fewer
+  large I/O peaks which could block other processes.
+ 
+ 
+  For more information, see
+  Server Configuration: Resource Consumption.
+ 
+
+   
+
+   
+Cast
+
+ 
+  A 

Re: improve transparency of bitmap-only heap scans

2020-03-19 Thread James Coleman
On Mon, Mar 16, 2020 at 9:08 AM James Coleman  wrote:
> ...
> One question though: if I change the query to:
> explain (analyze, buffers) select count(*) from exp where a between 50
> and 100 and d between 5 and 10;
> then I get a parallel bitmap heap scan, and I only see exact heap
> blocks (see attached explain output).
>
> Does the original optimization cover parallel bitmap heap scans like
> this? If not, I think this patch is likely ready for committer. If so,
> then we still need support for stats tracking and explain output for
> parallel nodes.

I've looked at the code a bit more deeply, and the implementation
means the optimization applies to parallel scans also. I've also
convinced myself that the change in explain.c will cover both
non-parallel and parallel plans.

Since that's the only question I saw, and the patch seems pretty
uncontroversial/not requiring any real design choices, I've gone ahead
and marked this patch as ready for committer.

Thanks for working on this!

James




Re: nbtree: assertion failure in _bt_killitems() for posting tuple

2020-03-19 Thread Peter Geoghegan
On Thu, Mar 19, 2020 at 9:34 AM Anastasia Lubennikova
 wrote:
> Unfortunately I cannot attach test and core dump, since they rely on the
> enterprise multimaster extension code.
> Here are some details from the core dump, that I find essential:
>
> Stack is
> _bt_killitems
> _bt_release_current_position
> _bt_release_scan_state
> btrescan
> index_rescan
> RelationFindReplTupleByIndex
>
> (gdb) p offnum
> $3 = 125
> (gdb) p *item
> $4 = {ip_blkid = {bi_hi = 0, bi_lo = 2}, ip_posid = 200}
> (gdb) p *kitem
> $5 = {heapTid = {ip_blkid = {bi_hi = 0, bi_lo = 2}, ip_posid = 200},
> indexOffset = 121, tupleOffset = 32639}
>
>
> Unless I miss something, this assertion must be removed.

Is this index an unlogged index, under the hood?

-- 
Peter Geoghegan




Re: Missing errcode() in ereport

2020-03-19 Thread Tom Lane
Andres Freund  writes:
> On 2020-03-19 14:07:04 -0400, Tom Lane wrote:
>> Now that we can rely on having varargs macros, I think we could
>> stop requiring the extra level of parentheses,

> I think that'd be an improvement, because:
> ane of the ones I saw confuse people is just:
> /home/andres/src/postgresql/src/backend/tcop/postgres.c:3727:4: error: 
> ‘ereport’ undeclared (first use in this function); did you mean ‘rresvport’?
>  3727 |ereport(FATAL,
>   |^~~
>   |rresvport
> /home/andres/src/postgresql/src/backend/tcop/postgres.c:3727:4: note: each 
> undeclared identifier is reported only once for each function it appears in
> because the extra parens haven't been added.

Ah, so the macro isn't expanded at all if it appears to have more than
two parameters?  That seems odd, but I suppose some compilers might
work that way.  Switching to varargs would improve that for sure.

> Another one I've both seen and committed myself is converting an elog to
> an ereport, and not adding an errcode() around the error code - which
> silently looks like it works.

You mean not adding errmsg() around the error string?  Yeah, I can
believe that one easily.  It's sort of the same problem as forgetting
to wrap errcode() around the ERRCODE_ constant.

I think that at least some compilers will complain about side-effect-free
subexpressions of a comma expression.  Could we restructure things so
that the errcode/errmsg/etc calls form a standalone comma expression
rather than appearing to be arguments of a varargs function?  The
compiler's got no hope of realizing there's anything wrong when it
thinks it's passing an integer or string constant to a function it knows
nothing about.  But if it could see that nothing at all is being done with
the constant, maybe we'd get somewhere.

regards, tom lane




Re: Missing errcode() in ereport

2020-03-19 Thread Andres Freund
Hi,

On 2020-03-19 14:07:04 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2020-03-17 10:09:18 -0400, Tom Lane wrote:
> >> We might want to spend some effort thinking how to find or prevent
> >> additional bugs of the same ilk ...
> 
> > Yea, that'd be good.  Trying to help people new to postgres write their
> > first patches I found that ereport is very confusing to them - largely
> > because the syntax doesn't make much sense. Made worse by the compiler
> > error messages being terrible in many cases.
> 
> > Not sure there's much we can do without changing ereport's "signature"
> > though :(
> 
> Now that we can rely on having varargs macros, I think we could
> stop requiring the extra level of parentheses, ie instead of
> 
> ereport(ERROR,
> (errcode(ERRCODE_DIVISION_BY_ZERO),
>  errmsg("division by zero")));
> 
> it could be just
> 
> ereport(ERROR,
> errcode(ERRCODE_DIVISION_BY_ZERO),
> errmsg("division by zero"));
> 
> (The old syntax had better still work, of course.  I'm not advocating
> running around and changing existing calls.)

I think that'd be an improvement, because:

> I'm not sure that this'd really move the goalposts much in terms of making
> it any less confusing, but possibly it would improve the compiler errors?
> Do you have any concrete examples of crummy error messages?

one of the ones I saw confuse people is just:
/home/andres/src/postgresql/src/backend/tcop/postgres.c:3727:4: error: 
‘ereport’ undeclared (first use in this function); did you mean ‘rresvport’?
 3727 |ereport(FATAL,
  |^~~
  |rresvport
/home/andres/src/postgresql/src/backend/tcop/postgres.c:3727:4: note: each 
undeclared identifier is reported only once for each function it appears in

because the extra parens haven't been added.


I personally actually hit a variant of that on a semi-regular basis:
Closing the parens for "rest" early, as there's just so many parens in
ereports, especially if an errmsg argument is a function call
itself. Which leads to a variation of the above error message.  I know
how to address it, obviously, but I do find it somewhat annoying to deal
with.

Another one I've both seen and committed myself is converting an elog to
an ereport, and not adding an errcode() around the error code - which
silently looks like it works.

Greetings,

Andres Freund




Why does [auto-]vacuum delay not report a wait event?

2020-03-19 Thread Andres Freund
Hi,

I was looking at [1], wanting to suggest a query to monitor what
autovacuum is mostly waiting on. Partially to figure out whether it's
mostly autovacuum cost limiting.

But uh, unfortunately the vacuum delay code just sleeps without setting
a wait event:

void
vacuum_delay_point(void)
{
...
/* Nap if appropriate */
if (msec > 0)
{
if (msec > VacuumCostDelay * 4)
msec = VacuumCostDelay * 4;

pg_usleep((long) (msec * 1000));


Seems like it should instead use a new wait event in the PG_WAIT_TIMEOUT
class?
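
For instance, something along these lines (just a sketch; the
WAIT_EVENT_VACUUM_DELAY value is a hypothetical new member of the
PG_WAIT_TIMEOUT class, it doesn't exist yet):

    /* Nap if appropriate */
    if (msec > 0)
    {
        if (msec > VacuumCostDelay * 4)
            msec = VacuumCostDelay * 4;

        pgstat_report_wait_start(WAIT_EVENT_VACUUM_DELAY);
        pg_usleep((long) (msec * 1000));
        pgstat_report_wait_end();
    }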

Given how frequently we run into trouble with [auto]vacuum throttling
being a problem, and there not being any way to monitor that currently,
that seems like it'd be a significant improvement, given the effort?


It'd probably also be helpful to report the total time [auto]vacuum
spent being delayed for vacuum verbose/autovacuum logging, but imo
that'd be a parallel feature to a wait event, not a replacement.


Greetings,

Andres Freund

[1] 
https://postgr.es/m/CAE39h22zPLrkH17GrkDgAYL3kbjvySYD1io%2BrtnAUFnaJJVS4g%40mail.gmail.com




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-19 Thread Andres Freund
Hi,

On 2020-03-20 01:11:23 +0300, Darafei "Komяpa" Praliaskouski wrote:
> > > According to my reckoning, that is the remaining objection to the patch
> > > as it is (with ordinary freezing behavior).
> > >
> > > How about a scale_factor of 0.005?  That will be high enough for large
> > > tables, which seem to be the main concern here.
> >
> > Seems low on a first blush. On a large-ish table with 1 billion tuples,
> > we'd vacuum every 5 million inserts. For many ETL workloads this will
> > result in a vacuum after every bulk operation. Potentially with an index
> > scan associated (even if there's no errors, a lot of bulk loads use ON
> > CONFLICT INSERT leading to the occasional update).
> 
> This is a good and wanted thing.

I don't think that's true in general. As proposed this can increase the
overall amount of IO (both reads and writes) due to vacuum by a *LOT*.


> Upthread it was already suggested that "everyone knows to vacuum after
> bulk operations". This will go and vacuum the data while it's hot and
> in caches, not afterwards, reading from disk.

For many bulk load cases the data will not be in cache, in particular not
when individual bulk inserts are more than a few gigabytes.


> The problem hit by Mandrill is simple: in modern cloud environments
> it's sometimes simply impossible to read all the data on disk because
> of different kinds of throttling.

Yes. Which is one of the reasons why this has the potential to cause
serious issues. The proposed changes very often will *increase* the
total amount of IO. A good fraction of the time that will be "hidden" by
caching, but it'll be far from all the time.


> At some point your production database just shuts down and asks to
> VACUUM in single user mode for 40 days.

That basically has nothing to do with what we're talking about here. The
default wraparound trigger is 200 million xids, and shutdowns start at
more than 2 billion xids. If an anti-wrap autovacuum can't finish within
2 billion rows, then this won't be addressed by vacuuming more
frequently (including more frequent index scans, causing a lot more
IO!).

Greetings,

Andres Freund




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-19 Thread Andres Freund
Hi,

On 2020-03-19 20:47:40 +0100, Laurenz Albe wrote:
> On Thu, 2020-03-19 at 21:39 +1300, David Rowley wrote:
> > I've attached a small fix which I'd like to apply to your v8 patch.
> > With that, and pending one final look, I'd like to push this during my
> > Monday (New Zealand time).  So if anyone strongly objects to that,
> > please state their case before then.

I am doubtful it should be committed with the current settings. See below.


> From 3ba4b572d82969bbb2af787d1bccc72f417ad3a0 Mon Sep 17 00:00:00 2001
> From: Laurenz Albe 
> Date: Thu, 19 Mar 2020 20:26:43 +0100
> Subject: [PATCH] Autovacuum tables that have received only inserts
>
> Add "autovacuum_vacuum_insert_threshold" and
> "autovacuum_vacuum_insert_scale_factor" GUC and reloption.
> The default value for the threshold is 1000;
> the scale factor defaults to 0.01.
>
> Any table that has received more inserts since it was
> last vacuumed (and that is not vacuumed for another
> reason) will be autovacuumed.
>
> This avoids the known problem that insert-only tables
> are never autovacuumed until they need to have their
> anti-wraparound autovacuum, which then can be massive
> and disruptive.

Shouldn't this also mention index only scans? IMO that's at least as big
a problem as the "large vacuum" problem.


> +  xreflabel="autovacuum_vacuum_insert_threshold">
> +  autovacuum_vacuum_insert_threshold 
> (integer)
> +  
> +   
> autovacuum_vacuum_insert_threshold
> +   configuration parameter
> +  
> +  
> +  
> +   
> +Specifies the number of inserted tuples needed to trigger a
> +VACUUM in any one table.
> +The default is 1000 tuples.
> +This parameter can only be set in the 
> postgresql.conf
> +file or on the server command line;
> +but the setting can be overridden for individual tables by
> +changing table storage parameters.
> +   
> +  
> + 
> +
>xreflabel="autovacuum_analyze_threshold">
>autovacuum_analyze_threshold 
> (integer)
>
> @@ -7342,6 +7362,27 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' 
> WITH csv;
>
>   
>
> +  xreflabel="autovacuum_vacuum_insert_scale_factor">
> +  autovacuum_vacuum_insert_scale_factor 
> (floating point)
> +  
> +   
> autovacuum_vacuum_insert_scale_factor
> +   configuration parameter
> +  
> +  
> +  
> +   
> +Specifies a fraction of the table size to add to
> +autovacuum_vacuum_insert_threshold
> +when deciding whether to trigger a VACUUM.
> +The default is 0.01 (1% of table size).
> +This parameter can only be set in the 
> postgresql.conf
> +file or on the server command line;
> +but the setting can be overridden for individual tables by
> +changing table storage parameters.
> +   
> +  
> + 
> +

I am *VERY* doubtful that the attempt of using a large threshold, and a
tiny scale factor, is going to work out well. I'm not confident enough
in my gut feeling to full throatedly object, but confident enough that
I'd immediately change it on any important database I operated.

Independent of how large a constant you set the threshold to, for
databases with substantially bigger tables this will lead to [near]
constant vacuuming. As soon as you hit 1 billion rows - which isn't
actually that much - this is equivalent to setting
autovacuum_{vacuum,analyze}_scale_factor to 0.01. There's cases where
that can be a sensible setting, but I don't think anybody would suggest
it as a default.


After thinking about it for a while, I think it's fundamentally flawed
to use large constant thresholds to avoid unnecessary vacuums. It's easy
to see cases where it's bad for common databases of today, but it'll be
much worse a few years down the line where common table sizes have grown
by a magnitude or two. Nor do they address the difference between tables
of a certain size with e.g. 2kb wide rows, and a same sized table with
28 byte wide rows.  The point of constant thresholds imo can only be to
avoid unnecessary work at the *small* (even tiny) end, not the opposite.


I think there's too much "reinventing" autovacuum scheduling in a
"local" insert-only manner happening in this thread. And as far as I can
tell additionally only looking at a somewhat narrow slice of insert only
workloads.


I, again, strongly suggest using much more conservative values here. And
then try to address the shortcomings - like not freezing aggressively
enough - in separate patches (and by now separate releases, in all
likelihood).


This will have a huge impact on a lot of postgres
installations. Autovacuum already is perceived as one of the biggest
issues around postgres. If the ratio of cases where these changes
improve things to the cases it regresses isn't huge, it'll be painful
(silent improvements are obviously less noticed than breakages).

Greetings,

Andres Freund




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-19 Thread Komяpa
> > According to my reckoning, that is the remaining objection to the patch
> > as it is (with ordinary freezing behavior).
> >
> > How about a scale_factor of 0.005?  That will be high enough for large
> > tables, which seem to be the main concern here.
>
> Seems low on a first blush. On a large-ish table with 1 billion tuples,
> we'd vacuum every 5 million inserts. For many ETL workloads this will
> result in a vacuum after every bulk operation. Potentially with an index
> scan associated (even if there's no errors, a lot of bulk loads use ON
> CONFLICT INSERT leading to the occasional update).

This is a good and wanted thing. Upthread it was already suggested
that "everyone knows to vacuum after bulk operations". This will go and vacuum
the data while it's hot and in caches, not afterwards, reading from disk.


> > I am not sure about b).  In my mind, the objective is not to prevent
> > anti-wraparound vacuums, but to see that they have less work to do,
> > because previous autovacuum runs already have frozen anything older than
> > vacuum_freeze_min_age.  So, assuming linear growth, the number of tuples
> > to freeze during any run would be at most one fourth of today's number
> > when we hit autovacuum_freeze_max_age.
>
> This whole chain of arguments seems like it actually has little to do
> with vacuuming insert only/mostly tables. The same problem exists for
> tables that aren't insert only/mostly. Instead it IMO is an argument for
> a general change in logic about when to freeze.
>
> What exactly is it that you want to achieve by having anti-wrap vacuums
> be quicker? If the goal is to reduce the window in which autovacuums
> aren't automatically cancelled when there's a conflicting lock request,
> or in which autovacuum just schedules based on xid age, then you can't
> have wraparound vacuums needing to do substantial amount of work.

The problem hit by Mandrill is simple: in modern cloud environments
it's sometimes simply impossible to read all the data on disk because
of different kinds of throttling.
At some point your production database just shuts down and asks to
VACUUM in single user mode for 40 days.

You want vacuum to happen long before that, preferably when the data
is still in RAM, or, at least, fits your cloud provider's disk burst
performance budget, where performance of block device resembles that
of an SSD and not of a Floppy Disk.

Some more reading on how that works:
https://aws.amazon.com/ru/blogs/database/understanding-burst-vs-baseline-performance-with-amazon-rds-and-gp2/

-- 
Darafei Praliaskouski
Support me: http://patreon.com/komzpa




Fwd: invalid byte sequence for encoding "UTF8": 0x95-while using PGP Encryption -PostgreSQL

2020-03-19 Thread Chaitanya bodlapati
Hi,

I was working on Asymmetric encryption in postgres using pgcrypto . I have
generated the keys and stored in data folder and had  inserted the data
using pgcrypto encrypt function .

here the problem comes, I was trying to decrypt the data but it was
throwing me the below error

ERROR:  invalid byte sequence for encoding "UTF8": 0x95

Please find the below process which I followed

Generated the keys :
CREATE EXTENSION pgcrypto;

$ gpg --list-keys
/home/ec2-user/.gnupg/pubring.gpg


pub   2048R/8GGGFF 2020-03-19
uid   [ultimate] postgres
sub   2048R/GGGFF7 2020-03-19

create table users(username varchar(100),id integer,ssn bytea);

postgres=# INSERT INTO users
VALUES('randomname',7,pgp_pub_encrypt('434-88-8880',dearmor(pg_read_file('keys/public.key'))));

INSERT 0 1

postgres=# SELECT
pgp_pub_decrypt(ssn,dearmor(pg_read_file('keys/private.key'))) AS mydata
FROM users;

ERROR:  invalid byte sequence for encoding "UTF8": 0x95

postgres=# show client_encoding;
 client_encoding
-
 UTF8
(1 row)

postgres=# show server_encoding;
 server_encoding
-
 UTF8
(1 row)

Can anyone please help me on this , I am not sure why I was getting this
error.


Re: Add PostgreSQL home page to --help output

2020-03-19 Thread Daniel Gustafsson
> On 19 Mar 2020, at 22:32, Bruce Momjian  wrote:
> 
> On Mon, Mar 16, 2020 at 09:10:25PM -0300, Alvaro Herrera wrote:
>> On 2020-Mar-16, Bruce Momjian wrote:
>> 
>>> On Mon, Mar 16, 2020 at 05:55:26PM -0400, Bruce Momjian wrote:
Report bugs mailto:pgsql-b...@lists.postgresql.org
PostgreSQL home page https://www.postgresql.org/
 
 I kind of prefer the last one since they can both be pasted directly into
 a browser.
>>> 
>>> Actually, I prefer:
>>> 
>>> Report bugs  mailto:pgsql-b...@lists.postgresql.org
>>> PostgreSQL website  https://www.postgresql.org/
>> 
>> Hmm, pasting mailto into the browser address bar doesn't work for me ...
>> it just goes to the lists.postgresql.org website (Brave) or sits there
>> doing nothing (Firefox).  I was excited there for a minute.
>> 
>> If we're talking personal preference, I like the current output.
> 
> Well, in Firefox it knows to use Thunderbird to send email because under
> Firefox's Preferences/General/Applications, 'mailto' is set to "Use
> Thunderbird", though it can be set to other applications.  If no one
> likes my changes, I guess we will just stick with what we have.

I don't think mailto: URLs is a battle we can win, pasting it into Safari for
example yields this error message:

"This website has been blocked from automatically composing an email."

It also assumes that users will paste the bugreport email into something that
parses URLs and not straight into the "To:" field of their email client.  I'm
not sure that assumption holds.

cheers ./daniel




Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding

2020-03-19 Thread Tom Lane
Noah Misch  writes:
> On Sun, Jan 05, 2020 at 01:33:55AM +0100, Tomas Vondra wrote:
>> It's a bit unfortunate that a patch for a data corruption / loss issue
>> (even if a low-probability one) fell through multiple commitfests.

> Thanks for investing in steps to fix that.

Yeah, this patch has been waiting in the queue for way too long :-(.

I spent some time studying this, and I have a few comments/questions:

1. It seems like this discussion is conflating two different issues.
The original complaint was about a seemingly-bogus log message "could
not truncate directory "pg_xact": apparent wraparound".  Your theory
about that, IIUC, is that SimpleLruTruncate's initial round-back of
cutoffPage to a segment boundary moved us from a state where
shared->latest_page_number doesn't logically precede the cutoffPage to
one where it does, triggering the message.  So removing the initial
round-back, and then doing whatever's needful to compensate for that in
the later processing, is a reasonable fix to prevent the bogus warning.
However, you're also discussing whether or not an SLRU segment file that
is close to the wraparound boundary should get removed or not.  As far
as I can see that's 100% independent of issuance of the log message, no?
This might not affect the code substance of the patch at all; but it
seems like we need to be clear about it in our discussion, and maybe the
comments need to change too.

2. I wonder whether we have an issue even with rounding back to the
SLRU page boundary, as is done by each caller before we ever get to
SimpleLruTruncate.  I'm pretty sure that the actual anti-wraparound
protections are all precise to the XID level, so that there's some
daylight between what SimpleLruTruncate is told and the range of
data that the higher-level code thinks is of interest.  Maybe we
should restructure things so that we keep the original cutoff XID
(or equivalent) all the way through, and compare the start/end
positions of a segment file directly to that.

3. It feels like the proposed test of cutoff position against both
ends of a segment is a roundabout way of fixing the problem.  I
wonder whether we ought not pass *both* the cutoff and the current
endpoint (latest_page_number) down to the truncation logic, and
have it compare against both of those values.

To try to clarify this in my head, I thought about an image of
the modular XID space as an octagon, where each side would correspond to
a segment file if we chose numbers such that there were only 8 possible
segment files.  Let's say that nextXID is somewhere in the bottommost
octagon segment.  The oldest possible value for the truncation cutoff
XID is a bit less than "halfway around" from nextXID; so it could be
in the topmost octagon segment, if the minimum permitted daylight-
till-wraparound is less than the SLRU segment size (which it is).
Then, if we round the cutoff XID "back" to a segment boundary, most
of the current (bottommost) segment is now less than halfway around
from that cutoff point, and in particular the current segment's
starting page is exactly halfway around.  Because of the way that
TransactionIdPrecedes works, the current segment will be considered to
precede that cutoff point (the int32 difference comes out as exactly
2^31 which is negative).  Boom, data loss, because we'll decide the
current segment is removable.
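
For reference, the modulo-2^32 comparison in transam.c is essentially
this, which is where the "difference of exactly 2^31 looks negative"
hazard comes from:

bool
TransactionIdPrecedes(TransactionId id1, TransactionId id2)
{
    /*
     * If either ID is a permanent XID then we can just do unsigned
     * comparison.  If both are normal, do a modulo-2^32 comparison.
     */
    int32       diff;

    if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
        return (id1 < id2);

    diff = (int32) (id1 - id2);
    return (diff < 0);
}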

I think that your proposed patch does fix this, but I'm not quite
sold that the corner cases (where the cutoff XID is itself exactly
at a page boundary) are right.  In any case, I think it'd be more
robust to be comparing explicitly against a notion of the latest
in-use page number, instead of backing into it from an assumption
that the cutoff XID itself is less than halfway around.

I wonder if we ought to dodge the problem by having a higher minimum
value of the required daylight-before-wraparound, so that the cutoff
point couldn't be in the diametrically-opposite-current segment but
would have to be at least one segment before that.  In the end,
I believe that all of this logic was written under an assumption
that we should never get into a situation where we are so close
to the wraparound threshold that considerations like these would
manifest.  Maybe we can get it right, but I don't have huge
faith in it.

It also bothers me that some of the callers of SimpleLruTruncate
have explicit one-count backoffs of the cutoff point and others
do not.  There's no obvious reason for the difference, so I wonder
if that isn't something we should have across-the-board, or else
adjust SimpleLruTruncate to do the equivalent thing internally.

I haven't thought much yet about your second point about race
conditions arising from nextXID possibly moving before we
finish the deletion scan.  Maybe we could integrate a fix for
that issue, along the lines of (1) see an SLRU segment file,
(2) determine that it appears to precede the cutoff XID, if so
(3) acquire the control lock and fetch latest_page_number,
compare against 

Re: Improve errors when setting incorrect bounds for SSL protocols

2020-03-19 Thread Daniel Gustafsson
> On 7 Feb 2020, at 01:33, Michael Paquier  wrote:
> 
> On Thu, Feb 06, 2020 at 11:30:40PM +0100, Daniel Gustafsson wrote:
>> Or change to the v1 patch in this thread, which avoids the problem by doing 
>> it
>> in the OpenSSL code.  It's a shame to have generic TLS functionality be 
>> OpenSSL
>> specific when everything else TLS has been abstracted, but not working is
>> clearly a worse option.
> 
> The v1 would work just fine considering that, as the code would be
> invoked in a context where all GUCs are already loaded.  That's too
> late before the release though, so I have reverted 41aadee, and
> attached is a new patch to consider with improvements compared to v1
> mainly in the error messages.

Having gone back to look at this, I can't think of a better way to implement
this and I think we should go ahead with the proposed patch.

In this message we aren't quoting the TLS protocol setting:
+  (errmsg("%s setting %s not supported by this build",
..but in this detail we are:
+   errdetail("\"%s\" cannot be higher than \"%s\"",
Perhaps we should be consistent across all ereports?

Marking as ready for committer.

cheers ./daniel





Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-19 Thread Andres Freund
Hi,

On 2020-03-19 06:45:48 +0100, Laurenz Albe wrote:
> On Tue, 2020-03-17 at 18:02 -0700, Andres Freund wrote:
> > I don't think a default scale factor of 0 is going to be ok. For
> > large-ish tables this will basically cause permanent vacuums. And it'll
> > sometimes trigger for tables that actually coped well so far. 10 million
> > rows could be a few seconds, not more.
> > 
> > I don't think that the argument that otherwise a table might not get
> > vacuumed before autovacuum_freeze_max_age is convincing enough.
> > 
> > a) if that's indeed the argument, we should increase the default
> >   autovacuum_freeze_max_age - now that there's insert triggered vacuums,
> >   the main argument against that from before isn't valid anymore.
> > 
> > b) there's not really a good arguments for vacuuming more often than
> >   autovacuum_freeze_max_age for such tables. It'll not be not frequent
> >   enough to allow IOS for new data, and you're not preventing
> >   anti-wraparound vacuums from happening.
> 
> According to my reckoning, that is the remaining objection to the patch
> as it is (with ordinary freezing behavior).
> 
> How about a scale_factor of 0.005?  That will be high enough for large
> tables, which seem to be the main concern here.

Seems low on a first blush. On a large-ish table with 1 billion tuples,
we'd vacuum every 5 million inserts. For many ETL workloads this will
result in a vacuum after every bulk operation. Potentially with an index
scan associated (even if there's no errors, a lot of bulk loads use ON
CONFLICT INSERT leading to the occasional update).
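
To spell out the arithmetic (illustrative names only, mirroring the
reltuples-based threshold formula autovacuum already uses for
updates/deletes; this is not actual patch code):

static bool
insert_vacuum_needed(float4 reltuples, int64 inserts_since_vacuum)
{
    float4      ins_scale_factor = 0.005;   /* value proposed above */
    int         ins_base_thresh = 1000;
    float4      insthresh;

    insthresh = (float4) ins_base_thresh + ins_scale_factor * reltuples;

    /* reltuples = 1e9 gives insthresh of roughly 5 million */
    return inserts_since_vacuum > insthresh;
}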

Personally I think we should be considerably more conservative in the
first release or two. Exposing a lot of people that previously didn't
have a lot of problems to vacuuming being *massively* more aggressive,
basically permanently running on an insert only table, will be bad.


> I fully agree with your point a) - should that be part of the patch?

We can just make it a separate patch committed shortly afterwards.


> I am not sure about b).  In my mind, the objective is not to prevent
> anti-wraparound vacuums, but to see that they have less work to do,
> because previous autovacuum runs already have frozen anything older than
> vacuum_freeze_min_age.  So, assuming linear growth, the number of tuples
> to freeze during any run would be at most one fourth of today's number
> when we hit autovacuum_freeze_max_age.

This whole chain of arguments seems like it actually has little to do
with vacuuming insert only/mostly tables. The same problem exists for
tables that aren't insert only/mostly. Instead it IMO is an argument for
a general change in logic about when to freeze.

What exactly is it that you want to achieve by having anti-wrap vacuums
be quicker? If the goal is to reduce the window in which autovacuums
aren't automatically cancelled when there's a conflicting lock request,
or in which autovacuum just schedules based on xid age, then you can't
have wraparound vacuums needing to do substantial amount of work.

Except for not auto-cancelling, and the autovac scheduling issue,
there's really nothing magic about anti-wrap vacuums.


If the goal is to avoid redundant writes, then it's largely unrelated to
anti-wrap vacuums, and can to a large degree addressed by
opportunistically freezing (best even during hot pruning!).


I am more and more convinced that it's a seriously bad idea to tie
committing "autovacuum after inserts" to also committing a change in
logic around freezing. That's not to say we shouldn't try to address
both this cycle, but discussing them as if they really are one item
makes it both more likely that we get nothing in, and more likely that
we miss the larger picture.


> I am still sorry to see more proactive freezing go, which would
> reduce the impact for truly insert-only tables.
> After sleeping on it, here is one last idea.
> 
> Granted, freezing with vacuum_freeze_min_age = 0 poses a problem
> for those parts of the table that will receive updates or deletes.

IMO it's not at all just those regions that are potentially negatively
affected:
If there are no other modifications to the page, more aggressively
freezing can lead to seriously increased write volume. It's quite normal
to have databases where data in insert only tables *never* gets old
enough to need to be frozen (either because xid usage is low, or because
older partitions are dropped).  If data in an insert-only table isn't
write-only, the hint bits are likely to already be set, which means that
vacuum will just cause the entire table to be written another time,
without a reason.


I don't see how it's ok to substantially regress this very common
workload. IMO this basically means that more aggressively and
non-opportunistically freezing simply is a no-go (be it for insert or
other causes for vacuuming).

What am I missing?

Greetings,

Andres Freund




Re: Add PostgreSQL home page to --help output

2020-03-19 Thread Bruce Momjian
On Mon, Mar 16, 2020 at 09:10:25PM -0300, Alvaro Herrera wrote:
> On 2020-Mar-16, Bruce Momjian wrote:
> 
> > On Mon, Mar 16, 2020 at 05:55:26PM -0400, Bruce Momjian wrote:
> > >   Report bugs mailto:pgsql-b...@lists.postgresql.org
> > >   PostgreSQL home page https://www.postgresql.org/
> > > 
> > > I kind of prefer the last one since they can both be pasted directly into
> > > a browser.
> > 
> > Actually, I prefer:
> > 
> > Report bugs  mailto:pgsql-b...@lists.postgresql.org
> > PostgreSQL website  https://www.postgresql.org/
> 
> Hmm, pasting mailto into the browser address bar doesn't work for me ...
> it just goes to the lists.postgresql.org website (Brave) or sits there
> doing nothing (Firefox).  I was excited there for a minute.
> 
> If we're talking personal preference, I like the current output.

Well, in Firefox it knows to use Thunderbird to send email because under
Firefox's Preferences/General/Applications, 'mailto' is set to "Use
Thunderbird", though it can be set to other applications.  If no one
likes my changes, I guess we will just stick with what we have.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Auxiliary Processes and MyAuxProc

2020-03-19 Thread Andres Freund
Hi,

On 2020-03-19 11:35:41 +0100, Peter Eisentraut wrote:
> On 2020-03-18 17:07, Mike Palmiotto wrote:
> > On Wed, Mar 18, 2020 at 11:26 AM Mike Palmiotto
> >  wrote:
> > >
> > > On Wed, Mar 18, 2020 at 10:17 AM Justin Pryzby  
> > > wrote:
> > > > Also, I saw this was failing tests both before and after my rebase.
> > > >
> > > > http://cfbot.cputube.org/
> > > > https://ci.appveyor.com/project/postgresql-cfbot/postgresql/builds/31535161
> > > > https://ci.appveyor.com/project/postgresql-cfbot/postgresql/builds/31386446
> > >
> > > Good catch, thanks. Will address this as well in the next round. Just
> > > need to set up a Windows dev environment to see if I can
> > > reproduce/fix.
> >
> > While I track this down, here is a rebased patchset, which drops
> > MySubprocessType and makes use of the MyBackendType.
>
> While working on (My)BackendType, I was attempting to get rid of
> (My)AuxProcType altogether.  This would mostly work except that it's sort of
> wired into the pgstats subsystem (see NumBackendStatSlots). This can
> probably be reorganized, but I didn't pursue it further.

Hm. Why does the number of stat slots prevent dropping AuxProcType?


> Now, I'm a sucker for refactoring, but I feel this proposal is going into a
> direction I don't understand.  I'd prefer that we focus around building out
> background workers as the preferred subprocess mechanism. Building out a
> second generic mechanism, again, I don't understand the direction.  Are we
> hoping to add more of these processes?  Make it extensible?  The net lines
> added/removed by this patch series seems pretty neutral.  What are we
> getting at the end of this?

I think background workers for internal processes are the *wrong*
direction to go. They were used as a shortcut for parallelism, and then
that was extended for logical replication. In my opinion that was done,
to a significant degree, because the aux proc stuff is/was too painful
to deal with, but that's something that should be fixed, instead of
building more and more parallel infrastructure.


Bgworkers are imo not actually a very good fit for internal
processes. We have to be able to guarantee that there's a free "slot" to
start internal processes, we should be able to efficiently reference
their pids (especially from postmaster, but also other processes), we
want to precisely know which shared PGPROC is being used, etc.

We now have somewhat different systems for at least: non-shmem
postmaster children, aux processes, autovacuum workers, internal
bgworkers, extension bgworkers. That's just insane.

We should merge those as much as possible. There's obviously going to be
some differences, but it needs to be less than now.  I think we're
mostly on the same page on that, I just don't see bgworkers getting us
there.


The worst part about the current situation imo is that there's way too
many places that one needs to modify / check to create / understand a
process. Moving towards having a single c file w/ associated header that
defines 95% of that seems like a good direction.  I've not looked at
recent versions of the patch, but there was some movement towards that
in earlier versions.

On a green field, I would say that there should be one or two C
arrays-of-structs defining subprocesses. And most behaviour should be
channeled through that.

typedef struct PgProcessType
{
    const char *name;
    PgProcessBeforeShmem before_shmem;
    PgProcessEntry entrypoint;
    uint8 only_one_exists:1;
    uint8 uses_shared_memory:1;
    uint8 client_connected:1;
    uint8 database_connected:1;
    ...
} PgProcessType;

PgProcessType ProcessTypes[] = {
 [StartupType] = {
 .name = "startup",
 .entrypoint = StartupProcessMain,
 .only_one_exists = true,
 .uses_shared_memory = true,
 .client_connected = false,
 .database_connected = false,
 ...
 },
 ...
 [UserBackendType] = {
 .name = "backend",
 .before_shmem = BackendInitialize,
 .entrypoint = BackendRun, // fixme
 .only_one_exists = false,
 .uses_shared_memory = true,
 .client_connected = true,
 .database_connected = true,
 ...
 }
 ...

Then there should be a single startup routine for all postmaster
children. Since most of the startup is actually shared between all the
different types of processes, we can declutter it a lot.


When starting a process postmaster would just specify the process type,
and if relevant, an argument (struct Port for backends, whatever
relevant for bgworkers etc) . Generic code should handle all the work
until the process type entry point - and likely we should move more work
from the individual process types into generic code.


If a process is 'only_one_exists' (to be renamed), the generic code
would also (in postmaster) register the pid as
subprocess_pids[type] = pid;
which would make it easy to only have per-type code in the few locations
that need to be aware, instead of many locations in
postmaster.c.   Perhaps also some shared memory location.


Coming back to earth, 

Re: range_agg

2020-03-19 Thread Paul A Jungwirth
On Thu, Mar 19, 2020 at 1:42 PM Alvaro Herrera  wrote:
>
> On 2020-Mar-16, Paul A Jungwirth wrote:
>
> > On Sat, Mar 14, 2020 at 11:13 AM Paul A Jungwirth
> >  wrote:
> > > I think that should fix the cfbot failure.
> >
> > I saw this patch was failing to apply again. There was some
> > refactoring to how polymorphic types are determined. I added my
> > changes for anymultirange to that new approach, and things should be
> > passing again.
>
> There's been another flurry of commits in the polymorphic types area.
> Can you please rebase again?

I noticed that too. :-) I'm about halfway through a rebase right now.
I can probably finish it up tonight.

Paul




Re: range_agg

2020-03-19 Thread Alvaro Herrera
On 2020-Mar-16, Paul A Jungwirth wrote:

> On Sat, Mar 14, 2020 at 11:13 AM Paul A Jungwirth
>  wrote:
> > I think that should fix the cfbot failure.
> 
> I saw this patch was failing to apply again. There was some
> refactoring to how polymorphic types are determined. I added my
> changes for anymultirange to that new approach, and things should be
> passing again.

There's been another flurry of commits in the polymorphic types area.
Can you please rebase again?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: error context for vacuum to include block number

2020-03-19 Thread Justin Pryzby
On Thu, Mar 19, 2020 at 03:18:32PM +0530, Amit Kapila wrote:
> > You're right.  PHASE_SCAN_HEAP was set, but only inside a conditional.
> 
> I think if we do it inside for loop, then we don't need to set it
> conditionally at multiple places.  I have changed like that in the
> attached patch, see if that makes sense to you.

Yes, makes sense, and it's right near pgstat_progress_update_param, which is
nice.

> > Both those issues are due to a change in the most recent patch.  In the
> > previous patch, the PHASE_VACUUM_HEAP was set only by lazy_vacuum_heap(), 
> > and I
> > moved it recently to vacuum_page.  But it needs to be copied, as you point 
> > out.
> >
> > That's unfortunate due to a lack of symmetry: lazy_vacuum_page does its own
> > progress update, which suggests to me that it should also set its own error
> > callback.  It'd be nicer if EITHER the calling functions did that 
> > (scan_heap()
> > and vacuum_heap()) or if it was sufficient for the called function
> > (vacuum_page()) to do it.
> 
> Right, but adding in callers will spread at multiple places.
> 
> I have made a few additional changes in the attached. (a) Removed
> VACUUM_ERRCB_PHASE_VACUUM_FSM as I think we have to add it at many
> places, you seem to have added for FreeSpaceMapVacuumRange() but not
> for RecordPageWithFreeSpace(), (b) Reset the phase to
> VACUUM_ERRCB_PHASE_UNKNOWN after finishing the work for a particular
> phase, so that the new phase shouldn't continue in the callers.
> 
> I have another idea to make (b) better.  How about if a call to
> update_vacuum_error_cbarg returns information of old phase (blkno,
> phase, and indname) along with what it is doing now and then once the
> work for the current phase is over it can reset it back with old phase
> information?   This way the callee after finishing the new phase work
> would be able to reset back to the old phase.  This will work
> something similar to our MemoryContextSwitchTo.

I was going to suggest that we could do that by passing in a pointer to a local
variable "LVRelStats olderrcbarg", like:
|update_vacuum_error_cbarg(vacrelstats, VACUUM_ERRCB_PHASE_SCAN_HEAP,
|  blkno, NULL, &olderrcbarg);

and then later call:
|update_vacuum_error_cbarg(vacrelstats, olderrcbarg.phase,
|   olderrcbarg.blkno,
|   olderrcbarg.indname,
|   NULL);

I implemented it in a separate patch, but it may be a bad idea, due to freeing
indname.  To exercise it, I tried to cause a crash by changing "else if
(errcbarg->indname)" to "if" without else, but wasn't able to cause a crash,
probably just due to having a narrow timing window.

As written, we only pfree indname if we do actually "reset" the cbarg, which is
in the two routines handling indexes.  It's probably a good idea to pass the
indname rather than the relation in any case.

I rebased the rest of my patches on top of yours.

-- 
Justin
>From a1ef4498cf93a9971be5c1683ceb62879ab9bd17 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 12 Dec 2019 20:54:37 -0600
Subject: [PATCH v27 1/5] vacuum errcontext to show block being processed

Discussion:
https://www.postgresql.org/message-id/20191120210600.gc30...@telsasoft.com
---
 src/backend/access/heap/vacuumlazy.c | 215 +++
 src/tools/pgindent/typedefs.list |   1 +
 2 files changed, 191 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 03c43efc32..92bac9a24d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -268,8 +268,19 @@ typedef struct LVParallelState
 	int			nindexes_parallel_condcleanup;
 } LVParallelState;
 
+typedef enum
+{
+	VACUUM_ERRCB_PHASE_UNKNOWN,
+	VACUUM_ERRCB_PHASE_SCAN_HEAP,
+	VACUUM_ERRCB_PHASE_VACUUM_INDEX,
+	VACUUM_ERRCB_PHASE_VACUUM_HEAP,
+	VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
+} errcb_phase;
+
 typedef struct LVRelStats
 {
+	char	   *relnamespace;
+	char	   *relname;
 	/* useindex = true means two-pass strategy; false means one-pass */
 	bool		useindex;
 	/* Overall statistics about rel */
@@ -290,8 +301,12 @@ typedef struct LVRelStats
 	int			num_index_scans;
 	TransactionId latestRemovedXid;
 	bool		lock_waiter_detected;
-} LVRelStats;
 
+	/* Used for error callback: */
+	char	   *indname;
+	BlockNumber blkno;			/* used only for heap operations */
+	errcb_phase phase;
+} LVRelStats;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -314,10 +329,10 @@ static void lazy_vacuum_all_indexes(Relation onerel, Relation *Irel,
 	LVRelStats *vacrelstats, LVParallelState *lps,
 	int nindexes);
 static void lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats,
-			  LVDeadTuples *dead_tuples, double reltuples);
+			  LVDeadTuples *dead_tuples, double reltuples, LVRelStats *vacrelstats);
 static 


Re: Make MemoryContextMemAllocated() more precise

2020-03-19 Thread Jeff Davis
On Thu, 2020-03-19 at 16:04 -0400, Robert Haas wrote:
> Other people may have different concerns, but that was the only thing
> that was bothering me.

OK, thank you for raising it.

Perhaps we can re-fix the issue for HashAgg if necessary, or I can
tweak some accounting things within HashAgg itself, or both.

Regards,
Jeff Davis






Re: Make MemoryContextMemAllocated() more precise

2020-03-19 Thread Jeff Davis
On Thu, 2020-03-19 at 15:33 -0400, Robert Haas wrote:
> On Thu, Mar 19, 2020 at 3:27 PM Jeff Davis  wrote:
> > I think omitting the tail of the current block is an unqualified
> > improvement for the purpose of obeying work_mem, regardless of the
> > OS.
> > The block sizes keep doubling up to 8MB, and it doesn't make a lot
> > of
> > sense to count that last large mostly-empty block against work_mem.
> 
> Well, again, my point is that whether or not it counts depends on
> your
> system's overcommit behavior. Depending on how you have the
> configured, or what your OS likes to do, it may in reality count or
> not count. Or so I believe. Am I wrong?

I don't believe it should be counted for the purposes of work_mem.

Let's say that the OS eagerly allocates it, then what is the algorithm
supposed to do in response? It can either:

1. just accept that all of the space is used, even though it's
potentially as low as 50% used, and spill earlier than may be
necessary; or

2. find a way to measure the free space, and somehow predict whether
that space will be reused the next time a group is added to the hash
table

It just doesn't seem reasonable to me for the algorithm to change its
behavior based on these large block allocations. It may be valuable
information for other purposes (like tuning your OS, or tracking down
memory issues), though.

Regards,
Jeff Davis






Re: Make MemoryContextMemAllocated() more precise

2020-03-19 Thread Robert Haas
On Thu, Mar 19, 2020 at 3:44 PM Jeff Davis  wrote:
> On Thu, 2020-03-19 at 15:26 -0400, Robert Haas wrote:
> > Well, the issue is, if I understand correctly, that this means that
> > MemoryContextStats() might now report a smaller amount of memory than
> > what we actually allocated from the operating system. That seems like
> > it might lead someone trying to figure out where a backend is leaking
> > memory to erroneous conclusions.
>
> MemoryContextStats() was unaffected by my now-reverted change.

Oh. Well, that addresses my concern, then. If this only affects the
accounting for memory-bounded hash aggregation and nothing else is
going to be touched, including MemoryContextStats(), then it's not an
issue for me.

Other people may have different concerns, but that was the only thing
that was bothering me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: shared-memory based stats collector

2020-03-19 Thread Andres Freund
Hi,

On 2020-03-19 16:51:59 +1300, Thomas Munro wrote:
> On Fri, Mar 13, 2020 at 4:13 PM Andres Freund  wrote:
> > Thomas, could you look at the first two patches here, and my review
> > questions?
> 
> Ack.

Thanks!


> > >   dsa_pointer item_pointer = hash_table->buckets[i];
> > > @@ -549,9 +560,14 @@ dshash_delete_entry(dshash_table *hash_table, void 
> > > *entry)
> > >   
> > > LW_EXCLUSIVE));
> > >
> > >   delete_item(hash_table, item);
> > > - hash_table->find_locked = false;
> > > - hash_table->find_exclusively_locked = false;
> > > - LWLockRelease(PARTITION_LOCK(hash_table, partition));
> > > +
> > > + /* We need to keep partition lock while sequential scan */
> > > + if (!hash_table->seqscan_running)
> > > + {
> > > + hash_table->find_locked = false;
> > > + hash_table->find_exclusively_locked = false;
> > > + LWLockRelease(PARTITION_LOCK(hash_table, partition));
> > > + }
> > >  }
> >
> > This seems like a failure prone API.
> 
> If I understand correctly, the only purpose of the seqscan_running
> variable is to control that behaviour ^^^.  That is, to make
> dshash_delete_entry() keep the partition lock if you delete an entry
> while doing a seq scan.  Why not get rid of that, and provide a
> separate interface for deleting while scanning?
> dshash_seq_delete(dshash_seq_status *scan, void *entry).  I suppose it
> would be most common to want to delete the "current" item in the seq
> scan, but it could allow you to delete anything in the same partition,
> or any entry if using the "consistent" mode.  Oh, I see that Andres
> said the same thing later.


> > [Andres complaining about comments and language stuff]
> 
> I would be happy to proof read and maybe extend the comments (writing
> new comments will also help me understand and review the code!), and
> maybe some code changes to move this forward.  Horiguchi-san, are you
> working on another version now?  If so I'll wait for it before I do
> that.

Cool! Being ESL myself and mildly dyslexic to boot, that'd be
helpful. But I'd hold off for a moment, because I think there'll need to
be some open heart surgery on this patch (see bottom of my last email in
this thread, for minutes ago (don't yet have a message id, sorry)).


> > The fact that you're locking the per-database entry unconditionally once
> > for each table almost guarantees contention - and you're not using the
> > 'conditional lock' approach for that. I don't understand.
> 
> Right, I also noticed that:
> 
> /*
>  * Local table stats should be applied to both dbentry and tabentry at
>  * once. Update dbentry only if we could update tabentry.
>  */
> if (pgstat_update_tabentry(tabhash, entry, nowait))
> {
> pgstat_update_dbentry(dbent, entry);
> updated = true;
> }
> 
> So pgstat_update_tabentry() goes to great trouble to take locks
> conditionally, but then pgstat_update_dbentry() immediately does:
> 
> LWLockAcquire(&dbentry->lock, LW_EXCLUSIVE);
> dbentry->n_tuples_returned += stat->t_counts.t_tuples_returned;
> dbentry->n_tuples_fetched += stat->t_counts.t_tuples_fetched;
> dbentry->n_tuples_inserted += stat->t_counts.t_tuples_inserted;
> dbentry->n_tuples_updated += stat->t_counts.t_tuples_updated;
> dbentry->n_tuples_deleted += stat->t_counts.t_tuples_deleted;
> dbentry->n_blocks_fetched += stat->t_counts.t_blocks_fetched;
> dbentry->n_blocks_hit += stat->t_counts.t_blocks_hit;
> LWLockRelease(&dbentry->lock);
> 
> Why can't we be "lazy" with the dbentry stats too?  Is it really
> important for the table stats and DB stats to agree with each other?

We *need* to be lazy here, I think.
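
Something like the following would keep the dbentry update as lazy as the
tabentry one (a sketch only; the lock field on PgStat_StatDBEntry follows
the patch, and the function itself doesn't exist anywhere):

static bool
pgstat_update_dbentry_cond(PgStat_StatDBEntry *dbentry,
                           PgStat_TableStatus *stat)
{
    if (!LWLockConditionalAcquire(&dbentry->lock, LW_EXCLUSIVE))
        return false;       /* keep the counters pending, retry later */

    dbentry->n_tuples_inserted += stat->t_counts.t_tuples_inserted;
    dbentry->n_tuples_updated += stat->t_counts.t_tuples_updated;
    dbentry->n_tuples_deleted += stat->t_counts.t_tuples_deleted;
    LWLockRelease(&dbentry->lock);

    return true;
}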


> Hmm.  Even if you change the above code use a conditional lock, I am
> wondering (admittedly entirely without data) if this approach is still
> too clunky: even trying and failing to acquire the lock creates
> contention, just a bit less.  I wonder if it would make sense to make
> readers do more work, so that writers can avoid contention.  For
> example, maybe PgStat_StatDBEntry could hold an array of N sets of
> counters, and readers have to add them all up.  An advanced version of
> this idea would use a reasonably fresh copy of something like
> sched_getcpu() and numa_node_of_cpu() to select a partition to
> minimise contention and cross-node traffic, with a portable fallback
> based on PID or something.  CPU core/node awareness is something I
> haven't looked into too seriously, but it's been on my mind to solve
> some other problems.

I don't think we really need that for the per-object stats. The easier
way to address that is to instead reduce the rate of flushing to the
shared table. There's not really a problem with the shared state of the
stats lagging by a few hundred ms or so.

The amount of code complexity a scheme like you describe doesn't seem
worth it to me without very clear 

Re: Make MemoryContextMemAllocated() more precise

2020-03-19 Thread Jeff Davis
On Mon, 2020-03-16 at 11:45 -0700, Jeff Davis wrote:
> Although that's technically correct, the purpose of
> MemoryContextMemAllocated() is to give a "real" usage number so we
> know
> when we're out of work_mem and need to spill (in particular, the
> disk-
> based HashAgg work, but ideally other operators as well). This "real"
> number should include fragmentation, freed-and-not-reused chunks, and
> other overhead. But it should not include significant amounts of
> allocated-but-never-touched memory, which says more about economizing
> calls to malloc than it does about the algorithm's memory usage. 

To expand on this point:

work_mem is to keep executor algorithms somewhat constrained in the
memory that they use. With that in mind, it should reflect things that
the algorithm has some control over, and can be measured cheaply.

Therefore, we shouldn't include huge nearly-empty blocks of memory that
the system decided to allocate in response to a request for a small
chunk (the algorithm has little control). Nor should we try to walk a
list of blocks or free chunks (expensive).

We should include used memory, reasonable overhead (chunk headers,
alignment, etc.), and probably free chunks (which represent
fragmentation).

For AllocSet, the notion of "all touched memory", which is everything
except the current block's tail, seems to fit the requirements well.
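
A rough sketch of that notion for aset.c (not the reverted commit
verbatim, just the idea): take the total allocated from the OS and
subtract the still-unused tail of the active (head) block.

static Size
AllocSetMemTouched(MemoryContext context)
{
    AllocSet    set = (AllocSet) context;
    Size        touched = context->mem_allocated;

    /* the head block is the one currently being carved into chunks */
    if (set->blocks != NULL)
        touched -= (Size) (set->blocks->endptr - set->blocks->freeptr);

    return touched;
}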

Regards,
Jeff Davis






Re: shared-memory based stats collector

2020-03-19 Thread Andres Freund
Hi,

On 2020-03-19 20:30:04 +0900, Kyotaro Horiguchi wrote:
> > I think we also can get rid of the dshash_delete changes, by instead
> > adding a dshash_delete_current(dshash_seq_stat *status, void *entry) API
> > or such.
>
> [009] (Fixed)
> I'm not sure about the point of having two interfaces that are hard to
> distinguish.  Maybe dshash_delete_current(dshash_seq_status *status) is
> enough. I also reverted the dshash_delete().

Well, dshash_delete() cannot generally safely be used together with
iteration. It has to be the current element etc. And I think the locking
changes make dshash less robust. By explicitly tying "delete the current
element" to the iterator, most of that can be avoided.
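
So the iterator-tied API boils down to something like this (a sketch; the
names follow this thread, not committed code), used while the sequential
scan holds the relevant partition lock:

extern void dshash_delete_current(dshash_seq_status *status);

dshash_seq_status status;
MyEntry    *ent;                /* hypothetical entry type */

/* scan started elsewhere with dshash_seq_init(&status, ...), exclusive */
while ((ent = dshash_seq_next(&status)) != NULL)
{
    if (entry_is_dead(ent))             /* hypothetical condition */
        dshash_delete_current(&status); /* delete the element just returned */
}
dshash_seq_term(&status);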



> > >  /* SIGUSR1 signal handler for archiver process */
> >
> > Hm - this currently doesn't set up a correct sigusr1 handler for a
> > shared memory backend - needs to invoke procsignal_sigusr1_handler
> > somewhere.
> >
> > We can probably just convert to using normal latches here, and remove
> > the current 'wakened' logic? That'll remove the indirection via
> > postmaster too, which is nice.
>
> [018] (Fixed, separate patch 0005)
> It seems better. I added it as a separate patch just after the patch
> that turns archiver an auxiliary process.

I don't think it's correct to do it separately, but I can just merge
that on commit.


> > > @@ -4273,6 +4276,9 @@ pgstat_get_backend_desc(BackendType backendType)
> > >
> > >   switch (backendType)
> > >   {
> > > + case B_ARCHIVER:
> > > + backendDesc = "archiver";
> > > + break;
> >
> > should imo include 'WAL' or such.
>
> [019] (Not Fixed)
> It is already named "archiver" by 8e8a0becb3. Do I rename it in this
> patch set?

Oh. No, don't rename it as part of this. Could you reply to the thread
in which Peter made that change, and reference this complaint?


> [021] (Fixed, separate patch 0007)
> However the "statistics collector process" is gone, I'm not sure
> "statistics collector" feature also is gone. But actually the word
> "collector" looks a bit odd in some context. I replaced "the results
> of statistics collector" with "the activity statistics". (I'm not sure
> "the activity statistics" is proper as a subsystem name.) The word
> "collect" is replaced with "track".  I didn't change section IDs
> corresponding to the renaming so that old links can work. I also fixed
> the tranche name for LWTRANCHE_STATS from "activity stats" to
> "activity_statistics"

Without having gone through the changes, that sounds like the correct
direction to me. There's no "collector" anymore, so removing that seems
like the right thing.


> > > diff --git a/src/backend/postmaster/pgstat.c 
> > > b/src/backend/postmaster/pgstat.c
> > > index ca5c6376e5..1ffe073a1f 100644
> > > --- a/src/backend/postmaster/pgstat.c
> > > +++ b/src/backend/postmaster/pgstat.c
> > > + *  Collects per-table and per-function usage statistics of all backends 
> > > on
> > > + *  shared memory. pg_count_*() and friends are the interface to locally 
> > > store
> > > + *  backend activities during a transaction. Then pgstat_flush_stat() is 
> > > called
> > > + *  at the end of a transaction to pulish the local stats on shared 
> > > memory.
> > >   *
> >
> > I'd rather not exhaustively list the different objects this handles -
> > it'll either be annoying to maintain, or just get out of date.
>
> [024] (Fixed, Maybe)
> Although not sure I get you correctly, I rewrote it as follows.
>
>  *  Collects per-table and per-function usage statistics of all backends on
>  *  shared memory. The activity numbers are once stored locally, then written
>  *  to shared memory at commit time or by idle-timeout.

s/backends on/backends in/

I was thinking of something like:
 *  Collects activity statistics, e.g. per-table access statistics, of
 *  all backends in shared memory. The activity numbers are first stored
 *  locally in each process, then flushed to shared memory at commit
 *  time or by idle-timeout.



> > > - *   - Add some automatic call for pgstat vacuuming.
> > > + *  To avoid congestion on the shared memory, we update shared stats no 
> > > more
> > > + *  often than intervals of PGSTAT_STAT_MIN_INTERVAL(500ms). In the case 
> > > where
> > > + *  all the local numbers cannot be flushed immediately, we postpone 
> > > updates
> > > + *  and try the next chance after the interval of
> > > + *  PGSTAT_STAT_RETRY_INTERVAL(100ms), but we don't wait for no longer 
> > > than
> > > + *  PGSTAT_STAT_MAX_INTERVAL(1000ms).
> >
> > I'm not convinced by this backoff logic. The basic interval seems quite
> > high for something going through shared memory, and the max retry seems
> > pretty low.
>
> [025] (Not Fixed)
> Is it the matter of intervals? Is (MIN, RETRY, MAX) = (1000, 500,
> 1) reasonable?

Partially. I think for access to shared resources we want *increasing*
wait times, rather than shorter retry timeout. The goal should be to be
to make 

Re: PATCH: add support for IN and @> in functional-dependency statistics use

2020-03-19 Thread Dean Rasheed
On Wed, 18 Mar 2020 at 00:29, Tomas Vondra  wrote:
>
> OK, I took a look. I think from the correctness POV the patch is OK, but
> I think the dependencies_clauselist_selectivity() function now got a bit
> too complex. I've been able to parse it now, but I'm sure I'll have
> trouble in the future :-(
>
> Can we refactor / split it somehow and move bits of the logic to smaller
> functions, or something like that?
>

Yeah, it has gotten a bit long. It's somewhat tricky splitting it up,
because of the number of shared variables used throughout the
function, but here's an updated patch splitting it into what seemed
like the 2 most logical pieces. The first piece (still in
dependencies_clauselist_selectivity()) works out what dependencies
can/should be applied, and the second piece in a new function does the
actual work of applying the list of functional dependencies to the
clause list.

I think that has made it easier to follow, and it has also reduced the
complexity of the final "no applicable stats" branch.

> Another thing I'd like to suggest is keeping the "old" formula, and
> instead of just replacing it with
>
> P(a,b) = f * Min(P(a), P(b)) + (1-f) * P(a) * P(b)
>
> but explaining how the old formula may produce nonsensical selectivity,
> and how the new formula addresses that issue.
>

I think this is purely a comment issue? I've added some more extensive
comments attempting to justify the formulae.
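In case a concrete illustration helps, here is a tiny standalone sketch
of the modified combination rule (not part of the patch; the function
name is mine):

/* Combine P(a) and P(b) given functional dependency degree f. */
static double
combine_with_dependency(double p_a, double p_b, double f)
{
    double  p_min = (p_a < p_b) ? p_a : p_b;

    /* a fraction f of the rows follows the dependency, the rest are independent */
    return f * p_min + (1.0 - f) * p_a * p_b;
}

For example, with P(a) = 0.5, P(b) = 0.01 and f = 1.0, the old formula
f * P(a) + (1-f) * P(a) * P(b) would give 0.5, exceeding P(b), while the
modified rule gives 0.01.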

Regards,
Dean
diff --git a/src/backend/statistics/dependencies.c b/src/backend/statistics/dependencies.c
new file mode 100644
index 5f9b43b..18cce60
--- a/src/backend/statistics/dependencies.c
+++ b/src/backend/statistics/dependencies.c
@@ -30,6 +30,7 @@
 #include "utils/fmgroids.h"
 #include "utils/fmgrprotos.h"
 #include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
 #include "utils/syscache.h"
 #include "utils/typcache.h"
 
@@ -73,13 +74,18 @@ static double dependency_degree(int numr
 AttrNumber *dependency, VacAttrStats **stats, Bitmapset *attrs);
 static bool dependency_is_fully_matched(MVDependency *dependency,
 		Bitmapset *attnums);
-static bool dependency_implies_attribute(MVDependency *dependency,
-		 AttrNumber attnum);
 static bool dependency_is_compatible_clause(Node *clause, Index relid,
 			AttrNumber *attnum);
 static MVDependency *find_strongest_dependency(MVDependencies **dependencies,
 			   int ndependencies,
 			   Bitmapset *attnums);
+static Selectivity deps_clauselist_selectivity(PlannerInfo *root, List *clauses,
+			   int varRelid, JoinType jointype,
+			   SpecialJoinInfo *sjinfo,
+			   MVDependency **dependencies,
+			   int ndependencies,
+			   AttrNumber *list_attnums,
+			   Bitmapset **estimatedclauses);
 
 static void
 generate_dependencies_recurse(DependencyGenerator state, int index,
@@ -614,19 +620,6 @@ dependency_is_fully_matched(MVDependency
 }
 
 /*
- * dependency_implies_attribute
- *		check that the attnum matches is implied by the functional dependency
- */
-static bool
-dependency_implies_attribute(MVDependency *dependency, AttrNumber attnum)
-{
-	if (attnum == dependency->attributes[dependency->nattributes - 1])
-		return true;
-
-	return false;
-}
-
-/*
  * statext_dependencies_load
  *		Load the functional dependencies for the indicated pg_statistic_ext tuple
  */
@@ -986,6 +979,182 @@ find_strongest_dependency(MVDependencies
 }
 
 /*
+ * deps_clauselist_selectivity
+ *		Return the estimated selectivity of (a subset of) the given clauses
+ *		using the given functional dependencies.  This is the guts of
+ *		dependencies_clauselist_selectivity().
+ *
+ * This will estimate all not-already-estimated clauses that are compatible
+ * with functional dependencies, and which have an attribute mentioned by any
+ * of the given dependencies (either as an implying or implied attribute).
+ *
+ * Given (lists of) clauses on attributes (a,b) and a functional dependency
+ * (a=>b), the per-column selectivities P(a) and P(b) are notionally combined
+ * using the formula
+ *
+ *		P(a,b) = f * P(a) + (1-f) * P(a) * P(b)
+ *
+ * where 'f' is the degree of dependency.  This reflects the fact that we
+ * expect a fraction f of all rows to be consistent with the dependency
+ * (a=>b), and so have a selectivity of P(a), while the remaining rows are
+ * treated as independent.
+ *
+ * In practice, we use a slightly modified version of this formula, which uses
+ * a selectivity of Min(P(a), P(b)) for the dependent rows, since the result
+ * should obviously not exceed either column's individual selectivity.  I.e.,
+ * we actually combine selectivities using the formula
+ *
+ *		P(a,b) = f * Min(P(a), P(b)) + (1-f) * P(a) * P(b)
+ *
+ * This can make quite a difference if the specific values matching the
+ * clauses are not consistent with the functional dependency.
+ */
+static Selectivity
+deps_clauselist_selectivity(PlannerInfo *root, List *clauses,
+			int varRelid, JoinType 

Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-19 Thread Laurenz Albe
On Thu, 2020-03-19 at 21:39 +1300, David Rowley wrote:
> > According to my reckoning, that is the remaining objection to the patch
> > as it is (with ordinary freezing behavior).
> >
> > How about a scale_factor od 0.005?  That will be high enough for large
> > tables, which seem to be the main concern here.
> 
> I agree with that, however, I'd thought 0.01, just so we're still
> close to having about 100 times less work to do for huge insert-only
> tables when it comes to having to perform an anti-wraparound vacuum.

Fine with me.

> > I am still sorry to see more proactive freezing go, which would
> > reduce the impact for truly insert-only tables.
> > After sleeping on it, here is one last idea.
> >
> > Granted, freezing with vacuum_freeze_min_age = 0 poses a problem
> > for those parts of the table that will receive updates or deletes.
> > But what if insert-triggered vacuum operates with - say -
> > one tenth of vacuum_freeze_min_age (unless explicitly overridden
> > for the table)?  That might still be high enough not to needlessly
> > freeze too many tuples that will still be modified, but it will
> > reduce the impact on insert-only tables.
> 
> I think that might be a bit too magical and may not be what some
> people want. I know that most people won't set
> autovacuum_freeze_min_age to 0 for insert-only tables, but we can at
> least throw something in the documents to mention it's a good idea,
> however, looking over the docs I'm not too sure the best place to note
> that down.

I was afraid that idea would be too cute to appeal.

> I've attached a small fix which I'd like to apply to your v8 patch.
> With that, and pending one final look, I'd like to push this during my
> Monday (New Zealand time).  So if anyone strongly objects to that,
> please state their case before then.

Thanks!

I have rolled your edits into the attached patch v9, rebased against
current master.
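For anyone skimming the thread, the triggering condition boils down to
something like the following standalone sketch (parameter names are
mine, written by analogy with the existing autovacuum threshold logic;
the real changes are in the attached patch):

#include <stdbool.h>

/* Should an insert-triggered autovacuum run on this table? */
static bool
needs_insert_vacuum(double reltuples, long inserts_since_vacuum,
                    int insert_threshold, double insert_scale_factor)
{
    /* e.g. threshold 1000 and scale factor 0.01 with the proposed defaults */
    double  vacinsthresh = insert_threshold + insert_scale_factor * reltuples;

    return inserts_since_vacuum > vacinsthresh;
}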

Yours,
Laurenz Albe
From 3ba4b572d82969bbb2af787d1bccc72f417ad3a0 Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Thu, 19 Mar 2020 20:26:43 +0100
Subject: [PATCH] Autovacuum tables that have received only inserts

Add "autovacuum_vacuum_insert_threshold" and
"autovacuum_vacuum_insert_scale_factor" GUC and reloption.
The default value for the threshold is 1000;
the scale factor defaults to 0.01.

Any table that has received more inserts since it was
last vacuumed (and that is not vacuumed for another
reason) will be autovacuumed.

This avoids the known problem that insert-only tables
are never autovacuumed until they need to have their
anti-wraparound autovacuum, which then can be massive
and disruptive.

To track the number of inserts since the last vacuum,
introduce a StatTabEntry "inserts_since_vacuum" that
gets reset to 0 after a vacuum.  This value is available
in "pg_stat_*_tables" as "n_ins_since_vacuum".

Author: Laurenz Albe, based on a suggestion from Darafei Praliaskouski
Reviewed-by: David Rowley, Justin Pryzby, Masahiko Sawada, Andres Freund
Discussion: https://postgr.es/m/CAC8Q8t+j36G_bLF=+0imo6jgnwnlnwb1tujxujr-+x8zcct...@mail.gmail.com
---
 doc/src/sgml/config.sgml  | 41 +++
 doc/src/sgml/maintenance.sgml | 23 ---
 doc/src/sgml/monitoring.sgml  |  5 +++
 doc/src/sgml/ref/create_table.sgml| 30 ++
 src/backend/access/common/reloptions.c| 22 ++
 src/backend/catalog/system_views.sql  |  1 +
 src/backend/postmaster/autovacuum.c   | 22 --
 src/backend/postmaster/pgstat.c   |  5 +++
 src/backend/utils/adt/pgstatfuncs.c   | 16 
 src/backend/utils/misc/guc.c  | 20 +
 src/backend/utils/misc/postgresql.conf.sample |  4 ++
 src/bin/psql/tab-complete.c   |  4 ++
 src/include/catalog/pg_proc.dat   |  5 +++
 src/include/pgstat.h  |  1 +
 src/include/postmaster/autovacuum.h   |  2 +
 src/include/utils/rel.h   |  2 +
 src/test/regress/expected/rules.out   |  3 ++
 17 files changed, 198 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 70854ae298..d1ee8e53f2 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7301,6 +7301,26 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
  
 
+ 
+  autovacuum_vacuum_insert_threshold (integer)
+  
+   autovacuum_vacuum_insert_threshold
+   configuration parameter
+  
+  
+  
+   
+Specifies the number of inserted tuples needed to trigger a
+VACUUM in any one table.
+The default is 1000 tuples.
+This parameter can only be set in the postgresql.conf
+file or on the server command line;
+but the setting can be overridden for individual tables by
+changing table storage parameters.
+   
+  
+ 
+
  
   

Re: Make MemoryContextMemAllocated() more precise

2020-03-19 Thread Jeff Davis
On Thu, 2020-03-19 at 15:26 -0400, Robert Haas wrote:
> Well, the issue is, if I understand correctly, that this means that
> MemoryContextStats() might now report a smaller amount of memory than
> what we actually allocated from the operating system. That seems like
> it might lead someone trying to figure out where a backend is leaking
> memory to erroneous conclusions.

MemoryContextStats() was unaffected by my now-reverted change.

Regards,
Jeff Davis






Re: [PATCH] pg_upgrade: report the reason for failing to open the cluster version file

2020-03-19 Thread Dagfinn Ilmari Mannsåker
Bruce Momjian  writes:

> On Wed, Feb 26, 2020 at 06:32:00PM +, Dagfinn Ilmari Mannsåker wrote:
>> Tom Lane  writes:
>> 
>> > Michael Paquier  writes:
>> >> On Wed, Feb 26, 2020 at 10:06:38AM +0100, Magnus Hagander wrote:
>> >>> +1, seems like that would be a regression in value.
>> >
>> >> Having more generic messages is less work for translators, we have
>> >> PG_VERSION in the file name, and that's more complicated to translate
>> >> in both French and Japanese.  No idea about other languages.
>> >
>> > Just looking at the committed diff, it seems painfully obvious that these
>> > two messages were written by different people who weren't talking to each
>> > other.  Why aren't they more alike?  Given
>> >
>> >pg_fatal("could not open version file \"%s\": %m\n", ver_filename);
>> >
>> > (which seems fine to me), I think the second ought to be
>> >
>> >pg_fatal("could not parse version file \"%s\"\n", ver_filename);
>> 
>> Good point.  Patch attached.
>
> Patch applied, and other adjustments:

Thanks!

- ilmari
-- 
- Twitter seems more influential [than blogs] in the 'gets reported in
  the mainstream press' sense at least.   - Matt McLeod
- That'd be because the content of a tweet is easier to condense down
  to a mainstream media article.  - Calle Dybedahl




Re: Make MemoryContextMemAllocated() more precise

2020-03-19 Thread Tomas Vondra

On Thu, Mar 19, 2020 at 12:25:16PM -0700, Jeff Davis wrote:

On Thu, 2020-03-19 at 19:11 +0100, Tomas Vondra wrote:

AFAICS the 2x allocation is the worst case, because it only happens
right after allocating a new block (of twice the size), when the
"utilization" drops from 100% to 50%. But in practice the utilization
will be somewhere in between, with an average of 75%.


Sort of. Hash Agg is constantly watching the memory, so it will
typically spill right at the point where the accounting for that memory
context is off by 2X.

That's mitigated because the hash table itself (the array of
TupleHashEntryData) ends up allocated as its own block, so does not
have any waste. The total (table mem + out of line) might be close to
right if the hash table array itself is a large fraction of the data,
but I don't think that's what we want.


 And we're not
doubling the block size indefinitely - there's an upper limit, so
over
time the utilization drops less and less. So as the contexts grow,
the
discrepancy disappears. And I'd argue the smaller the context, the
less
of an issue the overcommit behavior is.


The problem is that the default work_mem is 4MB, and the doubling
behavior goes to 8MB, so it's a problem with default settings.



Yes, it's an issue for the accuracy of our accounting. What Robert was
talking about is overcommit behavior at the OS level, which I'm arguing
is unlikely to be an issue, because for low work_mem values the absolute
difference is small, and on large work_mem values it's limited by the
block size limit.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Make MemoryContextMemAllocated() more precise

2020-03-19 Thread Robert Haas
On Thu, Mar 19, 2020 at 3:27 PM Jeff Davis  wrote:
> I think omitting the tail of the current block is an unqualified
> improvement for the purpose of obeying work_mem, regardless of the OS.
> The block sizes keep doubling up to 8MB, and it doesn't make a lot of
> sense to count that last large mostly-empty block against work_mem.

Well, again, my point is that whether or not it counts depends on your
system's overcommit behavior. Depending on how you have the
configured, or what your OS likes to do, it may in reality count or
not count. Or so I believe. Am I wrong?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: GSoC applicant proposal, Uday PB

2020-03-19 Thread Stephen Frost
Greetings,

* Chapman Flack (c...@anastigmatix.net) wrote:
> On 3/19/20 2:03 PM, Alexander Korotkov wrote:
> > Does your project imply any coding?  AFAIR, GSoC doesn't allow pure
> > documentation projects.
> 
> That's a good question. The idea as I proposed it is more of an
> infrastructure project, adjusting the toolchain that currently
> autogenerates the docs (along with some stylesheets/templates) so
> that a more usable web reference is generated from the existing
> documentation—and to make it capable of generating per-version
> subtrees, as the PostgreSQL manual does, rather than having the
> most recent release be the only online reference available.
> 
> I was not envisioning it as a technical-writing project to improve
> the content of the documentation. That surely wouldn't hurt, but
> isn't what I had in mind here.
> 
> I am open to withdrawing it and reposting as a Google Season of Docs
> project if that's what the community prefers, only in that case
> I wonder if it would end up attracting contributors who would be
> expecting to do some writing and copy-editing, and end up intimidated
> by the coding/infrastructure work required.

I appreciate that it might not be a great fit for GSoD either, but that
doesn't mean it fits as a GSoC project.

> So I'm not certain how it should be categorized, or whether GSoC
> rules should preclude it. Judgment call?

You could ask on the GSoC mentors list, but I feel pretty confident that
this doesn't meet the criteria to be a GSoC project, unfortunately.  The
GSoC folks are pretty clear that GSoC is for writing OSS code, not for
pulling together tools with shell scripts, or for writing documentation
or for systems administration or for other things, even if those other
things are things that an OSS project needs.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Make MemoryContextMemAllocated() more precise

2020-03-19 Thread Jeff Davis


On Thu, 2020-03-19 at 11:44 -0400, Robert Haas wrote:
> Procedurally, I think that it is highly inappropriate to submit a
> patch two weeks after the start of the final CommitFest and then
> commit it just over 48 hours later without a single endorsement of
> the
> change from anyone.

Reverted.

Sorry, I misjudged this as a "supporting fix for a specific problem",
but it seems others feel it requires discussion.

> Substantively, I think that whether or not this is improvement
> depends
> considerably on how your OS handles overcommit. I do not have enough
> knowledge to know whether it will be better in general, but would
> welcome opinions from others.

I think omitting the tail of the current block is an unqualified
improvement for the purpose of obeying work_mem, regardless of the OS.
The block sizes keep doubling up to 8MB, and it doesn't make a lot of
sense to count that last large mostly-empty block against work_mem.

Regards,
Jeff Davis






Re: Make MemoryContextMemAllocated() more precise

2020-03-19 Thread Robert Haas
On Thu, Mar 19, 2020 at 2:11 PM Tomas Vondra
 wrote:
> My understanding is that this is really just an accounting issue, where
> allocating a block would get us over the limit, which I suppose might be
> an issue with low work_mem values.

Well, the issue is, if I understand correctly, that this means that
MemoryContextStats() might now report a smaller amount of memory than
what we actually allocated from the operating system. That seems like
it might lead someone trying to figure out where a backend is leaking
memory to erroneous conclusions.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Make MemoryContextMemAllocated() more precise

2020-03-19 Thread Jeff Davis
On Thu, 2020-03-19 at 19:11 +0100, Tomas Vondra wrote:
> AFAICS the 2x allocation is the worst case, because it only happens
> right after allocating a new block (of twice the size), when the
> "utilization" drops from 100% to 50%. But in practice the utilization
> will be somewhere in between, with an average of 75%.

Sort of. Hash Agg is constantly watching the memory, so it will
typically spill right at the point where the accounting for that memory
context is off by 2X. 

That's mitigated because the hash table itself (the array of
TupleHashEntryData) ends up allocated as its own block, so does not
have any waste. The total (table mem + out of line) might be close to
right if the hash table array itself is a large fraction of the data,
but I don't think that's what we want.

>  And we're not
> doubling the block size indefinitely - there's an upper limit, so
> over
> time the utilization drops less and less. So as the contexts grow,
> the
> discrepancy disappears. And I'd argue the smaller the context, the
> less
> of an issue the overcommit behavior is.

The problem is that the default work_mem is 4MB, and the doubling
behavior goes to 8MB, so it's a problem with default settings.

Regards,
Jeff Davis






Re: [PATCH] pg_upgrade: report the reason for failing to open the cluster version file

2020-03-19 Thread Bruce Momjian
On Wed, Feb 26, 2020 at 06:32:00PM +, Dagfinn Ilmari Mannsåker wrote:
> Tom Lane  writes:
> 
> > Michael Paquier  writes:
> >> On Wed, Feb 26, 2020 at 10:06:38AM +0100, Magnus Hagander wrote:
> >>> +1, seems like that would be a regression in value.
> >
> >> Having more generic messages is less work for translators, we have
> >> PG_VERSION in the file name, and that's more complicated to translate
> >> in both French and Japanese.  No idea about other languages.
> >
> > Just looking at the committed diff, it seems painfully obvious that these
> > two messages were written by different people who weren't talking to each
> > other.  Why aren't they more alike?  Given
> >
> >pg_fatal("could not open version file \"%s\": %m\n", ver_filename);
> >
> > (which seems fine to me), I think the second ought to be
> >
> >pg_fatal("could not parse version file \"%s\"\n", ver_filename);
> 
> Good point.  Patch attached.

Patch applied, and other adjustments:

This patch fixes the error message in get_major_server_version()
to be "could not parse version file", and uses the full file path
name, rather than just the data directory path.

Also, commit 4109bb5de4 added the cause of the failure to the
"could not open" error message, and improved quoting.  This patch
backpatches the "could not open" cause to PG 12, where it was
first widely used, and backpatches the quoting fix in that patch
to all supported releases.

Because some of the branches are different, I am attaching the applied
multi-version patch.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +
9.5:

diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c
index da3aefca82..c59da91f9d 100644
--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@@ -164,12 +164,12 @@ get_major_server_version(ClusterInfo *cluster)
 	snprintf(ver_filename, sizeof(ver_filename), "%s/PG_VERSION",
 			 cluster->pgdata);
 	if ((version_fd = fopen(ver_filename, "r")) == NULL)
-		pg_fatal("could not open version file: %s\n", ver_filename);
+		pg_fatal("could not open version file \"%s\"\n", ver_filename);
 
 	if (fscanf(version_fd, "%63s", cluster->major_version_str) == 0 ||
 		sscanf(cluster->major_version_str, "%d.%d", _version,
 			   _version) != 2)
-		pg_fatal("could not get version from %s\n", cluster->pgdata);
+		pg_fatal("could not parse version file \"%s\"\n", ver_filename);
 
 	fclose(version_fd);
 
9.6:

diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c
index 1cd606a847..862a50c254 100644
--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@@ -165,12 +165,12 @@ get_major_server_version(ClusterInfo *cluster)
 	snprintf(ver_filename, sizeof(ver_filename), "%s/PG_VERSION",
 			 cluster->pgdata);
 	if ((version_fd = fopen(ver_filename, "r")) == NULL)
-		pg_fatal("could not open version file: %s\n", ver_filename);
+		pg_fatal("could not open version file \"%s\"\n", ver_filename);
 
 	if (fscanf(version_fd, "%63s", cluster->major_version_str) == 0 ||
 		sscanf(cluster->major_version_str, "%d.%d", _version,
 			   _version) != 2)
-		pg_fatal("could not get version from %s\n", cluster->pgdata);
+		pg_fatal("could not parse version file \"%s\"\n", ver_filename);
 
 	fclose(version_fd);
 
10:

diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c
index 5563a5020b..abcb4b176b 100644
--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@@ -165,11 +165,11 @@ get_major_server_version(ClusterInfo *cluster)
 	snprintf(ver_filename, sizeof(ver_filename), "%s/PG_VERSION",
 			 cluster->pgdata);
 	if ((version_fd = fopen(ver_filename, "r")) == NULL)
-		pg_fatal("could not open version file: %s\n", ver_filename);
+		pg_fatal("could not open version file \"%s\"\n", ver_filename);
 
 	if (fscanf(version_fd, "%63s", cluster->major_version_str) == 0 ||
 		sscanf(cluster->major_version_str, "%d.%d", , ) < 1)
-		pg_fatal("could not parse PG_VERSION file from %s\n", cluster->pgdata);
+		pg_fatal("could not parse version file \"%s\"\n", ver_filename);
 
 	fclose(version_fd);
 
11:

diff --git a/src/bin/pg_upgrade/server.c b/src/bin/pg_upgrade/server.c
index fccc21836a..d2391137a8 100644
--- a/src/bin/pg_upgrade/server.c
+++ b/src/bin/pg_upgrade/server.c
@@ -165,11 +165,11 @@ get_major_server_version(ClusterInfo *cluster)
 	snprintf(ver_filename, sizeof(ver_filename), "%s/PG_VERSION",
 			 cluster->pgdata);
 	if ((version_fd = fopen(ver_filename, "r")) == NULL)
-		pg_fatal("could not open version file: %s\n", ver_filename);
+		pg_fatal("could not open version file \"%s\"\n", ver_filename);
 
 	if (fscanf(version_fd, "%63s", cluster->major_version_str) == 0 ||
 		sscanf(cluster->major_version_str, "%d.%d", , ) < 1)
-		pg_fatal("could not parse 

Re: GSoC applicant proposal, Uday PB

2020-03-19 Thread Chapman Flack
On 3/19/20 2:03 PM, Alexander Korotkov wrote:

> Does your project imply any coding?  AFAIR, GSoC doesn't allow pure
> documentation projects.

That's a good question. The idea as I proposed it is more of an
infrastructure project, adjusting the toolchain that currently
autogenerates the docs (along with some stylesheets/templates) so
that a more usable web reference is generated from the existing
documentation—and to make it capable of generating per-version
subtrees, as the PostgreSQL manual does, rather than having the
most recent release be the only online reference available.

I was not envisioning it as a technical-writing project to improve
the content of the documentation. That surely wouldn't hurt, but
isn't what I had in mind here.

I am open to withdrawing it and reposting as a Google Season of Docs
project if that's what the community prefers, only in that case
I wonder if it would end up attracting contributors who would be
expecting to do some writing and copy-editing, and end up intimidated
by the coding/infrastructure work required.

So I'm not certain how it should be categorized, or whether GSoC
rules should preclude it. Judgment call?

Regards,
-Chap




Re: Collation versioning

2020-03-19 Thread Julien Rouhaud
On Thu, Mar 19, 2020 at 12:31:54PM +0900, Michael Paquier wrote:
> On Wed, Mar 18, 2020 at 04:35:43PM +0100, Julien Rouhaud wrote:
> > On Wed, Mar 18, 2020 at 09:56:40AM +0100, Julien Rouhaud wrote:
> > AFAICT it was only missing a call to index_update_collation_versions() in
> > ReindexRelationConcurrently.  I added regression tests to make sure that
> > REINDEX, REINDEX [INDEX|TABLE] CONCURRENTLY and VACUUM FULL are doing what's
> > expected.
> 
> If you add a call to index_update_collation_versions(), the old and
> invalid index will use the same refobjversion as the new index, which
> is the latest collation version of the system, no?  If the operation
> is interrupted before the invalid index is dropped, then we would keep
> a confusing value for refobjversion, because the old invalid index
> does not rely on the new collation version, but on the old one.
> Hence, it seems to me that it would be correct to have the old invalid
> index either use an empty version string to say "we don't know"
> because the index is invalid anyway, or keep a reference to the old
> collation version intact.  I think that the latter is much more useful
> for debugging issues when upgrading a subset of indexes if the
> operation is interrupted for a reason or another.

Indeed, I confused the _ccold and _ccnew indexes.  So, the root cause is phase
4, more precisely the dependency swap in index_concurrently_swap.

A possible fix would be to teach changeDependenciesOf() to preserve the
dependency version.  It'd be quite a bit costly, as this would mean an extra
index search for each dependency row found.  We could probably skip the lookup
if the row has a NULL recorded version, as the version should either be null or
non-null for both objects.

I'm wondering if this is a good time to make changeDependenciesOf and
changeDependenciesOn private, and instead expose a swapDependencies(classid,
obj1, obj2) that would call both, as preserving the version doesn't really
make sense outside a swap.  It's also a good way to ensure that no CCI is
performed in the middle.
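To be explicit about the shape I have in mind, a rough sketch (the extern
declarations just mirror the existing pg_depend.c helpers, and the precise
swap semantics are glossed over here):

typedef unsigned int Oid;

extern long changeDependenciesOf(Oid classId, Oid oldObjectId, Oid newObjectId);
extern long changeDependenciesOn(Oid refClassId, Oid oldRefObjectId, Oid newRefObjectId);

/* swap dependencies of/on two objects, preserving recorded versions */
static void
swapDependencies(Oid classId, Oid obj1, Oid obj2)
{
    changeDependenciesOf(classId, obj1, obj2);
    changeDependenciesOn(classId, obj1, obj2);
    /* deliberately no CommandCounterIncrement() between the two calls */
}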

> > Given discussion in nearby threads, I obviously can't add tests for failed
> > REINDEX CONCURRENTLY, so here's what's happening with a manual repro:
> > 
> > =# UPDATE pg_depend SET refobjversion = 'meh' WHERE refobjversion = 
> > '153.97';
> > UPDATE 1
> 
> Updates to catalogs are not an existing practice in the core
> regression tests, so patches had better not do that. :p

I already heavily relied on that in the previous version of the patchset.  The
only possible alternative would be to switch to TAP tests, and constantly
restart the instance in binary upgrade mode to be able to call
binary_upgrade_set_index_coll_version.  I'd prefer to avoid that if that's
possible, as it'll make the test way more complex and quite unreadable.

> > =# REINDEX TABLE CONCURRENTLY t1 ;
> > LOCATION:  ReindexRelationConcurrently, indexcmds.c:2839
> > ^CCancel request sent
> > ERROR:  57014: canceling statement due to user request
> > LOCATION:  ProcessInterrupts, postgres.c:3171
> 
> I guess that you used a second session here beginning a transaction
> before REINDEX CONCURRENTLY ran here so as it would stop after
> swapping dependencies, right?

Yes, sorry for omitting that.  I'm using a SELECT FOR UPDATE, the same scenario
as the recent issue with TOAST tables and REINDEX CONCURRENTLY.




Re: Additional improvements to extended statistics

2020-03-19 Thread Dean Rasheed
On Wed, 18 Mar 2020 at 19:31, Tomas Vondra  wrote:
>
> Attached is a rebased patch series, addressing both those issues.
>
> I've been wondering why none of the regression tests failed because of
> the 0.0 vs. 1.0 issue, but I think the explanation is pretty simple - to
> make the tests stable, all the MCV lists we use are "perfect" i.e. it
> represents 100% of the data. But this selectivity is used to compute
> selectivity only for the part not represented by the MCV list, i.e. it's
> not really used. I suppose we could add a test that would use larger
> MCV item, but I'm afraid that'd be inherently unstable :-(
>

I think it ought to be possible to write stable tests for this code
branch -- I think all you need is for the number of rows to remain
small, so that the stats sample every row and are predictable, while
the MCVs don't cover all values, which can be achieved with a mix of
some common values, and some that only occur once.

I haven't tried it, but it seems like it would be possible in principle.

> Another thing I was thinking about is the changes to the API. We need to
> pass information whether the clauses are connected by AND or OR to a
> number of places, and 0001 does that in two ways. For some functions it
> adds a new parameter (called is_or), and for other functiosn it creates
> a new copy of a function. So for example
>
>- statext_mcv_clauselist_selectivity
>- statext_clauselist_selectivity
>
> got the new flag, while e.g. clauselist_selectivity gets a new "copy"
> sibling called clauselist_selectivity_or.
>
> There were two reasons for not using flag. First, clauselist_selectivity
> and similar functions have to do very different stuff for these two
> cases, so it'd be just one huge if/else block. Second, minimizing
> breakage of third-party code - pretty much all the extensions I've seen
> only work with AND clauses, and call clauselist_selectivity. Adding a
> flag would break that code. (Also, there's a bit of laziness, because
> this was the simplest thing to do during development.)
>
> But I wonder if that's sufficient reason - maybe we should just add the
> flag in all cases. It might break some code, but the fix is trivial (add
> a false there).
>
> Opinions?
>

-1

I think of clause_selectivity() and clauselist_selectivity() as the
public API that everyone is using, whilst the functions that support
lists of clauses to be combined using OR are internal (to the planner)
implementation details. I think callers of public API tend to either
have implicitly AND'ed list of clauses, or a single OR clause.

Regards,
Dean




Re: Adding missing object access hook invocations

2020-03-19 Thread Mark Dilger


> On Mar 19, 2020, at 11:30 AM, Mark Dilger  
> wrote:
> 
>  Will post v3 shortly.



v3-0001-Adding-missing-Object-Access-hook-invocations.patch
Description: Binary data

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: Some improvements to numeric sqrt() and ln()

2020-03-19 Thread Tom Lane
Dean Rasheed  writes:
> On Wed, 4 Mar 2020 at 14:41, David Steele  wrote:
>> Are these improvements targeted at PG13 or PG14?  This seems a pretty
>> big change for the last CF of PG13.

> Well of course that's not entirely up to me, but I was hoping to
> commit it for PG13.

> It's very well covered by a large number of regression tests in both
> numeric.sql and numeric_big.sql, since nearly anything that calls
> ln(), log() or pow() ends up going through sqrt_var(). Also, the
> changes are local to functions in numeric.c, which makes them easy to
> revert if something proves to be wrong.

FWIW, I agree that this is a reasonable thing to consider committing
for v13.  It's not adding any new user-visible behavior, so there's
no definitional issues to quibble over, which is usually what I worry
about regretting after an overly-hasty commit.  And it's only touching
a few functions in one file, so even if the patch is a bit long, the
complexity seems pretty well controlled.

I've not read the patch in detail so this isn't meant as a review,
but from a process standpoint I see no reason not to go forward.

regards, tom lane




Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side

2020-03-19 Thread Andres Freund
Hi,

On 2020-03-19 17:21:38 +0900, Fujii Masao wrote:
> Pushed! Thanks!

FWIW, I'm a bit doubtful that incurring the overhead of this by default
for everybody is a nice thing. On filesystems with high latency and with
a lot of small relations, the overhead of stat()ing a lot of files can be
almost as high as the actual base backup.

Greetings,

Andres Freund




Re: Adding missing object access hook invocations

2020-03-19 Thread Mark Dilger



> On Mar 19, 2020, at 11:17 AM, Alvaro Herrera  wrote:
> 
> On 2020-Mar-18, Mark Dilger wrote:
> 
>> Here is the latest patch.
> 
> So you insist in keeping the Drop hook calls?

My apologies, not at all.  I appear to have attached the wrong patch.  Will 
post v3 shortly.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Adding missing object access hook invocations

2020-03-19 Thread Alvaro Herrera
On 2020-Mar-18, Mark Dilger wrote:

> Here is the latest patch.

So you insist in keeping the Drop hook calls?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Make MemoryContextMemAllocated() more precise

2020-03-19 Thread Tomas Vondra

On Thu, Mar 19, 2020 at 11:44:05AM -0400, Robert Haas wrote:

On Mon, Mar 16, 2020 at 2:45 PM Jeff Davis  wrote:

Attached is a patch that makes mem_allocated a method (rather than a
field) of MemoryContext, and allows each memory context type to track
the memory its own way. They all do the same thing as before
(increment/decrement a field), but AllocSet also subtracts out the free
space in the current block. For Slab and Generation, we could do
something similar, but it's not as much of a problem because there's no
doubling of the allocation size.

Although I think this still matches the word "allocation" in spirit,
it's not technically correct, so feel free to suggest a new name for
MemoryContextMemAllocated().


Procedurally, I think that it is highly inappropriate to submit a
patch two weeks after the start of the final CommitFest and then
commit it just over 48 hours later without a single endorsement of the
change from anyone.



True.


Substantively, I think that whether or not this is improvement depends
considerably on how your OS handles overcommit. I do not have enough
knowledge to know whether it will be better in general, but would
welcome opinions from others.



Not sure overcommit is a major factor, and if it is then maybe it's the
strategy of doubling block size that's causing problems.

AFAICS the 2x allocation is the worst case, because it only happens
right after allocating a new block (of twice the size), when the
"utilization" drops from 100% to 50%. But in practice the utilization
will be somewhere in between, with an average of 75%. And we're not
doubling the block size indefinitely - there's an upper limit, so over
time the utilization drops less and less. So as the contexts grow, the
discrepancy disappears. And I'd argue the smaller the context, the less
of an issue the overcommit behavior is.

My understanding is that this is really just an accounting issue, where
allocating a block would get us over the limit, which I suppose might be
an issue with low work_mem values.
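To make that concrete, here is a standalone sketch of the accounting idea
being discussed, i.e. reporting allocated bytes minus the unused tail of
the newest block (the structs are simplified stand-ins, not the real
aset.c types):

#include <stddef.h>

typedef struct DemoBlock
{
    struct DemoBlock *next;
    char   *freeptr;            /* start of free space in this block */
    char   *endptr;             /* end of this block */
} DemoBlock;

typedef struct DemoSet
{
    size_t      mem_allocated;  /* total bytes obtained from malloc() */
    DemoBlock  *blocks;         /* newest block kept first */
} DemoSet;

static size_t
demo_mem_allocated(const DemoSet *set)
{
    size_t  total = set->mem_allocated;

    /* subtract only the free tail of the current (newest) block */
    if (set->blocks != NULL)
        total -= (size_t) (set->blocks->endptr - set->blocks->freeptr);

    return total;
}

So right after a new block is allocated, the reported number stays close
to what is actually in use, which is the 2X case described above.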

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: GSoC applicant proposal, Uday PB

2020-03-19 Thread Stephen Frost
Greetings,

* Alexander Korotkov (a.korot...@postgrespro.ru) wrote:
> On Wed, Mar 18, 2020 at 8:13 AM p.b uday  wrote:
> > Hi PostgreSQL team,
> > I am looking forward to participating in the GSoC with PostgreSQL this 
> > summer. Below is my draft proposal for your review. Any feedback would be 
> > greatly appreciated.
> >
> > PL/Java online documentation improvements
> 
> Does your project imply any coding?  AFAIR, GSoC doesn't allow pure
> documentation projects.

Yeah, I was just sending an email to Chapman regarding that.

There is a "Google Season of Docs" that happened last year in the fall
that this would be more appropriate for.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Missing errcode() in ereport

2020-03-19 Thread Tom Lane
Andres Freund  writes:
> On 2020-03-17 10:09:18 -0400, Tom Lane wrote:
>> We might want to spend some effort thinking how to find or prevent
>> additional bugs of the same ilk ...

> Yea, that'd be good.  Trying to help people new to postgres write their
> first patches I found that ereport is very confusing to them - largely
> because the syntax doesn't make much sense. Made worse by the compiler
> error messages being terrible in many cases.

> Not sure there's much we can do without changing ereport's "signature"
> though :(

Now that we can rely on having varargs macros, I think we could
stop requiring the extra level of parentheses, ie instead of

ereport(ERROR,
(errcode(ERRCODE_DIVISION_BY_ZERO),
 errmsg("division by zero")));

it could be just

ereport(ERROR,
errcode(ERRCODE_DIVISION_BY_ZERO),
errmsg("division by zero"));

(The old syntax had better still work, of course.  I'm not advocating
running around and changing existing calls.)

I'm not sure that this'd really move the goalposts much in terms of making
it any less confusing, but possibly it would improve the compiler errors?
Do you have any concrete examples of crummy error messages?

regards, tom lane




Re: GSoC applicant proposal, Uday PB

2020-03-19 Thread Alexander Korotkov
Hi!

On Wed, Mar 18, 2020 at 8:13 AM p.b uday  wrote:
> Hi PostgreSQL team,
> I am looking forward to participating in the GSoC with PostgreSQL this 
> summer. Below is my draft proposal for your review. Any feedback would be 
> greatly appreciated.
>
> PL/Java online documentation improvements

Does your project imply any coding?  AFAIR, GSoC doesn't allow pure
documentation projects.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Missing errcode() in ereport

2020-03-19 Thread Andres Freund
Hi,

On 2020-03-17 10:09:18 -0400, Tom Lane wrote:
> We might want to spend some effort thinking how to find or prevent
> additional bugs of the same ilk ...

Yea, that'd be good.  Trying to help people new to postgres write their
first patches I found that ereport is very confusing to them - largely
because the syntax doesn't make much sense. Made worse by the compiler
error messages being terrible in many cases.

Not sure there's much we can do without changing ereport's "signature"
though :(

Regards,

Andres




Re: Unicode normalization SQL functions

2020-03-19 Thread Andreas Karlsson

On 3/19/20 3:41 PM, Peter Eisentraut wrote:
What is that status of this patch set?  I think we have nailed down the 
behavior, but there were some concerns about certain performance 
characteristics.  Do people feel that those are required to be addressed 
in this cycle?


Personally I would rather see it merged if the code is correct (which it 
seems like it is from what I can tell) as the performance seems to be 
good enough for it to be useful.


Unicode normalization is a feature which I have wished for, and at least for
my use cases the current implementation is more than fast enough.


Andreas




Re: Internal key management system

2020-03-19 Thread Bruce Momjian
On Fri, Mar 20, 2020 at 12:50:27AM +0900, Masahiko Sawada wrote:
> On Fri, Mar 20, 2020 at 0:35 Bruce Momjian  wrote:
> Well, the issue is if the user can control the user key, there is might be
> a way to make the user key do nothing.
> 
> Well I meant ‘USER_KEY:’ is a fixed length string for the key used for wrap 
> and
> unwrap SQL interface functions. So user cannot control it. We will have 
> another
> key derived by, for example, HKDF(MK, ‘TDE_KEY:’ || system_identifier) for
> block encryption.

OK, yes, something like that might make sense.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Parallel grouping sets

2020-03-19 Thread Pengzhou Tang
Thank you for reviewing this patch.

On Thu, Mar 19, 2020 at 10:09 AM Tomas Vondra 
wrote:

> Hi,
>
> unfortunately this got a bit broken by the disk-based hash aggregation,
> committed today, and so it needs a rebase. I've started looking at the
> patch before that, and I have it rebased on e00912e11a9e (i.e. the
> commit before the one that breaks it).


I spent the day looking into the details of the hash spill patch and finally
managed to rebase it. I tested the first 5 patches and they all passed
installcheck; the 0006-parallel-xxx patch is not tested yet, and I also need
to make hash spill work in the final stage of parallel grouping sets. I will
do that tomorrow.

The conflicts are mainly located in the handling of hash spill for grouping
sets; the 0004-reorganise patch also makes refilling the hash table easier
and avoids the null check in that stage.


>
> Attached is the rebased patch series (now broken), with a couple of
> commits with some minor cosmetic changes I propose to make (easier than
> explaining it on a list, it's mostly about whitespace, comments etc).
> Feel free to reject the changes, it's up to you.


Thanks, I will enhance the comments and take care of the whitespace.

>
>
> I'll continue doing the review, but it'd be good to have a fully rebased
> version.


I very much appreciate it.


Thanks,
Pengzhou


Re: backend type in log_line_prefix?

2020-03-19 Thread Fabrízio de Royes Mello
On Sun, Mar 15, 2020 at 7:32 AM Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:
>
>
> I have committed that last one also, after some corrections.
>

IMHO we should also update file_fdw documentation. See attached!

Regards,

--
   Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
diff --git a/doc/src/sgml/file-fdw.sgml b/doc/src/sgml/file-fdw.sgml
index 28b61c8f2d..ed028e4ec9 100644
--- a/doc/src/sgml/file-fdw.sgml
+++ b/doc/src/sgml/file-fdw.sgml
@@ -261,7 +261,8 @@ CREATE FOREIGN TABLE pglog (
   query text,
   query_pos integer,
   location text,
-  application_name text
+  application_name text,
+  backend_type text
 ) SERVER pglog
 OPTIONS ( filename '/home/josh/data/log/pglog.csv', format 'csv' );
 


nbtree: assertion failure in _bt_killitems() for posting tuple

2020-03-19 Thread Anastasia Lubennikova
During tests, we caught an assertion failure in _bt_killitems() for a
posting tuple in a unique index:


/* kitem must have matching offnum when heap TIDs match */
Assert(kitem->indexOffset == offnum);

https://github.com/postgres/postgres/blob/master/src/backend/access/nbtree/nbtutils.c#L1809

I struggle to understand the meaning of this assertion.
Don't we allow for the chance that the posting tuple has moved right on the
page, as the comment says?


 * We match items by heap TID before assuming they are the right ones to
 * delete.  We cope with cases where items have moved right due to 
insertions.


It seems that this is exactly the case for this failure.
We expected to find the tuple at offset 121, but instead it is at offset 125
(see dump details below).


Unfortunately I cannot attach the test and core dump, since they rely on the
enterprise multimaster extension code.

Here are some details from the core dump, that I find essential:

Stack is
_bt_killitems
_bt_release_current_position
_bt_release_scan_state
btrescan
index_rescan
RelationFindReplTupleByIndex

(gdb) p offnum
$3 = 125
(gdb) p *item
$4 = {ip_blkid = {bi_hi = 0, bi_lo = 2}, ip_posid = 200}
(gdb) p *kitem
$5 = {heapTid = {ip_blkid = {bi_hi = 0, bi_lo = 2}, ip_posid = 200}, 
indexOffset = 121, tupleOffset = 32639}



Unless I am missing something, this assertion must be removed.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company







Re: proposal: new polymorphic types - commontype and commontypearray

2020-03-19 Thread Pavel Stehule
On Thu, 19 Mar 2020 at 16:44, Tom Lane  wrote:

> I wrote:
> > Here's a pretty-nearly-final version of the patch.
> > In 0001 below, I've left it throwing an error for the case of all
> > ANYCOMPATIBLE inputs being unknown, but the documentation fails to
> > acknowledge that.  0002 below is a delta patch that switches to the
> > other approach of resolving as TEXT.  I'm pretty well convinced that
> > 0002 is what we should do, so I have not bothered to write a doc
> > change that would explain 0001's behavior on this point.
>
> Pushed with the resolve-to-TEXT mod, and some last-minute
> polishing from a final re-read of the patch.
>

great, thank you very much

Pavel


> regards, tom lane
>


Re: Internal key management system

2020-03-19 Thread Masahiko Sawada
On Fri, Mar 20, 2020 at 0:35 Bruce Momjian  wrote:

> On Thu, Mar 19, 2020 at 11:42:36PM +0900, Masahiko Sawada wrote:
> > On Thu, 19 Mar 2020 at 22:00, Bruce Momjian  wrote:
> > >
> > > On Thu, Mar 19, 2020 at 06:32:57PM +0900, Masahiko Sawada wrote:
> > > > On Thu, 19 Mar 2020 at 15:59, Masahiko Sawada
> > > > > I understand that your idea is to include fixed length string to
> the
> > > > > 256 bit key in order to separate key space. But if we do that, I
> think
> > > > > that the key strength would actually be the same as the strength of
> > > > > weaker key length, depending on how we have the fixed string. I
> think
> > > > > if we want to have multiple key spaces, we need to derive keys
> from the
> > > > > master key using KDF.
> > > >
> > > > Or we can simply generate a different encryption key for block
> > > > encryption. Therefore we will end up with having two encryption keys
> > > > inside database. Maybe we can discuss this after the key manager has
> > > > been introduced.
> > >
> > > I know Sehrope liked derived keys so let's get his feedback on this.
> We
> > > might want to have two keys anyway for key rotation purposes.
> > >
> >
> > Agreed. Maybe we can derive a key for the use of wrap and unwrap SQL
> > interface by like HKDF(MK, 'USER_KEY:') or HKDF(KM, 'USER_KEY:' ||
> > system_identifier).
>
> Well, the issue is if the user can control the user key, there is might be
> a way to make the user key do nothing.


Well, I meant that ‘USER_KEY:’ is a fixed-length string for the key used by
the wrap and unwrap SQL interface functions, so the user cannot control it.
We will have another key derived by, for example, HKDF(MK, ‘TDE_KEY:’ ||
system_identifier) for block encryption.
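Just to illustrate the kind of derivation meant here, a rough standalone
sketch using OpenSSL 1.1.1's HKDF interface (the info strings and the
function name are illustrative only, not actual patch code):

#include <string.h>
#include <openssl/evp.h>
#include <openssl/kdf.h>

/* Derive a purpose-specific key from the master key; info separates key spaces. */
static int
derive_key_sketch(const unsigned char *master_key, size_t master_len,
                  const char *info, unsigned char *out, size_t outlen)
{
    EVP_PKEY_CTX *pctx = EVP_PKEY_CTX_new_id(EVP_PKEY_HKDF, NULL);
    int           ok = 0;

    if (pctx == NULL)
        return 0;

    if (EVP_PKEY_derive_init(pctx) > 0 &&
        EVP_PKEY_CTX_set_hkdf_md(pctx, EVP_sha256()) > 0 &&
        EVP_PKEY_CTX_set1_hkdf_key(pctx, master_key, (int) master_len) > 0 &&
        EVP_PKEY_CTX_add1_hkdf_info(pctx, (unsigned char *) info,
                                    (int) strlen(info)) > 0 &&
        EVP_PKEY_derive(pctx, out, &outlen) > 0)
        ok = 1;

    EVP_PKEY_CTX_free(pctx);
    return ok;
}

The wrap/unwrap key would then come from an info string like "USER_KEY:",
and the block encryption key from "TDE_KEY:" plus the system identifier.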

Regards,

-- 
Masahiko Sawadahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: proposal: new polymorphic types - commontype and commontypearray

2020-03-19 Thread Tom Lane
I wrote:
> Here's a pretty-nearly-final version of the patch.
> In 0001 below, I've left it throwing an error for the case of all
> ANYCOMPATIBLE inputs being unknown, but the documentation fails to
> acknowledge that.  0002 below is a delta patch that switches to the
> other approach of resolving as TEXT.  I'm pretty well convinced that
> 0002 is what we should do, so I have not bothered to write a doc
> change that would explain 0001's behavior on this point.

Pushed with the resolve-to-TEXT mod, and some last-minute
polishing from a final re-read of the patch.

regards, tom lane




Re: Make MemoryContextMemAllocated() more precise

2020-03-19 Thread Robert Haas
On Mon, Mar 16, 2020 at 2:45 PM Jeff Davis  wrote:
> Attached is a patch that makes mem_allocated a method (rather than a
> field) of MemoryContext, and allows each memory context type to track
> the memory its own way. They all do the same thing as before
> (increment/decrement a field), but AllocSet also subtracts out the free
> space in the current block. For Slab and Generation, we could do
> something similar, but it's not as much of a problem because there's no
> doubling of the allocation size.
>
> Although I think this still matches the word "allocation" in spirit,
> it's not technically correct, so feel free to suggest a new name for
> MemoryContextMemAllocated().

Procedurally, I think that it is highly inappropriate to submit a
patch two weeks after the start of the final CommitFest and then
commit it just over 48 hours later without a single endorsement of the
change from anyone.

Substantively, I think that whether or not this is improvement depends
considerably on how your OS handles overcommit. I do not have enough
knowledge to know whether it will be better in general, but would
welcome opinions from others.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: potential stuck lock in SaveSlotToPath()

2020-03-19 Thread Robert Haas
On Wed, Mar 18, 2020 at 4:25 PM Andres Freund  wrote:
> I don't see a valid reason for that though - if anything it's dangerous,
> because we're not persistently saving the slot. It should fail the
> checkpoint imo. Robert, do you have an idea?

Well, the comment atop SaveSlotToPath says:

 * This needn't actually be part of a checkpoint, but it's a convenient
 * location.

And I agree with that.

Incidentally, the wait-event handling in SaveSlotToPath() doesn't look
right for the early-exit cases either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Internal key management system

2020-03-19 Thread Bruce Momjian
On Thu, Mar 19, 2020 at 11:42:36PM +0900, Masahiko Sawada wrote:
> On Thu, 19 Mar 2020 at 22:00, Bruce Momjian  wrote:
> >
> > On Thu, Mar 19, 2020 at 06:32:57PM +0900, Masahiko Sawada wrote:
> > > On Thu, 19 Mar 2020 at 15:59, Masahiko Sawada
> > > > I understand that your idea is to include fixed length string to the
> > > > 256 bit key in order to separate key space. But if we do that, I think
> > > > that the key strength would actually be the same as the strength of
> > > > weaker key length, depending on how we have the fixed string. I think
> > > > if we want to have multiple key spaces, we need to derive keys from the
> > > > master key using KDF.
> > >
> > > Or we can simply generate a different encryption key for block
> > > encryption. Therefore we will end up with having two encryption keys
> > > inside database. Maybe we can discuss this after the key manager has
> > > been introduced.
> >
> > I know Sehrope liked derived keys so let's get his feedback on this.  We
> > might want to have two keys anyway for key rotation purposes.
> >
> 
> Agreed. Maybe we can derive a key for the use of wrap and unwrap SQL
> interface by like HKDF(MK, 'USER_KEY:') or HKDF(KM, 'USER_KEY:' ||
> system_identifier).

Well, the issue is if the user can control the user key, there might be
a way to make the user key do nothing.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: WAL usage calculation patch

2020-03-19 Thread Julien Rouhaud
On Thu, Mar 19, 2020 at 09:03:02PM +0900, Fujii Masao wrote:
> 
> On 2020/03/19 2:19, Julien Rouhaud wrote:
> > 
> > I'm attaching a v5 with fp records only for temp tables, so there's no risk 
> > of
> > instability.  As I previously said I'm fine with your two patches, so unless
> > you have objections on the fpi test for temp tables or the documentation
> > changes, I believe those should be ready for committer.
> 
> You added the columns into pg_stat_database, but seem to forget to
> update the document for pg_stat_database.

Ah right, I totally missed that when I tried to clean up the original POC.

> Is it really reasonable to add the columns for vacuum's WAL usage into
> pg_stat_database? I'm not sure how much the information about
> the amount of WAL generated by vacuum per database is useful.

The amount per database isn't really useful, but I didn't have a better idea on
how to expose (auto)vacuum WAL usage until this:

> Isn't it better to make VACUUM VERBOSE and autovacuum log include
> that information, instead, to see how much each vacuum activity
> generates the WAL? Sorry if this discussion has already been done
> upthread.

That's a way better idea!  I'm attaching the full patchset with the 3rd patch
using this approach instead.  There's a bit of duplicate code for computing the
WalUsage, as I didn't find a better way to avoid that without exposing
WalUsageAccumDiff().
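For context, the helper in question is essentially of this shape (a sketch
only; the field names are partly assumptions based on the patch, mirroring
BufferUsageAccumDiff):

#include <stdint.h>

typedef struct WalUsage
{
    long        wal_records;        /* # of WAL records produced */
    long        wal_fp_records;     /* # of full page WAL records */
    uint64_t    wal_bytes;          /* size of WAL produced, in bytes */
} WalUsage;

/* accumulate the difference between two counter snapshots into dst */
static void
WalUsageAccumDiff(WalUsage *dst, const WalUsage *add, const WalUsage *sub)
{
    dst->wal_records += add->wal_records - sub->wal_records;
    dst->wal_fp_records += add->wal_fp_records - sub->wal_fp_records;
    dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
}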

Autovacuum log sample:

2020-03-19 15:49:05.708 CET [5843] LOG:  automatic vacuum of table 
"rjuju.public.t1": index scans: 0
pages: 0 removed, 2213 remain, 0 skipped due to pins, 0 skipped frozen
tuples: 25 removed, 25 remain, 0 are dead but not yet 
removable, oldest xmin: 502
buffer usage: 4448 hits, 4 misses, 4 dirtied
avg read rate: 0.160 MB/s, avg write rate: 0.160 MB/s
system usage: CPU: user: 0.13 s, system: 0.00 s, elapsed: 0.19 s
WAL usage: 6643 records, 4 full page records, 1402679 bytes

VACUUM log sample:

# vacuum VERBOSE t1;
INFO:  vacuuming "public.t1"
INFO:  "t1": removed 5 row versions in 443 pages
INFO:  "t1": found 5 removable, 0 nonremovable row versions in 443 out of 
443 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 512
There were 5 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
1332 WAL records, 4 WAL full page records, 306901 WAL bytes
CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s.
INFO:  "t1": truncated 443 to 0 pages
DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO:  vacuuming "pg_toast.pg_toast_16385"
INFO:  index "pg_toast_16385_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  "pg_toast_16385": found 0 removable, 0 nonremovable row versions in 0 
out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 513
There were 0 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
0 WAL records, 0 WAL full page records, 0 WAL bytes
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM

Note that the 3rd patch is an addition on top of Kirill's original patch, as
this is information that would have been greatly helpful in some performance
issues I had to investigate recently.  I'd be happy to have it land in v13,
but if that's controversial or too late I'm happy to postpone it to v14,
provided the infrastructure added in Kirill's patches can make it into v13.
>From 73c9827b4fde3830dd52e8d2ec423f05b27dbca4 Mon Sep 17 00:00:00 2001
From: Kirill Bychik 
Date: Tue, 17 Mar 2020 14:41:50 +0100
Subject: [PATCH v6 1/3] Track WAL usage.

---
 src/backend/access/transam/xlog.c   |  8 
 src/backend/access/transam/xloginsert.c |  6 +++
 src/backend/executor/execParallel.c | 22 ++-
 src/backend/executor/instrument.c   | 51 ++---
 src/include/executor/execParallel.h |  1 +
 src/include/executor/instrument.h   | 16 +++-
 src/tools/pgindent/typedefs.list|  1 +
 7 files changed, 95 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index 793c076da6..80df3db87f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -42,6 +42,7 @@
 #include "commands/progress.h"
 #include "commands/tablespace.h"
 #include "common/controldata_utils.h"
+#include "executor/instrument.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
 #include "pgstat.h"
@@ -1231,6 +1232,13 @@ XLogInsertRecord(XLogRecData *rdata,
ProcLastRecPtr = StartPos;
XactLastRecEnd = EndPos;
 
+   /* Provide WAL update data to the instrumentation */
+   if (inserted)
+   {
+   pgWalUsage.wal_bytes += rechdr->xl_tot_len;
+   pgWalUsage.wal_records++;
+ 

Re: JDBC prepared insert and X00 and SQL_ASCII

2020-03-19 Thread Dave Cramer
On Wed, 18 Mar 2020 at 08:56, gmail Vladimir Koković <
vladimir.koko...@gmail.com> wrote:

> Hi,
>
>
> After a thorough Java-Swig-libpq test, I can confirm that INSERT/SELECT is
> working properly:
> 1. server_encoding: SQL_ASCII
> 2. client_encoding: SQL_ASCII
> 3. INSERT / SELECT java string with x00
>
>
> libpq, psql - everything is OK !
>
>
> Vladimir Kokovic, DP senior (69)
> Serbia, Belgrade, March 18, 2020
> On 16.3.20. 17:04, gmail Vladimir Koković wrote:
>
> Hi,
>
> I don't know if this is a bug or the intended mode,
> but since ODBC works and JDBC does not, I would ask why JDBC prepared
> insert does not work if ODBC prepared insert works
> in case some varchar field contains 0x00 and DB is SQL_ASCII?
>
>
>
I responded on the github issue, but you cannot simply change the client
encoding for the JDBC driver. This is not allowed; even though there is a
setting called allowClientEncodingChanges, it exists for a specific purpose:

*When using the V3 protocol the driver monitors changes in certain server
configuration parameters that should not be touched by end users.
The client_encoding setting is set by the driver and should not be altered.
If the driver detects a change it will abort the connection. There is one
legitimate exception to this behaviour though, using the COPY command on a
file residing on the server's filesystem. The only means of specifying the
encoding of this file is by altering the client_encoding setting. The JDBC
team considers this a failing of the COPY command and hopes to provide an
alternate means of specifying the encoding in the future, but for now there
is this URL parameter. Enable this only if you need to override the client
encoding when doing a copy.*

Dave Cramer
www.postgres.rocks


Re: [PATCH] Add schema and table names to partition error

2020-03-19 Thread Chris Bandy
On 3/18/20 11:46 PM, Amit Kapila wrote:
> On Thu, Mar 19, 2020 at 3:55 AM Chris Bandy  wrote:
>>
>>
>> Sorry for these troubles. Attached are patches created using `git
>> format-patch -n -v6` on master at 487e9861d0.
>>
> 
> No problem.  I have extracted your code changes as a separate patch
> (see attached) as I am not sure we want to add tests for these cases.

Patch looks good.

My last pitch to keep the tests: These would be the first and only
automated tests that verify errtable, errtableconstraint, etc.

> This doesn't apply in back-branches, but I think that is small work
> and we can do that if required.

It looks like the only failing hunk on REL_12_STABLE is in tablecmds.c.
The ereport is near line 5090 there. The partition code has changed
quite a bit compared to the older branches. ;-)

Thanks,
Chris




Re: Internal key management system

2020-03-19 Thread Masahiko Sawada
On Thu, 19 Mar 2020 at 22:00, Bruce Momjian  wrote:
>
> On Thu, Mar 19, 2020 at 06:32:57PM +0900, Masahiko Sawada wrote:
> > On Thu, 19 Mar 2020 at 15:59, Masahiko Sawada
> > > I understand that your idea is to include fixed length string to the
> > > 256 bit key in order to separate key space. But if we do that, I think
> > > that the key strength would actually be the same as the strength of
> > > weaker key length, depending on how we have the fixed string. I think
> > > if we want to have multiple key spaces, we need to derive keys from the
> > > master key using KDF.
> >
> > Or we can simply generate a different encryption key for block
> > encryption. Therefore we will end up with having two encryption keys
> > inside database. Maybe we can discuss this after the key manager has
> > been introduced.
>
> I know Sehrope liked derived keys so let's get his feedback on this.  We
> might want to have two keys anyway for key rotation purposes.
>

Agreed. Maybe we can derive a key for the wrap and unwrap SQL interface
with something like HKDF(MK, 'USER_KEY:') or HKDF(MK, 'USER_KEY:' ||
system_identifier).
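
For illustration only, that kind of derivation maps naturally onto OpenSSL's
HKDF support in the EVP_PKEY interface.  A minimal sketch, assuming a 256-bit
master key and using the purpose string as the HKDF "info" input; the function
name and parameters below are invented for the example, not taken from the
patch:

#include <stdbool.h>
#include <string.h>
#include <openssl/evp.h>
#include <openssl/kdf.h>

/*
 * Derive a purpose-specific key from the master key (MK) with HKDF,
 * e.g. derive_subkey(mk, 32, "USER_KEY:", userkey, 32).  Sketch only.
 */
static bool
derive_subkey(const unsigned char *mk, int mklen,
              const char *purpose,
              unsigned char *out, size_t outlen)
{
    EVP_PKEY_CTX *ctx = EVP_PKEY_CTX_new_id(EVP_PKEY_HKDF, NULL);
    bool        ok;

    if (ctx == NULL)
        return false;

    ok = EVP_PKEY_derive_init(ctx) > 0 &&
        EVP_PKEY_CTX_set_hkdf_md(ctx, EVP_sha256()) > 0 &&
        EVP_PKEY_CTX_set1_hkdf_key(ctx, mk, mklen) > 0 &&
        EVP_PKEY_CTX_add1_hkdf_info(ctx, (const unsigned char *) purpose,
                                    (int) strlen(purpose)) > 0 &&
        EVP_PKEY_derive(ctx, out, &outlen) > 0;

    EVP_PKEY_CTX_free(ctx);
    return ok;
}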

Regards,

-- 
Masahiko Sawada  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Unicode normalization SQL functions

2020-03-19 Thread Peter Eisentraut
What is the status of this patch set?  I think we have nailed down the 
behavior, but there were some concerns about certain performance 
characteristics.  Do people feel that those are required to be addressed 
in this cycle?


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: GSoC applicant proposal, Uday PB

2020-03-19 Thread Chapman Flack
Hi,

On 3/18/20 1:11 AM, p.b uday wrote:
> summer. Below is my draft proposal for your review. Any feedback would be
> greatly appreciated.
> ...
> The goal of the project is to improve the PL/JAVA online documentation
> website in terms of appearance, usability, information showcase and
> functionality.

Thanks for your interest! There is plenty of room for the existing
documentation to be improved.

> May 4 – May 16
> 
> Familiarize with website and velocity. Start with simple CSS framework
> class to clone the original PostgreSQL documentation website styles and
> themes.

I should perhaps clarify that I don't think it's essential for the
PL/Java docs to exactly clone the styles of PostgreSQL or of another
project. I would be happy to achieve something more like a family
resemblance, as you see, for example, between postgresql.org and
jdbc.postgresql.org. There are many differences, and the shades of
blue aren't the same, but you could believe they are related projects.
PL/Java's looks very different, only because it's currently using
the default out-of-the-box Maven styles with almost zero adjustments.

The PL/Java doc sources are less like PostgreSQL's (which are
DocBook XML) and more like PgJDBC's (both use Markdown, but with
different toolchains). I think changing the markup language would
be out of scope; Markdown seems adequate, and personally I find it
less tiring to write. It would be nice to get it to generate tables
of contents based on existing headings.

Further discussion should probably migrate to the pljava-dev list
but that seems to be wedged at the moment; I'll look into that
and report back. Meanwhile let's be sparing in use of pgsql-hackers
(the final commitfest for PostgreSQL 13 is going on this month).

Regards,
-Chap




Re: adding partitioned tables to publications

2020-03-19 Thread Peter Eisentraut

On 2020-03-18 08:33, Amit Langote wrote:

By the way, I have rebased the patches, although maybe you've got your
own copies; attached.


Looking through 0002 and 0003 now.

The structure looks generally good.

In 0002, the naming of apply_handle_insert() vs. 
apply_handle_do_insert() etc. seems a bit prone to confusion.  How about 
something like apply_handle_insert_internal()?  Also, should we put each 
of those internal functions next to its corresponding non-internal 
function instead of in a separate group like you have it?


In apply_handle_do_insert(), the argument localslot should probably be 
remoteslot.


In apply_handle_do_delete(), the ExecOpenIndices() call was moved to a 
different location relative to the rest of the code.  That was probably 
not intended.


In 0003, you have /* TODO, use inverse lookup hashtable? */.  Is this 
something you plan to address in this cycle, or is that more for future 
generations?


0003 could use some more tests.  The one test that you adjusted just 
ensures the data goes somewhere instead of being rejected, but there are 
no tests that check whether it ends up in the right partition, whether 
cross-partition updates work etc.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-19 Thread Justin Pryzby
On Thu, Mar 19, 2020 at 09:52:11PM +1300, David Rowley wrote:
> On Thu, 19 Mar 2020 at 19:07, Justin Pryzby  wrote:
> >
> > On Fri, Mar 13, 2020 at 02:38:51PM -0700, Andres Freund wrote:
> > > On 2020-03-13 13:44:42 -0500, Justin Pryzby wrote:
> > > > Having now played with the patch, I'll suggest that 1000 is too 
> > > > high a
> > > > threshold.  If autovacuum runs without FREEZE, I don't see why it 
> > > > couldn't be
> > > > much lower (10?) or use (0.2 * n_ins + 50) like the other 
> > > > autovacuum GUC.
> > >
> > > ISTM that the danger of regressing workloads due to suddenly repeatedly
> > > scanning huge indexes that previously were never / rarely scanned is
> > > significant (if there's a few dead tuples, otherwise most indexes will
> > > be able to skip the scan since the vacuum_cleanup_index_scale_factor
> > > introduction)).
> >
> > We could try to avoid that issue here:
> >
> > |/* If any tuples need to be deleted, perform final vacuum cycle */
> > |/* XXX put a threshold on min number of tuples here? */
> > |if (dead_tuples->num_tuples > 0)
> > |{
> > |/* Work on all the indexes, and then the heap */
> > |lazy_vacuum_all_indexes(onerel, Irel, indstats, 
> > vacrelstats,
> > |lps, 
> > nindexes);
> > |
> > |/* Remove tuples from heap */
> > |lazy_vacuum_heap(onerel, vacrelstats);
> > |}
> >
> > As you said, an insert-only table can skip scanning indexes, but an
> > insert-mostly table currently cannot.
> >
> > Maybe we could skip the final index scan if we hit the autovacuum insert
> > threshold?
> >
> > I still don't like mixing the thresholds with the behavior they imply, but
> > maybe what's needed is better docs describing all of vacuum's roles and its
> > procedure and priority in executing them.
> >
> > The dead tuples would just be cleaned up during a future vacuum, right ?  So
> > that would be less efficient, but (no surprise) there's a balance to strike 
> > and
> > that can be tuned.  I think that wouldn't be an issue for most people; the
> > worst case would be if you set high maint_work_mem, and low insert 
> > threshold,
> > and you got increased bloat.  But faster vacuum if we avoided idx scans.
> >
> > That might allow more flexibility in our discussion around default values 
> > for
> > thresholds for insert-triggered vacuum.
> 
> We went over this a bit already. The risk is that if you have an
> insert-mostly table and always trigger an auto-vacuum for inserts and
> never due to dead tuples, then you'll forego the index cleanup every
> time causing the indexes to bloat over time.

At the time, we were talking about skipping the index *cleanup* phase,
which also incurs an index scan.
>+  tab->at_params.index_cleanup = insert_only ? 
>VACOPT_TERNARY_DISABLED : VACOPT_TERNARY_DEFAULT;
We decided not to skip this, since it would allow index bloat if vacuum were
only ever run due to inserts and cleanup never happened.

I'm suggesting the possibility of skipping not index *cleanup* but index (and
heap) *vacuum*.  So that saves an index scan itself, and I think implies later
skipping cleanup (since no index tuples were removed).  But more importantly, I
think if we skip that during an insert-triggered vacuum, the dead heap tuples
are still there during the next vacuum instance.  So unlike index cleanup
(where we don't keep track of the total number of dead index tuples), this can
accumulate over time, and eventually trigger index+heap vacuum, and cleanup.
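
To make that concrete, a minimal sketch against the code quoted above; the
threshold variable is invented here purely for illustration and is not part of
the patch:

    /* If any tuples need to be deleted, perform final vacuum cycle */
    if (dead_tuples->num_tuples > skip_index_vacuum_threshold)  /* hypothetical */
    {
        /* Work on all the indexes, and then the heap */
        lazy_vacuum_all_indexes(onerel, Irel, indstats, vacrelstats,
                                lps, nindexes);

        /* Remove tuples from heap */
        lazy_vacuum_heap(onerel, vacrelstats);
    }
    else
    {
        /*
         * Too few dead tuples to justify an index scan now; leave them
         * for a later vacuum (and its cleanup pass) to reclaim.
         */
    }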

> I think any considerations to add some sort of threshold on dead
> tuples before cleaning the index should be considered independently.

+1, yes.  I'm hoping to anticipate and mitigate any objections and regressions
more than raise them.

-- 
Justin




Re: Auxiliary Processes and MyAuxProc

2020-03-19 Thread Mike Palmiotto
On Thu, Mar 19, 2020 at 6:35 AM Peter Eisentraut
 wrote:
>
> While working on (My)BackendType, I was attempting to get rid of
> (My)AuxProcType altogether.  This would mostly work except that it's
> sort of wired into the pgstats subsystem (see NumBackendStatSlots).
> This can probably be reorganized, but I didn't pursue it further.

This patchset has a different goal: to remove redundant startup code
and interspersed variants of fork/forkexec code so that we can
centralize the postmaster child startup.

The goal of centralizing postmaster startup stems from the desire to
be able to control the process security attributes immediately
before/after fork/exec. This is simply not possible with the existing
infrastructure, since processes are identified in Main functions,
which is too late (and again too scattered) to be able to do anything
useful.

By providing a mechanism to set child process metadata prior to
spawning the subprocess, we gain the ability to identify the process
type and thus set security attributes on that process.

In an earlier spin of the patchset, I included a fork_process hook,
which would be where an extension could set security attributes on a
process. I have since dropped the hook (as advised), but now realize
that it actually demonstrates the main motivation of the patchset.
Perhaps I should add that back in for the next version.
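
As an illustration only, such a hook could look roughly like the sketch below;
the hook type, variable, and function names are invented for this example and
are not taken from the patchset:

#include <sys/types.h>
#include <unistd.h>

/* Hook an extension can set to adjust security attributes of a child. */
typedef void (*fork_process_hook_type) (int child_type);

fork_process_hook_type fork_process_hook = NULL;

static pid_t
fork_subprocess(int child_type)
{
    pid_t       pid = fork();

    if (pid == 0 && fork_process_hook != NULL)
    {
        /*
         * In the child, with the process type already known: the hook can
         * set an SELinux domain, drop capabilities, etc., before the
         * subprocess main function runs.
         */
        fork_process_hook(child_type);
    }

    return pid;
}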

>
> Now, I'm a sucker for refactoring, but I feel this proposal is going
> into a direction I don't understand.  I'd prefer that we focus around
> building out background workers as the preferred subprocess mechanism.
> Building out a second generic mechanism, again, I don't understand the
> direction.  Are we hoping to add more of these processes?  Make it
> extensible?  The net lines added/removed by this patch series seems
> pretty neutral.  What are we getting at the end of this?

As stated above, the primary goal is to centralize the startup code.
One nice side-effect is the introduction of a mechanism that is now
both extensible and provides the ability to remove a lot of redundant
code. I see no reason to have 5 different variants of process forkexec
functions for the sole purpose of building up argv. This patchset
intends to get rid of such an architecture.

Note that this is not intended to be the complete product here -- it
is just a first step at swapping in and making use of a new
infrastructure. There will be follow-up work required to really get
the most out of this infrastructure. For instance, we could drop a
large portion of the SubPostmasterMain switch logic. There are a
number of other areas throughout the codebase (including the example
provided in the last commit, which changes the way we retrieve process
descriptions), that can utilize this new infrastructure to get rid of
code.

>
> More specifically, I don't agree with the wholesale renaming of
> auxiliary process to subprocess.  Besides the massive code churn, the

I'm not sure I understand the argument here. Where do you see
wholesale renaming of AuxProc to Subprocess? Subprocess is the name
for postmaster subprocesses, whereas Auxiliary Processes are a subset
of those processes.

> old naming seemed pretty good to distinguish them from background
> workers, the new naming is more ambiguous.

I do not see any conflation between Background Workers and Auxiliary
Processes in this patchset. Since Auxiliary Processes are included in
the full set of subprocesses and are delineated with a boolean:
needs_aux_proc, it seems fairly straightforward to me which
subprocesses are in fact Auxiliary Processes.
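
To make that concrete, the kind of table implied above might look like the
sketch below; apart from needs_aux_proc, the struct and field names are
invented for illustration and are not the patchset's actual definitions:

/* One entry per postmaster subprocess type (illustrative sketch). */
typedef struct SubprocessEntry
{
    const char *name;            /* e.g. "checkpointer" */
    bool        needs_aux_proc;  /* true if this is an auxiliary process */
    void      (*entry_point) (int argc, char *argv[]);  /* child main */
} SubprocessEntry;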

Thanks,
-- 
Mike Palmiotto
https://crunchydata.com




Re: Cache lookup errors with functions manipulation object addresses

2020-03-19 Thread Michael Paquier
On Thu, Mar 19, 2020 at 08:48:58AM +0100, Daniel Gustafsson wrote:
> Taking a look at this to see if we can achieve closure on this long-running
> patchset.  The goal of the patch is IMO a clear win for usability.
>
> The patchset applies with a bit of fuzzing and offsetting, it has tests (which
> pass) as well as the relevant documentation changes.  I agree with the 
> previous
> reviewers that the new shape of the test is better, so definitely +1 on that
> change.  There are a lot of mechanic changes in this set, but AFAICT they 
> cover
> all the bases.

Thanks.  I hope it is helpful.

> Since the patch has been through a lot of review already there isn't a lot to
> say, only a few very minor things that stood out:
> 
>   * This exports the useful functionality of regoperatorout for use
>   * in other backend modules.  The result is a palloc'd string.
> format_operator_extended has this comment, but the code can now also return
> NULL and not just a palloc'd string.
> 
> +   if (!missing_ok)
> +   {
> +   elog(ERROR, "could not find tuple for cast %u",
> +object->objectId);
> +   }
> The other (!missing_ok) blocks in this patch are not encapsulating the elog()
> with braces.
> 
> I'm switching this to ready for committer,

The new FORMAT_TYPE_* flag still makes sense to me while reading this
code once again, as does the extensibility of the new APIs for
operators and procedures.  One doubt I still have is whether we should
really change the signature of getObjectDescription() to include the
new missing_ok argument, or whether a new function should be introduced.
I'd rather keep only one function: even though it is called in many
places in the backend, I cannot track down any extension using it.  But
I won't go against Alvaro's will either if he thinks something
different, as this is his original design and commit as of f8348ea3.
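
For what it's worth, if a separate function were preferred, the usual pattern
would be to keep the existing signature as a thin wrapper.  A sketch, with a
hypothetical name for the extended variant:

/* Extended variant that tolerates missing objects (hypothetical name). */
extern char *getObjectDescriptionExtended(const ObjectAddress *object,
                                          bool missing_ok);

/* Existing entry point keeps its signature and behavior unchanged. */
char *
getObjectDescription(const ObjectAddress *object)
{
    return getObjectDescriptionExtended(object, false);
}

That would keep third-party callers source-compatible at the cost of a second
exported symbol.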
--
Michael


signature.asc
Description: PGP signature


Re: Add FOREIGN to ALTER TABLE in pg_dump

2020-03-19 Thread Daniel Gustafsson
> On 15 Jan 2020, at 00:04, Alvaro Herrera  wrote:
> 
> On 2020-Jan-14, Tom Lane wrote:
> 
>> I can't get terribly excited about persuading that test to cover this
>> trivial little bit of logic, but if you are, I won't stand in the way.
> 
> Hmm, that's a good point actually: the patch changed several places to
> inject the FOREIGN keyword, so in order to cover them all it would need
> several additional regexps, not just one.  I'm not sure that
> 002_pg_dump.pl is prepared to do that without unsightly contortions.

I agree that it doesn't seem worth holding up this patch for that, even though
it would be nice to add a test at some point.

> Anyway, other than that minor omission the patch seemed good to me, so I
> don't oppose Tomas pushing the version I posted yesterday.  Or I can, if
> he prefers that.

This patch still applies with some offsets and a bit of fuzz, and looking over
the patch I agree with Alvaro.

Moving this patch to Ready for Committer.

cheers ./daniel



Re: Don't try fetching future segment of a TLI.

2020-03-19 Thread Pavel Suderevsky
Hi,

I've tested the patch provided by Kyotaro and can confirm it fixes the issue.
Any chance it will be merged into one of the next minor releases?

Thank you very much!

сб, 1 февр. 2020 г. в 08:31, David Steele :

> On 1/28/20 8:02 PM, Kyotaro Horiguchi wrote:
>  > At Tue, 28 Jan 2020 19:13:32 +0300, Pavel Suderevsky
>  >> Regading influence: issue is not about the large amount of WALs to
> apply
>  >> but in searching for the non-existing WALs on the remote storage,
> each such
>  >> search can take 5-10 seconds while obtaining existing WAL takes
>  >> milliseconds.
>  >
>  > Wow. I didn't know of a file system that takes that much seconds to
>  > trying non-existent files. Although I still think this is not a bug,
>  > but avoiding that actually leads to a big win on such systems.
>
> I have not tested this case but I can imagine it would be slow in
> practice.  It's axiomatic that is hard to prove a negative.  With
> multi-region replication it might well take some time to be sure that
> the file *really* doesn't exist and hasn't just been lost in a single
> region.
>
>  > After a thought, I think it's safe and effectively doable to let
>  > XLogFileReadAnyTLI() refrain from trying WAL segments of too-high
>  > TLIs.  Some garbage archive files out of the range of a timeline might
>  > be seen, for example, after reusing archive directory without clearing
>  > files.  However, fetching such garbages just to fail doesn't
>  > contribute durability or reliablity at all, I think.
>
> The patch seems sane, the trick will be testing it.
>
> Pavel, do you have an environment where you can ensure this is a
> performance benefit?
>
> Regards,
> --
> -David
> da...@pgmasters.net
>


Re: Internal key management system

2020-03-19 Thread Bruce Momjian
On Thu, Mar 19, 2020 at 06:32:57PM +0900, Masahiko Sawada wrote:
> On Thu, 19 Mar 2020 at 15:59, Masahiko Sawada
> > I understand that your idea is to include fixed length string to the
> > 256 bit key in order to separate key space. But if we do that, I think
> > that the key strength would actually be the same as the strength of
> > weaker key length, depending on how we have the fixed string. I think
> > if we want to have multiple key spaces, we need to derive keys from the
> > master key using KDF.
> 
> Or we can simply generate a different encryption key for block
> encryption. Therefore we will end up with having two encryption keys
> inside database. Maybe we can discuss this after the key manager has
> been introduced.

I know Sehrope liked derived keys so let's get his feedback on this.  We
might want to have two keys anyway for key rotation purposes.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [Proposal] Global temporary tables

2020-03-19 Thread Prabhat Sahu
On Thu, Mar 19, 2020 at 3:51 PM wenjing.zwj 
wrote:

> postgres=# CREATE LOCAL TEMPORARY TABLE gtt1(c1 serial PRIMARY KEY, c2
> VARCHAR (50) UNIQUE NOT NULL) ON COMMIT DELETE ROWS;
> CREATE TABLE
> postgres=# CREATE LOCAL TEMPORARY TABLE gtt2(c1 integer NOT NULL, c2
> integer NOT NULL,
> postgres(# PRIMARY KEY (c1, c2),
> postgres(# FOREIGN KEY (c1) REFERENCES gtt1 (c1)) ON COMMIT PRESERVE ROWS;
> ERROR:  unsupported ON COMMIT and foreign key combination
> DETAIL:  Table "gtt2" references "gtt1", but they do not have the same ON
> COMMIT setting.
>
> postgres=# CREATE LOCAL TEMPORARY TABLE gtt3(c1 serial PRIMARY KEY, c2
> VARCHAR (50) UNIQUE NOT NULL) ON COMMIT PRESERVE ROWS;
> CREATE TABLE
> postgres=#
> postgres=# CREATE LOCAL TEMPORARY TABLE gtt4(c1 integer NOT NULL, c2
> integer NOT NULL,
> postgres(# PRIMARY KEY (c1, c2),
> postgres(# FOREIGN KEY (c1) REFERENCES gtt3 (c1)) ON COMMIT DELETE ROWS;
> CREATE TABLE
>
> The same behavior applies to the local temp table.
>
Yes, the issue is related to "local temp table".

I think, Cause of the problem is temp table with on commit delete rows is
> not good for reference tables.
> So, it the error message ”cannot reference an on commit delete rows
> temporary table.“ ?
>
No, this is not always true.
We can create a GTT/"local temp table" with "ON COMMIT DELETE ROWS" which
can reference "ON COMMIT DELETE ROWS".

Below are the 4 combinations of GTT/"local temp table" references:
1. "ON COMMIT PRESERVE ROWS" can reference "ON COMMIT PRESERVE ROWS"
2. "ON COMMIT DELETE ROWS"   can reference "ON COMMIT PRESERVE ROWS"
3. "ON COMMIT DELETE ROWS"   can reference "ON COMMIT DELETE ROWS"
But
4. "ON COMMIT PRESERVE ROWS" fails to reference "ON COMMIT DELETE ROWS"

-- 

With Regards,
Prabhat Kumar Sahu
EnterpriseDB: http://www.enterprisedb.com


Re: Internal key management system

2020-03-19 Thread Masahiko Sawada
On Thu, 19 Mar 2020 at 18:32, Masahiko Sawada
 wrote:
>
> On Thu, 19 Mar 2020 at 15:59, Masahiko Sawada
>  wrote:
> >
> > Sending to pgsql-hackers again.
> >
> > On Tue, 17 Mar 2020 at 03:18, Bruce Momjian
> >  wrote:
> > >
> > > On Mon, Mar 16, 2020 at 04:13:21PM +0900, Masahiko Sawada wrote:
> > > > On Thu, 12 Mar 2020 at 08:13, Bruce Momjian
> > > >  wrote:
> > > > >
> > > > > On Fri, Mar  6, 2020 at 03:31:00PM +0900, Masahiko Sawada wrote:
> > > > > > On Fri, 6 Mar 2020 at 15:25, Moon, Insung 
> > > > > >  wrote:
> > > > > > >
> > > > > > > Dear Sawada-san
> > > > > > >
> > > > > > > I don't know if my environment or email system is weird, but the 
> > > > > > > V5
> > > > > > > patch file is only include simply a changed list.
> > > > > > > and previous V4 patch file size was 64kb, but the v5 patch file 
> > > > > > > size was 2kb.
> > > > > > > Can you check it?
> > > > > > >
> > > > > >
> > > > > > Thank you! I'd attached wrong file.
> > > > >
> > > > > Looking at this thread, I wanted to make a few comments:
> > > > >
> > > > > Everyone seems to think pgcrypto need some maintenance.  Who would 
> > > > > like
> > > > > to take on that task?
> > > > >
> > > > > This feature does require openssl since all the encryption/decryption
> > > > > happen via openssl function calls.
> > > > >
> > > > > Three are three levels of encrypt here:
> > > > >
> > > > > 1.  The master key generated during initdb
> > > > >
> > > > > 2.  The passphrase to unlock the master key at boot time.  Is that
> > > > > optional or required?
> > > >
> > > > The passphrase is required if the internal kms is enabled during
> > > > initdb. Currently hashing the passphrase is also required but it could
> > > > be optional. Even if we make hashing optional, we still require
> > > > openssl to wrap and unwrap.
> > >
> > > I think openssl should be required for any of this --- that is what I
> > > was asking.
> > >
> > > > > Could the wrap functions expose the master encryption key by passing 
> > > > > in
> > > > > empty string or null?
> > > >
> > > > Currently the wrap function returns NULL if null is passed, and
> > > > doesn't expose the master encryption key even if empty string is
> > > > passed because we add random IV for each wrapping.
> > >
> > > OK, good, makes sense, but you see why I am asking?  We never want the
> > > master key to be visible.
> >
> > Understood.
> >
> > >
> > > > >  I wonder if we should create a derived key from
> > > > > the master key to use for pg_wrap/pg_unwrap, maybe by appending a 
> > > > > fixed
> > > > > string to all strings supplied to these functions.  We could create
> > > > > another derived key for use in block-level encryption, so we are sure
> > > > > the two key spaces would never overlap.
> > > >
> > > > Currently the master key is 32 bytes but you mean to add fixed string
> > > > to the master key to derive a new key?
> > >
> > > Yes, that was my idea --- make a separate keyspace for wrap/unwrap and
> > > block-level encryption.
> >
> > I understand that your idea is to include fixed length string to the
> > 256 bit key in order to separate key space. But if we do that, I think
> > that the key strength would actually be the same as the strength of
> > weaker key length, depending on how we have the fixed string. I think
> > if we want to have multiple key spaces, we need to derive keys from the
> > master key using KDF.
>
> Or we can simply generate a different encryption key for block
> encryption. Therefore we will end up with having two encryption keys
> inside database. Maybe we can discuss this after the key manager has
> been introduced.
>

Attached is an updated version of the patch. It incorporates the comments
and changes pg_upgrade so that the master encryption key is taken over
from the old cluster to the new one if both clusters enable key management.

Regards,

-- 
Masahiko Sawada  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


kms_v6.patch
Description: Binary data


Re: WAL usage calculation patch

2020-03-19 Thread Fujii Masao




On 2020/03/19 2:19, Julien Rouhaud wrote:

On Wed, Mar 18, 2020 at 09:02:58AM +0300, Kirill Bychik wrote:


There is a higher-level Instrumentation API that can be used with
INSTRUMENT_WAL flag to collect the wal usage information. I believe
the instrumentation is widely used in the executor code, so it should
not be a problem to colelct instrumentation information on autovacuum
worker level.

Just a recommendation/chat, though. I am happy with the way the data
is collected now. If you commit this variant, please add a TODO to
rework wal usage to common instr API.



The instrumentation is somewhat intended to be used with executor nodes, not
backend commands.  I don't see real technical reason that would prevent that,
but I prefer to keep things as-is for now, as it sound less controversial.
This is for the 3rd patch, which may not even be considered for this CF anyway.



As for the tests, please get somebody else to review this. I strongly
believe checking full page writes here could be a source of
instability.



I'm also a little bit dubious about it.  The initial checkpoint should make
things stable (of course unless full_page_writes is disabled), and Cfbot also
seems happy about it.  At least keeping it for the temporary tables test
shouldn't be a problem.


Temp tables should show zero FPI WAL records, true :)

I have no objections to the patch.



I'm attaching a v5 with fp records only for temp tables, so there's no risk of
instability.  As I previously said I'm fine with your two patches, so unless
you have objections on the fpi test for temp tables or the documentation
changes, I believe those should be ready for committer.


You added the columns to pg_stat_database, but seem to have forgotten to
update the documentation for pg_stat_database.

Is it really reasonable to add the columns for vacuum's WAL usage to
pg_stat_database? I'm not sure how useful the information about
the amount of WAL generated by vacuum per database is.
Isn't it better to make VACUUM VERBOSE and the autovacuum log include
that information instead, so we can see how much WAL each vacuum
run generates? Sorry if this discussion has already happened
upthread.
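
For illustration, reporting per-vacuum WAL usage would essentially mean
snapshotting the counters added by this patch around the work and logging the
difference.  A rough sketch, assuming the WalUsage struct and the wal_records
and wal_bytes fields from the patch; the local variable name is invented:

    WalUsage    walusage_start = pgWalUsage;    /* snapshot before the work */

    /* ... heap scan, index vacuuming, etc. ... */

    ereport(LOG,
            (errmsg("WAL usage: %ld records, " UINT64_FORMAT " bytes",
                    (long) (pgWalUsage.wal_records - walusage_start.wal_records),
                    (uint64) (pgWalUsage.wal_bytes - walusage_start.wal_bytes))));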

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Re: plan cache overhead on plpgsql expression

2020-03-19 Thread Pavel Stehule
On Thu, 19 Mar 2020 at 10:47, Amit Langote 
wrote:

> Hi Pavel,
>
> Sorry it took me a while to look at this.
>
> On Tue, Feb 25, 2020 at 4:28 AM Pavel Stehule 
> wrote:
> > po 24. 2. 2020 v 18:56 odesílatel Pavel Stehule 
> napsal:
> >> But I found one issue - I don't know if this issue is related to your
> patch or plpgsql_check.
> >>
> >> plpgsql_check try to clean after it was executed - it cleans all plans.
> But some pointers on simple expressions are broken after catched exceptions.
> >>
> >> expr->plan = 0x80. Is interesting, so other fields of this expressions
> are correct.
> >
> > I am not sure, but after patching the SPI_prepare_params the current
> memory context is some short memory context.
> >
> > Can SPI_prepare_params change current memory context? It did before. But
> after patching different memory context is active.
>
> I haven't been able to see the behavior you reported.  Could you let
> me know what unexpected memory context you see in the problematic
> case?
>

How can I detect it? Are there some steps for debugging memory contexts?

Pavel

>
> --
> Thank you,
> Amit
>
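
For the question above about detecting the current memory context, two
standard probes (shown purely as a generic illustration, not as part of the
patch) are to log it from the code or to inspect it from gdb:

    /* Log the active context's name and dump its statistics to stderr. */
    elog(NOTICE, "CurrentMemoryContext is \"%s\"", CurrentMemoryContext->name);
    MemoryContextStats(CurrentMemoryContext);

    /*
     * From gdb, roughly the same information:
     *   (gdb) p CurrentMemoryContext->name
     *   (gdb) call MemoryContextStats(TopMemoryContext)
     */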


Re: PG v12.2 - Setting jit_above_cost is causing the server to crash

2020-03-19 Thread Sandeep Thakkar
Hi,

On Thu, Feb 27, 2020 at 6:23 PM Dave Page  wrote:

> Hi
>
> On Thu, Feb 27, 2020 at 12:41 PM Tom Lane  wrote:
>
>> Aditya Toshniwal  writes:
>> > On Mon, Feb 24, 2020 at 12:46 PM Andres Freund 
>> wrote:
>> >> This isn't reproducible here. Are you sure that you're running on a
>> >> clean installation?
>>
>> > Yes I did a fresh installation using installer provided here -
>> > https://www.enterprisedb.com/downloads/postgresql
>>
>> There is apparently something wrong with the JIT stuff in EDB's 12.2
>> build for macOS.  At least, that's the conclusion I came to after
>> off-list discussion with the submitter of bug #16264, which has pretty
>> much exactly this symptom (especially if you're seeing "signal 9"
>> reports in the postmaster log).  For him, either disabling JIT or
>> reverting to 12.1 made it go away.
>>
>
> We've been looking into this;
>
> Apple started a notarisation process some time ago, designed to mark their
> applications as conforming to various security requirements, but prior to
> Catalina it was essentially optional. When Catalina was released, they made
> notarisation for distributed software a requirement, but had the process
> issue warnings for non-compliance. As-of the end of January, those warnings
> became hard errors, so now our packages must be notarised, and for that to
> happen, must be hardened by linking with a special runtime and having
> securely time stamped signatures on every binary before being checked and
> notarised as such by Apple. Without that, users would have to disable
> security features on their systems before they could run our software.
>
> Our packages are being successfully notarised at the moment, because
> that's essentially done through a static analysis. We can (and have) added
> what Apple call an entitlement in test builds which essentially puts a flag
> in the notarisation for the product that declares that it will do JIT
> operations, however, it seems that this alone is not enough and that in
> addition to the entitlement, we also need to include the MAP_JIT flag in
> mmap() calls. See
> https://developer.apple.com/documentation/security/hardened_runtime and
> https://developer.apple.com/documentation/bundleresources/entitlements/com_apple_security_cs_allow-jit
>
> We're working on trying to test a patch for that at the moment.
>
>
We have fixed the issue. To explain in brief, it was related to the
hardened runtime. Hardening the runtime can produce such issues, and
therefore Apple provides runtime exceptions. We were previously using
the entitlement "com.apple.security.cs.disable-library-validation" for the
app bundle and then tried adding
"com.apple.security.cs.allow-unsigned-executable-memory", but the
query would still crash the server process when JIT was enabled. Later we
applied these entitlements to the PG binaries (postgres, pg_ctl and others)
and the bundles (llvmjit.so and others), which actually resolved the problem.

The updates will be released in a couple of days.
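
For reference, the MAP_JIT mechanism mentioned in Dave's message above shows
up at the mmap() level roughly as in the sketch below; this is a generic
illustration, not code from the installer or from LLVM, and the function name
is invented:

#include <stddef.h>
#include <sys/mman.h>

/* Allocate an executable region under the macOS hardened runtime. */
static void *
alloc_jit_region(size_t len)
{
    /* MAP_JIT pairs with the com.apple.security.cs.allow-jit entitlement. */
    return mmap(NULL, len,
                PROT_READ | PROT_WRITE | PROT_EXEC,
                MAP_PRIVATE | MAP_ANON | MAP_JIT,
                -1, 0);
}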

> --
> Dave Page
> Blog: http://pgsnake.blogspot.com
> Twitter: @pgsnake
>
> EnterpriseDB UK: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


-- 
Sandeep Thakkar


Re: [PATCH] Skip llvm bytecode generation if LLVM is missing

2020-03-19 Thread Peter Eisentraut

On 2020-03-15 02:28, Craig Ringer wrote:
On Fri, 13 Mar 2020 at 15:04, Andres Freund > wrote:


On 2020-03-13 14:08:12 +0800, Craig Ringer wrote:
 > The alternative would be to detect a missing clang and emit a much more
 > informative error than the current one that explicitly suggests retrying
 > with
 >
 >     make with_llvm=no
 >
 > or setting with_llvm=no in the environment.

That, that, that's what I suggested upthread?


Yes, and I still don't like it. "with_llvm" is the actual 
already-working option. I'd rather make this not randomly explode for 
users, but failing that we can just hack the Makefile in the rpms for 
EL-7 (where it's a particular mess) and rely on an error message for 
other cases.


I don't really get the problem.  with_llvm=no works, so it can be used.

Options that automatically disable things based on what is installed in 
the build environment are bad ideas.  For instance, we on purpose don't 
have configure decide anything based on whether readline is installed. 
Either you select it or you don't, there is no "auto" mode.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: WIP/PoC for parallel backup

2020-03-19 Thread Rajkumar Raghuwanshi
Hi Asif,

In another scenario, the backup data is corrupted for a tablespace. Again,
this is not reproducible every time,
but if I run the same set of commands I get the same error.

[edb@localhost bin]$ ./pg_ctl -D data -l logfile start
waiting for server to start done
server started
[edb@localhost bin]$
[edb@localhost bin]$ mkdir /tmp/tblsp
[edb@localhost bin]$ ./psql postgres -p 5432 -c "create tablespace tblsp
location '/tmp/tblsp';"
CREATE TABLESPACE
[edb@localhost bin]$ ./psql postgres -p 5432 -c "create database testdb
tablespace tblsp;"
CREATE DATABASE
[edb@localhost bin]$ ./psql testdb -p 5432 -c "create table testtbl (a
text);"
CREATE TABLE
[edb@localhost bin]$ ./psql testdb -p 5432 -c "insert into testtbl values
('parallel_backup with tablespace');"
INSERT 0 1
[edb@localhost bin]$ ./pg_basebackup -p 5432 -D /tmp/bkp -T
/tmp/tblsp=/tmp/tblsp_bkp --jobs 2
[edb@localhost bin]$ ./pg_ctl -D /tmp/bkp -l /tmp/bkp_logs -o "-p "
start
waiting for server to start done
server started
[edb@localhost bin]$ ./psql postgres -p  -c "select * from
pg_tablespace where spcname like 'tblsp%' or spcname = 'pg_default'";
  oid  |  spcname   | spcowner | spcacl | spcoptions
-------+------------+----------+--------+------------
  1663 | pg_default |       10 |        |
 16384 | tblsp      |       10 |        |
(2 rows)

[edb@localhost bin]$ ./psql testdb -p  -c "select * from testtbl";
psql: error: could not connect to server: FATAL:
 "pg_tblspc/16384/PG_13_202003051/16385" is not a valid data directory
DETAIL:  File "pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION" is missing.
[edb@localhost bin]$
[edb@localhost bin]$ ls
data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
[edb@localhost bin]$ ls
/tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
ls: cannot access
/tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION: No such file or
directory


Thanks & Regards,
Rajkumar Raghuwanshi


On Mon, Mar 16, 2020 at 6:19 PM Rajkumar Raghuwanshi <
rajkumar.raghuwan...@enterprisedb.com> wrote:

> Hi Asif,
>
> On testing further, I found when taking backup with -R, pg_basebackup
> crashed
> this crash is not consistently reproducible.
>
> [edb@localhost bin]$ ./psql postgres -p 5432 -c "create table test (a
> text);"
> CREATE TABLE
> [edb@localhost bin]$ ./psql postgres -p 5432 -c "insert into test values
> ('parallel_backup with -R recovery-conf');"
> INSERT 0 1
> [edb@localhost bin]$ ./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp -R
> Segmentation fault (core dumped)
>
> stack trace looks the same as it was on earlier reported crash with
> tablespace.
> --stack trace
> [edb@localhost bin]$ gdb -q -c core.37915 pg_basebackup
> Loaded symbols for /lib64/libnss_files.so.2
> Core was generated by `./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp
> -R'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x004099ee in worker_get_files (wstate=0xc1e458) at
> pg_basebackup.c:3175
> 3175 backupinfo->curr = fetchfile->next;
> Missing separate debuginfos, use: debuginfo-install
> keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
> libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
> openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
> (gdb) bt
> #0  0x004099ee in worker_get_files (wstate=0xc1e458) at
> pg_basebackup.c:3175
> #1  0x00408a9e in worker_run (arg=0xc1e458) at pg_basebackup.c:2715
> #2  0x003921a07aa1 in start_thread (arg=0x7f72207c0700) at
> pthread_create.c:301
> #3  0x0039212e8c4d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
> (gdb)
>
> Thanks & Regards,
> Rajkumar Raghuwanshi
>
>
> On Mon, Mar 16, 2020 at 2:14 PM Jeevan Chalke <
> jeevan.cha...@enterprisedb.com> wrote:
>
>> Hi Asif,
>>
>>
>>> Thanks Rajkumar. I have fixed the above issues and have rebased the
>>> patch to the latest master (b7f64c64).
>>> (V9 of the patches are attached).
>>>
>>
>> I had a further review of the patches and here are my few observations:
>>
>> 1.
>> +/*
>> + * stop_backup() - ends an online backup
>> + *
>> + * The function is called at the end of an online backup. It sends out
>> pg_control
>> + * file, optionally WAL segments and ending WAL location.
>> + */
>>
>> Comments seem out-dated.
>>
>> 2. With parallel jobs, maxrate is now not supported. Since we are now
>> asking
>> data in multiple threads throttling seems important here. Can you please
>> explain why have you disabled that?
>>
>> 3. As we are always fetching a single file and as Robert suggested, let
>> rename
>> SEND_FILES to SEND_FILE instead.
>>
>> 4. Does this work on Windows? I mean does pthread_create() work on
>> Windows?
>> I asked this as I see that pgbench has its own implementation for
>> pthread_create() for WIN32 but this patch doesn't.
>>
>> 5. Typos:
>> tablspace => tablespace
>> safly => safely
>>
>> 6. parallel_backup_run() needs some comments explaining the states it goes

Re: Wait event that should be reported while waiting for WAL archiving to finish

2020-03-19 Thread Atsushi Torikoshi
On Wed, Feb 26, 2020 at 9:19 PM Fujii Masao 
wrote:

> I have no idea about this. But I wonder how much that change
> is helpful to reduce the power consumption because waiting
> for WAL archive during the backup basically not so frequently
> happens.
>

+1.
And as far as I have reviewed the patch, I didn't find any problems.

Regards,

--
Atsushi Torikoshi

