Re: Fix search_path for all maintenance commands

2023-06-08 Thread Greg Stark
On Fri, 26 May 2023 at 19:22, Jeff Davis  wrote:
>
> Maintenance commands (ANALYZE, CLUSTER, REFRESH MATERIALIZED VIEW,
> REINDEX, and VACUUM) currently run as the table owner, and as a
> SECURITY_RESTRICTED_OPERATION.
>
> I propose that we also fix the search_path to "pg_catalog, pg_temp"
> when running maintenance commands, for two reasons:
>
> 1. Make the behavior of maintenance commands more consistent because
> they'd always have the same search_path.

What exactly would this impact? Offhand... expression indexes where
the functions in the expression (which would already be schema
qualified) themselves reference other objects without schema
qualification?

So this would negatively affect someone who was using such a dangerous
function definition but was careful to always use the same search_path
on it. Perhaps someone who had created an expression index on their
own table in their own schema calling their own functions in their own
schema. As long as nobody else ever calls it, that would work -- but
this would mean the superuser could no longer reindex it, even if the
superuser set the same search_path?

I guess that's pretty narrow and a reasonable thing to desupport.
Users could just mark those functions with search_path or schema
qualify the object references in them. Perhaps we should also be
picking up cases like that sooner so users realize they've created a
footgun for themselves?
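
For what it's worth the user-side fix is mechanical -- either
schema-qualify the references inside the function body or pin the
function's search_path, something like this (names made up):

ALTER FUNCTION myschema.my_index_func(text)
    SET search_path = myschema, pg_catalog, pg_temp;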

-- 
greg




Re: Let's make PostgreSQL multi-threaded

2023-06-08 Thread Greg Stark
On Wed, 7 Jun 2023 at 18:09, Andres Freund  wrote:
> Having the same memory mapping between threads allows the
> hardware to share the TLB (on x86 via process context identifiers), which
> isn't realistically possible with different processes.

As a matter of historical interest Solaris actually did implement this
across different processes. It was called by the somewhat unfortunate
name "Intimate Shared Memory". I don't think Linux ever implemented
anything like it but I'm not sure.

I think this was not so much about cache hit rate as about just sheer
wasted memory in page mappings. So I guess hugepages more or less
target the same issues. But I find it interesting that they were
already running into issues like this 20 years ago -- presumably those
issues have only grown.

-- 
greg




Re: Let's make PostgreSQL multi-threaded

2023-06-06 Thread Greg Stark
On Mon, 5 Jun 2023 at 10:52, Heikki Linnakangas  wrote:
>
> I spoke with some folks at PGCon about making PostgreSQL multi-threaded,
> so that the whole server runs in a single process, with multiple
> threads. It has been discussed many times in the past, last thread on
> pgsql-hackers was back in 2017 when Konstantin made some experiments [0].
>
> I feel that there is now pretty strong consensus that it would be a good
> thing, more so than before. Lots of work to get there, and lots of
> details to be hashed out, but no objections to the idea at a high level.
>
> The purpose of this email is to make that silent consensus explicit. If
> you have objections to switching from the current multi-process
> architecture to a single-process, multi-threaded architecture, please
> speak up.

I suppose I should reiterate my comments that I gave at the time. I'm
not sure they qualify as "objections" but they're some kind of general
concern.

I think of processes and threads as fundamentally the same things,
just a slightly different API -- namely that in one, memory is
unshared by default and needs to be explicitly shared, while in the
other it's shared by default and needs to be explicitly unshared.
There are
obvious practical API differences too like how signals are handled but
those are just implementation details.

So the question is whether defaulting to shared memory or defaulting
to unshared memory is better -- and whether the implementation details
are significant enough to override that.

And my general concern was that in my experience default shared memory
leads to hugely complex and chaotic shared data structures with often
very loose rules for ownership of shared data and who is responsible
for making updates, handling errors, or releasing resources.

So all else equal I feel like having a good infrastructure for
explicitly allocating shared memory segments and managing them is
superior.

However all else is not equal. The discussion in the hallway turned to
whether we could just use pthread primitives like mutexes and
condition variables instead of our own locks -- and the point was
raised that those libraries assume these objects will be in threads of
one process not shared across completely different processes.

And that's probably not the only library we're stuck reimplementing
because of this. So the question is are these things worth taking the
risk of having data structures shared implicitly and having unclear
ownership rules?

I was going to say supporting both modes relieves that fear since it
would force that extra discipline and allow testing under the more
restrictive rule. However I don't think that will actually work. As
long as we support both modes we lose all the advantages of threads.
We still wouldn't be able to use pthreads and would still need to
provide and maintain our homegrown replacement infrastructure.




-- 
greg




Re: Avoiding another needless ERROR during nbtree page deletion

2023-06-01 Thread Greg Stark
On Mon, May 22, 2023, 12:31 Peter Geoghegan  wrote:

> On Sun, May 21, 2023 at 11:51 PM Heikki Linnakangas 
> wrote:
> > Any idea what might cause this corruption?
>
> Not really, no. As far as I know the specific case that was brought to
> my attention (that put me on the path to writing this patch) was just
> an isolated incident. The interesting detail (if any) is that it was a
> relatively recent version of Postgres (13), and that there were no
> other known problems. This means that there is a plausible remaining
> gap in the defensive checks in nbtree VACUUM on recent versions -- we
> might have expected to avoid a hard ERROR in some other way, from one
> of the earlier checks, but that didn't happen on at least one
> occasion.
>

What error would one expect to see? I did have a case where vacuum was
erroring on a btree in $previous_job.


Re: Should CSV parsing be stricter about mid-field quotes?

2023-05-13 Thread Greg Stark
On Sat, 13 May 2023 at 09:46, Tom Lane  wrote:
>
> Andrew Dunstan  writes:
> > I could see an argument for a STRICT mode which would disallow partially
> > quoted fields, although I'd like some evidence that we're dealing with a
> > real problem here. Is there really a CSV producer that produces output
> > like that you showed in your example? And if so has anyone objected to
> > them about the insanity of that?
>
> I think you'd want not just "some evidence" but "compelling evidence".
> Any such option is going to add cycles into the low-level input parser
> for COPY, which we know is a hot spot and we've expended plenty of
> sweat on.  Adding a speed penalty that will be paid by the 99.99%
> of users who don't have an issue here is going to be a hard sell.

Well I'm not sure that follows. Joel specifically claimed that an
implementation that didn't accept inputs like this would actually be
simpler and that might mean it would actually be faster.

And I don't think you have to look very hard for inputs like this --
plenty of people generate CSV files from simple templates or script
outputs that don't understand escaping quotation marks at all. Outputs
like that will be fine as long as there are no doublequotes in the
input, but then one day someone will enter a doublequote in a form
somewhere and blammo.

So I guess the real question is whether accepting inputs with
unescaped quotes and interpreting them the way we do is really the
best interpretation. Is the user best served by a) assuming they
intended to quote part of the field and not the rest, b) assuming
they failed to escape the quotation mark, or c) assuming something's
gone wrong and the input is entirely untrustworthy?

-- 
greg




Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing

2023-05-13 Thread Greg Stark
On Thu, 11 May 2023 at 16:41, Peter Geoghegan  wrote:
>
> > I say "exhaustion" or "overload" are meaningless because their meaning
> > is entirely dependent on context. It's not like memory exhaustion or
> > i/o overload where it's a finite resource and it's just the sheer
> > amount in use that matters.
>
> But transaction IDs are a finite resource, in the sense that you can
> never have more than about 2.1 billion distinct unfrozen XIDs at any
> one time. "Transaction ID exhaustion" is therefore a lot more
> descriptive of the underlying problem. It's a lot better than
> wraparound, which, as I've said, is inaccurate in two major ways:

I realize that's literally true that xids are a finite resource. But
that's not what people think of when you talk about exhausting a
finite resource.

It's not like there are 2 billion XIDs in a big pool being used and
returned, where as long as you don't use so many XIDs that the pool
runs empty you're ok. When people talk about resource exhaustion they
imagine that they just need a faster machine or some other way of just
putting more XIDs in the pool so they can keep using them at a faster
rate.

I really don't think changing one term of art for another, neither of
which is at all meaningful without an extensive technical explanation,
helps anyone. All it does is hide from users all the existing
explanations that are all over the internet.

> 1. Most cases involving xidStopLimit (or even single-user mode data
> corruption) won't involve any kind of physical integer wraparound.

Fwiw I've never actually bumped into anyone talking about integer
overflow (which isn't usually called "wraparound" anyways). And in any
case it's not a terrible misconception, it at least gives users a
reasonable model for how XID space consumption works. In fact it's not
even entirely wrong -- it's not the XID itself that's overflowing but
the difference between the XID and the frozen XID.

-- 
greg




Re: Should CSV parsing be stricter about mid-field quotes?

2023-05-12 Thread Greg Stark
On Thu, 11 May 2023 at 10:04, Joel Jacobson  wrote:
>
> The parser currently accepts quoting within an unquoted field. This can lead 
> to
> data misinterpretation when the quote is part of the field data (e.g.,
> for inches, like in the example).

I think you're thinking about it differently than the parser. I think
the parser is treating this the way, say, the shell treats quotes.
That is, it sees a quoted "I bought this for my 6" followed by an
unquoted "a laptop but it didn't fit my 8" followed by a quoted "
tablet".

So for example, in that world you might only quote commas and newlines
so you might print something like

1,2,I bought this for my "6"" laptop
" but it "didn't" fit my "8""" laptop

The actual CSV spec https://datatracker.ietf.org/doc/html/rfc4180 only
allows fully quoted or fully unquoted fields and there can only be
escaped double-doublequote characters in quoted fields and no
doublequote characters in unquoted fields.

But it also says

  Due to lack of a single specification, there are considerable
  differences among implementations.  Implementors should "be
  conservative in what you do, be liberal in what you accept from
  others" (RFC 793 [8]) when processing CSV files.  An attempt at a
  common definition can be found in Section 2.


So the real question is are there tools out there that generate
entries like this and what are their intentions?

> I think we should throw a parsing error for unescaped mid-field quotes,
> and add a COPY option like ALLOW_MIDFIELD_QUOTES for cases where mid-field
> quotes are necessary. The error message could suggest this option when it
> encounters an unescaped mid-field quote.
>
> I think the convenience of not having to use an extra option doesn't outweigh
> the risk of undetected data integrity issues.

It's also a pretty annoying experience to get a message saying "error,
turn this option on to not get an error". I get what you're saying
too -- whether that's more of a risk depends on whether turning off
the error is really the right thing most of the time or is just
causing data to be read incorrectly.



-- 
greg




Re: smgrzeroextend clarification

2023-05-12 Thread Greg Stark
On Thu, 11 May 2023 at 05:37, Peter Eisentraut
 wrote:
>
> Maybe it was never meant that way and only works accidentally?  Maybe
> hash indexes are broken?

It's explicitly documented to be this way. And I think it has to work
this way for recovery to work.

I think the reason you and Bharath and Andres are talking past each
other is that they're thinking about how the implementation works and
you're talking about the API definition.

If you read the API definition and treat the functions as a black box
I think you're right -- those two definitions sound pretty much
equivalent to me. They both extend the file, possibly multiple blocks,
and zero fill. The only difference is that smgrextend() additionally
allows you to provide data.

-- 
greg




Re: Overhauling "Routine Vacuuming" docs, particularly its handling of freezing

2023-05-11 Thread Greg Stark
On Wed, 3 May 2023 at 18:50, Peter Geoghegan  wrote:
>
> What about "XID allocation overload"? The implication that I'm going
> for here is that the system was misconfigured, or there was otherwise
> some kind of imbalance between XID supply and demand.

Fwiw while "wraparound" has pitfalls I think changing it for a new
word isn't really helpful. Especially if it's a mostly meaningless
word like "overload" or "exhaustion". It suddenly makes every existing
doc hard to find and confusing to read.

I say "exhaustion" or "overload" are meaningless because their meaning
is entirely dependent on context. It's not like memory exhaustion or
i/o overload where it's a finite resource and it's just the sheer
amount in use that matters. One way or another the user needs to
understand that it's two numbers marching through a sequence  and the
distance between them matters.
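
(That distance is what the age() functions report -- e.g. something
like

SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY 2 DESC;

is the kind of query those existing docs already tell people to run.)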

I feel like "wraparound" while imperfect is not  any worse than any
other word. It still requires context to understand but it's context
that there are many docs online that already explain and are
googleable.

If we wanted a new word it would be "overrun" but like I say, it would
just create a new context dependent technical term that users would
need to find docs that explain and give context. I don't think that
really helps users at all

-- 
greg




Re: base backup vs. concurrent truncation

2023-05-10 Thread Greg Stark
Including the pre-truncation length in the WAL record is the obviously
solid approach, and none of the below is a good substitute for it.
But

On Tue, 25 Apr 2023 at 13:30, Andres Freund  wrote:
>
> It isn't - but the alternatives aren't great either. It's not that easy to hit
> this scenario, so I think something along these lines is more palatable than
> adding a pass through the entire data directory.

Doing one pass through the entire data directory on startup before
deciding the directory is consistent doesn't sound like a crazy idea.
It's pretty easy to imagine bugs in backup software that leave out
files in the middle of tables -- some of us don't even have to
imagine...

Similarly checking for a stray next segment whenever extending a file
to maximum segment size seems like a reasonable thing to check for
too.

These kinds of checks are the kind of paranoia that catches filesystem
bugs, backup software bugs, cron jobs, etc that we don't even know to
watch for.

-- 
greg




Re: Request for comment on setting binary format output per session

2023-04-18 Thread Greg Stark
On Mon, 17 Apr 2023 at 16:22, Tom Lane  wrote:
>
> I tend to agree with the proposition that we aren't going to add new
> message types very often, as long as we're careful to make them general
> purpose.  Don't forget that adding a new message type isn't just a matter
> of writing some spec text --- there has to be code backing it up.  We
> will never introduce thousands of new message types, or if we do,
> somebody factored it wrong and put data into the type code.

Well the way I understood Robert's proposal would be that you would
set a protocol option which could be some name like
SuperDuperExtension and then later send an extended message like X
SuperDuper Extension ...

The point being not so much that it saves on message types but that it
becomes possible for the wire protocol code to recognize the message
type and know which extension's code to call back to. Presumably a
callback was registered when the option was negotiated.

> The fact that we've gotten away without adding *any* new message types
> for about twenty years suggests to me that the growth rate isn't such
> that we need sub-message-types yet.  I'd keep the structure the same
> until such time as we can't choose a plausible code value for a new
> message, and then maybe add the "x-and-subtype" convention Jeff suggests.

Fwiw I've had at least two miniprojects that would eventually have led
to protocol extensions. Like most of my projects they're not finished
but one day...

Progress reporting on queries in progress -- I had things hacked to
send the progress report in an elog but eventually it would have made
sense to put it in a dedicated message type that the client would know
the structure of the content of.

Distributed tracing -- to pass the trace span id for each query and
any other baggage. Currently people either stuff it in application_id
or in SQL comments but they're both pretty awful.

-- 
greg




Re: Direct I/O

2023-04-18 Thread Greg Stark
On Mon, 17 Apr 2023 at 17:45, Thomas Munro  wrote:
>
> Reasons: (1) There will always be a
> few file systems that refuse O_DIRECT (Linux tmpfs is one such, as we
> learned in this thread; if fails with EINVAL at open() time), and

So why wouldn't we just automatically turn it off (globally or for
that tablespace) and keep operating without it afterward?

> (2) without a page cache, you really need to size your shared_buffers
> adequately and we can't do that automatically.

Well I'm more optimistic... That may not always be impossible.
We've already added the ability to add more shared memory after
startup. We could implement the ability to add or remove shared buffer
segments after startup. And it wouldn't be crazy to imagine a kernel
interface that lets us judge whether the kernel memory pressure makes
it reasonable for us to take more shared buffers or makes it necessary
to release shared memory to the kernel. You could hack something
together using /proc/meminfo today but I imagine an interface intended
for this kind of thing would be better.

>  It's something you'd
> opt into for a dedicated database server along with other carefully
> considered settings.  It seems acceptable to me that if you set
> io_direct to a non-default setting on an unusual-for-a-database-server
> filesystem you might get errors screaming about inability to open
> files -- you'll just have to turn it back off again if it doesn't work
> for you.

If the only solution is to turn it off perhaps the server should just
turn it off? I guess the problem is that the shared_buffers might be
set assuming it would be on?



-- 
greg




Re: Temporary tables versus wraparound... again

2023-04-18 Thread Greg Stark
Hm, in an optimized build using kernel perf I see this. But I don't
know how to find what the call sites are for LWLockAcquire/Release. If
it's the locks on pgproc that would be kind of bad.

I wonder if I should be gathering horizons once in the
PrecommitActions and then just using those for every temp table
somehow. Perhaps only actually doing an update if the relfrozenxid is
actually at least vacuum_freeze_table_age old.

   3.98%  postgres LWLockAcquire
   3.51%  postgres LWLockRelease
   3.18%  postgres hash_search_with_hash_value
   2.20%  postgres DropRelationLocalBuffers
   1.80%  [kernel] check_preemption_disabled
   1.52%  postgres hash_bytes
   1.27%  postgres LockAcquireExtended
   0.97%  postgres _bt_compare
   0.95%  [kernel] kmem_cache_alloc

I still think we should be applying the vacuum warning messages to
stable and probably backpatching. I've actually heard from other users
who have faced the same surprise wraparound shutdown.
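
For anyone bitten by this in the meantime, something like the following
shows how old the relfrozenxid of the temp tables in a database has
gotten (just a sketch):

SELECT relname, age(relfrozenxid) AS xid_age
  FROM pg_class
 WHERE relpersistence = 't' AND relkind = 'r'
 ORDER BY 2 DESC;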




Re: Temporary tables versus wraparound... again

2023-04-17 Thread Greg Stark
On Wed, 5 Apr 2023 at 13:42, Andres Freund  wrote:

>
> Somehow it doesn't feel right to use vac_update_relstats() in
> heapam_handler.c.
>
> I also don't like that your patch references
> heapam_relation_nontransactional_truncate in AddNewRelationTuple() - we
> shouldn't add more comments piercing tableam than necessary.

I'm really puzzled because this does look like it was in the last
patch on the mailing list archive. But it's definitely not the code I
have here. I guess I did some cleanup that I never posted, so sorry.

I've attached patches using GetOldestNonRemovableTransactionId() and it
seems to have fixed the race condition here. At least I can't
reproduce it any more.



> Not if you determine a relation specific xmin, and the relation is not a
> shared relation.
>
> ISTM that the problem here really is that you're relying on RecentXmin, rather
> than computing something more accurate. Why not use
> GetOldestNonRemovableTransactionId(rel) - It's a bit more expensive, but I
> don't think it'll matter compared to the cost of truncating the relation.

I am a bit nervous about the overhead here because if your transaction
touched *any* temporary tables then this gets called for *every*
temporary table with ON COMMIT DELETE ROWS. That could be a lot and it's
not obvious to users that having temporary tables will impose an
overhead even if they're not actually using them.

So I went ahead and used GetOldestNonRemovableTransactionId and tried
to do some profiling. But this is on a cassert enabled build with -O0
so it's not serious profiling. I can repeat it on a real build if it
matters. But it's been a long time since I've read gprof output. This
is for -F PreCommit_on_commit_actions so the percentages are as a
percent of just the precommit cleanup:

index % time    self  children    called          name
               0.00    0.00   10102/10102       CommitTransaction (1051)
[1]    100.0   0.01   31.47   10102           PreCommit_on_commit_actions [1]
               0.01   31.43   10100/10100       heap_truncate [2]
               0.00    0.03 1005050/1005260     lappend_oid [325]
---
               0.01   31.43   10100/10100       PreCommit_on_commit_actions [1]
[2]     99.9   0.01   31.43   10100           heap_truncate [2]
               0.09   27.30 1005050/1005050     heap_truncate_one_rel [3]
               0.20    3.57 1005050/6087120     table_open [465]
               0.01    0.22 1005050/6045137     table_close [48]
               0.00    0.03 1005050/1017744     lappend [322]
               0.01    0.00   10100/10100       heap_truncate_check_FKs [425]
---
               0.09   27.30 1005050/1005050     heap_truncate [2]
[3]     87.0   0.09   27.30 1005050           heap_truncate_one_rel [3]
               0.02   12.23 1005050/1005050     RelationTruncateIndexes [5]
               0.06   10.08 1005050/1005050     ResetVacStats [7]
               0.03    4.89 1005050/1005050     table_relation_nontransactional_truncate [12]

I think this is saying that more than half the time is being spent
just checking for indexes. There were no indexes on these temporary
tables. Does not having any indexes cause the relcache to treat it as a
cache miss every time?

               0.06   10.08 1005050/1005050     heap_truncate_one_rel [3]
[7]     32.2   0.06   10.08 1005050           ResetVacStats [7]
               0.02    3.83 1005050/1005250     SearchSysCacheCopy [16]
               0.20    3.57 1005050/6087120     table_open [465]
               0.01    2.02 1005050/1005050     heap_inplace_update [35]
               0.01    0.22 1005050/6045137     table_close [48]
               0.00    0.20 1005050/1005150     GetOldestNonRemovableTransactionId [143]
               0.00    0.01 1005050/1005150     GetOurOldestMultiXactId [421]
               0.00    0.00 1005050/1008750     ObjectIdGetDatum [816]

I guess this means GetOldestNonRemovableTransactionId is not the main
cost in ResetVacStats though I don't understand why the syscache would
be so slow.

I think there's a facility for calculating the Horizons and then
reusing them for a while but I don't see how to use that here. It
would be appropriate I think.


>
> > Honestly I'm glad I wrote the test because it was hard to know whether
> > my code was doing anything at all without it (and it wasn't in the
> > first cut...) But I don't think there's much value in having it be in
> > the regression suite. We don't generally write tests to ensure that a
> > specific internal implementation behaves in the specific way it was
> > written to.
>
> To me it seems important to test that your change actually does what it
> intends to. Possibly the test needs to be relaxed some, but I do think we want
> tests for the change.
>
> Greetings,
>
> Andres Freund



--
greg

Re: Should we remove vacuum_defer_cleanup_age?

2023-04-14 Thread Greg Stark
On Fri, 14 Apr 2023 at 13:15, Laurenz Albe  wrote:
>
> On Fri, 2023-04-14 at 18:43 +0200, Alvaro Herrera wrote:
> > On 2023-Apr-14, Greg Stark wrote:
> > > I assume people would use hot_standby_feedback if they have streaming
> > > replication.
> >
> > Yes, either that or a replication slot.
>
> A replication slot doesn't do anything against snapshot conflicts,
> which is what we are discussing here.  Or are we not?

They're related -- the replication slot holds the feedback xmin so
that if your standby disconnects it can reconnect later and not have
lost data in the meantime. At least that's what I think it does -- I
don't know if I'm just assuming that, but xmin is indeed in
pg_replication_slots.
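
I.e., assuming that's right, something like

SELECT slot_name, active, xmin, catalog_xmin FROM pg_replication_slots;

would show the xmin a standby's feedback is holding through its slot.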

-- 
greg




Re: [RFC] building postgres with meson -v8

2023-04-14 Thread Greg Stark
On Wed, 11 May 2022 at 06:19, Peter Eisentraut
 wrote:
>
> After that, these configure options don't have an equivalent yet:
>
> --enable-profiling

Afaics this still doesn't exist? Is there a common idiom to enable
this? Like, if I add in something to cflags is that enough? I seem to
recall we had some hack to actually get each backend's gmon.out to not
step on each other's which needed an explicit flag to enable?

-- 
greg




Re: Should we remove vacuum_defer_cleanup_age?

2023-04-14 Thread Greg Stark
On Fri, 14 Apr 2023 at 09:47, Jonathan S. Katz  wrote:
>
> Let me restate [1] in a different way.
>
> Using a large enough dataset, I did qualitatively look at overall usage
> of both "vacuum_defer_cleanup_age" and compared to
> "hot_standby_feedback", given you can use both to accomplish similar
> outcomes.

I assume people would use hot_standby_feedback if they have streaming
replication. The main use cases for vacuum_defer_cleanup_age would be
if you're replaying WAL files. That may sound archaic but there are
plenty of circumstances where your standby may not have network access
to your primary at all or not want to be replaying continuously.

I wonder whether your dataset is self-selecting sites that have
streaming replication. That does seem like the more common usage
pattern.

Systems using wal files are more likely to be things like data
warehouses, offline analytics systems, etc. They may not even be well
known in the same organization that runs the online operations -- in
my experience they're often run by marketing or sales organizations or
in some cases infosec teams and consume data from lots of sources. The
main reason to use wal archive replay is often to provide the
isolation so that the operations team don't need to worry about the
impact on production and that makes it easy to forget these even
exist.

-- 
greg




Re: Temporary tables versus wraparound... again

2023-04-14 Thread Greg Stark
On Thu, 13 Apr 2023 at 13:01, Peter Geoghegan  wrote:
>
> On Thu, Apr 13, 2023 at 9:45 AM Robert Haas  wrote:
> >
> > On Wed, Apr 12, 2023 at 4:23 PM Greg Stark  wrote:
> > > Am I crazy or is the parenthetical comment there exactly backwards? If
> > > the horizon is *more recent* then fewer tuples are *non*-removable.
> > > I.e. *more* tuples are removable, no?
> >
> > Isn't it the non-parenthetical part that's wrong? I would expect that
> > if we don't know which relation it is, the horizon might be
> > considerably LESS recent, which would result in fewer tuples being
> > removable.
>
> You can make arguments for either way of restating it being clearer
> than the other.

Yeah, I think Robert is being confused by the implicit double
negative. If we don't know which relation it is it's because relation
is NULL and the comment is talking about if it's "not NULL". I think
you're right that it would be less confusing if it just says "if you
pass NULL we have to give a conservative result which means an older
xid and fewer removable tuples".

But I'm saying the parenthetical part is not just confusing, it's
outright wrong. I guess that just means the first half was so
confusing it confused not only the reader but the author too.

-- 
greg




Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

2023-04-12 Thread Greg Stark
On Wed, Apr 12, 2023, 19:30 Daniel Gustafsson  wrote:
>
>  The issue we have is that we cannot get PGconn in
> verify_cb so logging an error is tricky.


Hm, the man page talks about an "ex_data" mechanism, which seems to be
referring to this Rube Goldberg device
https://www.openssl.org/docs/man3.1/man3/SSL_get_ex_data.html

It looks like X509_STORE_CTX_set_app_data() and
X509_STORE_CTX_get_app_data() would be convenience macros to do this.




Re: Temporary tables versus wraparound... again

2023-04-12 Thread Greg Stark
On Wed, 5 Apr 2023 at 13:42, Andres Freund  wrote:
>
> ISTM that the problem here really is that you're relying on RecentXmin, rather
> than computing something more accurate. Why not use
> GetOldestNonRemovableTransactionId(rel) - It's a bit more expensive, but I
> don't think it'll matter compared to the cost of truncating the relation.

I'm trying to wrap my head around GetOldestNonRemovableTransactionId()
and whether it's the right thing here. This comment is not helping me:

/*
 * Return the oldest XID for which deleted tuples must be preserved in the
 * passed table.
 *
 * If rel is not NULL the horizon may be considerably more recent than
 * otherwise (i.e. fewer tuples will be removable). In the NULL case a horizon
 * that is correct (but not optimal) for all relations will be returned.
 *
 * This is used by VACUUM to decide which deleted tuples must be preserved in
 * the passed in table.
 */


Am I crazy or is the parenthetical comment there exactly backwards? If
the horizon is *more recent* then fewer tuples are *non*-removable.
I.e. *more* tuples are removable, no?

-- 
greg




Re: [PATCH] Allow Postgres to pick an unused port to listen

2023-04-12 Thread Greg Stark
On Wed, 12 Apr 2023 at 11:02, Robert Haas  wrote:
>
> I mostly just wanted to say that I disagreed with Tom about the
> particular point on postmaster.pid, without really expressing an
> opinion about anything else.

I don't object to using the pid file as the mechanism -- but it is a
bit of an awkward UI for shell scripting. I imagine it would be handy
if pg_ctl had an option to just print the port number so you could get
it with a simple port=`pg_ctl -D  status-port`

-- 
greg




Re: pg_init_privs corruption.

2023-04-11 Thread Greg Stark
On Fri, 17 Feb 2023 at 15:38, Tom Lane  wrote:
>
> Hmm, so Stephen was opining that the extension's objects shouldn't
> have gotten these privs attached in the first place.  I'm not
> quite convinced about that one way or the other, but if you buy it
> then maybe this situation is unreachable once we fix that.

Well pg_dump might still have to deal with it even if it's unreachable
in new databases (or rather schemas with extensions newly added... it
might be hard to explain what cases are affected).

Alternately it'll be a note at the top of every point release pointing
at a note explaining how to run a script to fix your schemas.

-- 
greg




Re: Commitfest 2023-03 starting tomorrow!

2023-04-09 Thread Greg Stark
So here's the list of CF entries that I thought *might* still get
committed either because they're an Open Issue or they're one of those
other categories. I had intended to go through and add them all to the
Open Issues but it turns out I only feel confident in a couple of them
qualifying for that:

Already added:
* Default to ICU during initdb
* Assertion failure with parallel full hash join
* RecoveryConflictInterrupt() is unsafe in a signal handler
* pg_visibility's pg_check_visible() yields false positive when
working in parallel with autovacuum

Not added:
* Fix bogus error emitted by pg_recvlogical when interrupted
* clean up permission checks after 599b33b94
* Fix assertion failure with next_phase_at in snapbuild.c
* Fix assertion failure in SnapBuildInitialSnapshot()
* Fix improper qual pushdown after applying outer join identity 3
* Add document is_superuser
* Improve doc for autovacuum on partitioned tables
* Create visible links for HTML elements that have an id to make them
discoverable via the web interface
* Fix inconsistency in reporting checkpointer stats
* pg_rewind WAL deletion pitfall
* Update relfrozenxmin when truncating temp tables
* Testing autovacuum wraparound
* Add TAP tests for psql \g piped into program

I'll move these CF entries to the next CF now. I think they all are
arguably open issues though of varying severity.

-- 
greg




Re: Proposal: %T Prompt parameter for psql for current time (like Oracle has)

2023-04-08 Thread Greg Stark
On Wed, 22 Feb 2023 at 13:38, Kirk Wolak  wrote:
>
> I already requested ONLY the HH24 format.  8 characters of output.  no 
> options.  It's a waste of time.
> After all these years, sqlplus still has only one setting (show it, or not).  
> I am asking the same here.
> And I will gladly defend not changing it!  Ever!

Yeah, well, it's kind of beside the point that you're satisfied with
this one format. We tend to think about what all users would expect
and what a complete feature would look like.

I actually tend to think this would be a nice feature. It's telling
that log files and other tracing tools tend to produce exactly this
type of output with every line prefixed with either a relative or
absolute timestamp.

I'm not sure if the *prompt* is a sensible place for it though. The
place it seems like it would be most useful is reading the output of
script executions where there would be no prompts. Perhaps it's the
command tags and \echo statements that should be timestamped.
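
(For what it's worth, anyone who does want a clock in the interactive
prompt today can already get one with psql's backtick substitution --
a sketch:

\set PROMPT1 '%`date +%H:%M:%S` %/%R%# '

but that obviously doesn't help with scripted output, which is the
case above.)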

And I think experience shows that there are three reasonable formats
for dates, the default LC_TIME format, ISO8601, and a relative
"seconds (with milliseconds) since starting". I think having a feature
that doesn't support those three would feel incomplete and eventually
need to be finished.

-- 
greg




Re: Commitfest 2023-03 starting tomorrow!

2023-04-08 Thread Greg Stark
So here we are at the end of the CF:

 Status summary:           March 15  March 22  March 28  April 4  April 8
   Needs review:                152       128       116       96       74
   Waiting on Author:            42        36        30       21       14
   Ready for Committer:          39        32        32       35       27
   Committed:                    61        82        94      101      124
   Moved to next CF:              4        15        17       28       39
   Withdrawn:                    17        16        18       20       10
   Rejected:                      0         5         6        8        9
   Returned with Feedback:        4         5         6       10       22
 Total: 319.

I'm now going to go through and:

a) Mark Waiting on Author any patches that aren't building Waiting on
Author. There was some pushback about asking authors to do trivial
rebases but in three months it won't make sense to start a CF with
already-non-applying patches.

b) Mark any patches that received solid feedback in the last week with
either Returned with Feedback or Rejected. I think this was already
done though with 12 RwF and 1 Rejected in the past four days alone.

c) Pick out the Bug Fixes, cleanup patches, and other non-feature
patches that might be open issues for v16.

d) Move to Next CF any patches that remain.

-- 
greg




Re: Commitfest 2023-03 starting tomorrow!

2023-04-07 Thread Greg Stark
As announced on this list feature freeze is at 00:00 April 8 AoE.
That's less than 24 hours away. If you need to set your watches to AoE
timezone it's currently:

$ TZ=AOE+12 date
Fri 07 Apr 2023 02:05:50 AM AOE

As we stand we have:

Status summary:
  Needs review: 82
  Waiting on Author:16
  Ready for Committer:  27
  Committed:   115
  Moved to next CF: 38
  Returned with Feedback:   10
  Rejected:  9
  Withdrawn:22
Total: 319.

In less than 24h most of the remaining patches will get rolled forward
to the next CF. The 16 that are Waiting on Author might be RwF
perhaps. The only exceptions would be non-features like Bug Fixes and
cleanup patches that have been intentionally held until the end --
those become Open Issues for the release.

So if we move forward all the remaining patches (so these numbers are
high by about half a dozen) the *next* CF would look like:

Commitfest 2023-07:        Now   April 8
  Needs review:             46       128
  Waiting on Author:        17        33
  Ready for Committer:       3        30
Total:                      66       191

I suppose that's better than the 319 we came into this CF with but
there's 3 months to accumulate more unreviewed patches...

I had hoped to find lots of patches that I could bring the hammer down
on and say there's just no interest in them or no author still
maintaining them. But that wasn't the case. Nearly all the patches still
had actively interested authors and looked like they were legitimately
interesting and worthwhile features that people just haven't had the
time to review or commit.


--
greg




Re: Temporary tables versus wraparound... again

2023-04-06 Thread Greg Stark
On Wed, 5 Apr 2023 at 13:42, Andres Freund  wrote:
>
> Not if you determine a relation specific xmin, and the relation is not a
> shared relation.
>
> ISTM that the problem here really is that you're relying on RecentXmin, rather
> than computing something more accurate. Why not use
> GetOldestNonRemovableTransactionId(rel) - It's a bit more expensive, but I
> don't think it'll matter compared to the cost of truncating the relation.

Thanks for the review!

Hm, I was just copying heapam_handler.c:593 so it would be consistent
with what we do when we create a new table. I wasn't aware we had
anything that did this extra work; I'll look at it.

But I'm not sure it's the best idea to decide on how
truncate/vacuum/create table work based on what happens to be easier
to test. I mean I'm all for testable code but tieing vacuum behaviour
to what our test framework happens to not interfere with might be a
bit fragile. Like, if we happen to want to change the testing
framework I think this demonstrates that it will be super easy for it
to break the tests again. And if we discover we have to change the
relfrozenxid behaviour it might be hard to keep this test working.


> Somehow it doesn't feel right to use vac_update_relstats() in
> heapam_handler.c.
>
> I also don't like that your patch references
> heapam_relation_nontransactional_truncate in AddNewRelationTuple() - we
> shouldn't add more comments piercing tableam than necessary.

I'll take another look at this tomorrow. Probably I can extract the
common part of that function or I've misunderstood which bits of code
are above or below the tableam.

I think fundamentally the hardest bit was that the initial
relfrozenxid bubbles up from heapam_handler.c via a return value from
set_new_filelocator. So unless I want to add a new tableam method just
for relfrozenxid it's a bit awkward to get the right data to
AddNewRelationTuple and vac_update_relstats without duplicating code
and crosslinking in comments.

> To me it seems important to test that your change actually does what it
> intends to. Possibly the test needs to be relaxed some, but I do think we want
> tests for the change.

I missed the comment about relaxing the tests until just now. I'll
think about if there's an easy way out in that direction too.

If it's cutting it too fine to the end of the commitfest we could
always just commit the warnings from the 001 patch which would already
be a *huge* help for admins running into this issue.

Chag Sameach!


--
greg




Re: Temporary tables versus wraparound... again

2023-04-05 Thread Greg Stark
On Wed, 5 Apr 2023 at 11:15, Andres Freund  wrote:
>
> The freebsd test that failed is running tests in parallel, against an existing
> cluster. In that case it's expected that there's some concurrency.
>
> Why does this cause your tests to fail? They're in separate databases, so the
> visibility effects of the concurrent tests should be somewhat limited.

Because I'm checking that relfrozenxid was updated but any concurrent
transactions even in other databases hold back the xmin.

Honestly I'm glad I wrote the test because it was hard to know whether
my code was doing anything at all without it (and it wasn't in the
first cut...) But I don't think there's much value in having it be in
the regression suite. We don't generally write tests to ensure that a
specific internal implementation behaves in the specific way it was
written to.


-- 
greg




Re: Schema variables - new implementation for Postgres 15

2023-04-05 Thread Greg Stark
On Sun, 26 Mar 2023 at 07:34, Julien Rouhaud  wrote:
>
> This feature can significantly increase log size, so it's disabled by default.
> For testing or development environments it's recommended to enable it if you
> use session variables.

I think it's generally not practical to have warnings for valid DML.
Effectively warnings in DML are errors since they make the syntax just
unusable. I suppose it's feasible to have it as a debugging option
that defaults to off but I'm not sure it's really useful.

I suppose it raises the question of whether session variables should
be in pg_class and be in the same namespace as tables so that
collisions are impossible. I haven't looked at the code to see if
that's feasible or reasonable. But this feels a bit like what happened
with sequences where they used to be a wholly special thing and later
we realized everything was simpler if they were just a kind of
relation.

-- 
greg




Re: Why enable_hashjoin Completely disables HashJoin

2023-04-05 Thread Greg Stark
On Mon, 3 Apr 2023 at 19:32, Tom Lane  wrote:
>
> Or we could rethink the design goal of not allowing enable_foo switches
> to cause us to fail to make a plan.  That might be unusable though.

Off the top of my head I don't see why. It's not like the possible
plans are going to change on you often, only when DDL changes the
schema.

The only one that gives me pause is enable_seqscan. I've seen multiple
sites that turn it off as a hammer to force OLTP-style plans. They
still get sequential scans where they're absolutely necessary such as
small reference tables with no usable index and rely on that
behaviour.

In that case we would ideally generate a realistic cost estimate for
the unavoidable sequential scan to avoid twisting the rest of the plan
in strange ways.

But perhaps these sites would be better served with different
machinery anyways. If they actually did get a sequential scan on a
large table or any query where the estimate was very high where they
were expecting low latency OLTP queries perhaps they would prefer to
get an error than some weird plan anyways.

And for query planning debugging purposes of course it would be more
powerful to be able to enable/disable plan types per-node. That would
avoid the problem of not being able to effectively test a plan without
a sequential scan on one table when another table still needs it. But
that direction...
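
(Today the closest you can get is toggling the GUC around the one
query under test -- a sketch, with a made-up table:

BEGIN;
SET LOCAL enable_seqscan = off;
EXPLAIN SELECT * FROM some_table WHERE id = 42;
ROLLBACK;

but that applies to every node in the plan at once, which is exactly
the limitation above.)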

-- 
greg




Re: Temporary tables versus wraparound... again

2023-04-05 Thread Greg Stark
On Wed, 5 Apr 2023 at 01:41, Greg Stark  wrote:
>
> On Wed, 29 Mar 2023 at 17:48, Justin Pryzby  wrote:
> >
> > The patch still occasionally fails its tests under freebsd.
> > https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/42/3358
>
> I wonder if some other test is behaving differently on FreeBSD and
> leaving behind a prepared transaction or a zombie session in some idle
> state or something like that? Is there anything (aside from
> autovacuum) connecting or running in the background in the test
> environment that could be creating a transaction id and holding back
> snapshot xmin?

Ok, I've reproduced this here by running the tests under meson. It
doesn't look like it's platform dependent.

It seems under meson the different test suites are run in parallel or
at least isolation/deadlock-parallel are still running stuff when the
regression checks are running.  If that's not expected then maybe
something's not behaving as expected? I've attached pg_stat_activity
from during the test run.

Regardless it shows these tests are obviously not robust enough to
include as they would break for anyone running make installcheck on a
non-idle cluster.

That's fine, as I said, the tests were just there to give a reviewer
more confidence and I think it's fine to just not include them in the
commit.

-- 
greg
select * from pg_stat_activity;

       datname        |   pid   | usename |        application_name        | wait_event_type |     wait_event      | state  |              query               |         backend_type
----------------------+---------+---------+--------------------------------+-----------------+---------------------+--------+----------------------------------+-------------------------------
                      | 3970033 |         |                                | Activity        | AutoVacuumMain      |        |                                  | autovacuum launcher
                      | 3970034 | stark   |                                | Activity        | LogicalLauncherMain |        |                                  | logical replication launcher
 regression           | 4011208 | stark   | pg_regress/temp                |                 |                     | active | select * from pg_stat_activity; | client backend
 isolation_regression | 4008269 | stark   | isolation/deadlock-parallel/d1 | Client          | ClientRead          | idle   | COMMIT;                          | client backend
 isolation_regression | 4008287 | stark   | isolation/deadlock-parallel/e2 | ...

Re: Temporary tables versus wraparound... again

2023-04-04 Thread Greg Stark
On Wed, 29 Mar 2023 at 17:48, Justin Pryzby  wrote:
>
> The patch still occasionally fails its tests under freebsd.
> https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/42/3358

So on the one hand, I don't think the plan is to actually commit the
tests. They're very specific to one bit of internal implementation and
they're the kind of test that makes maintaining the test suite a pain
and tends to produce false positives for unrelated patches. They're
only in the patch I
posted at all to demonstrate that the code was actually running at all
and having the desired effect.

That said I would be a lot more sanguine about this failure if I had
any theory for *why* it would fail. And on FreeBSD specifically which
is even stranger.

Afaict the relfrozenxid should always be our own transaction when the
table is created and then again our own new transaction when the table
is truncated. And neither the INSERT nor the ANALYZE should be
touching relfrozenxid, nor should it be possible autovacuum is
interfering given it's a temp table (and we're attached to the
schema). And none of this should be platform dependent.

I wonder if some other test is behaving differently on FreeBSD and
leaving behind a prepared transaction or a zombie session in some idle
state or something like that? Is there anything (aside from
autovacuum) connecting or running in the background in the test
environment that could be creating a transaction id and holding back
snapshot xmin?


-- 
greg




Re: Commitfest 2023-03 starting tomorrow!

2023-04-04 Thread Greg Stark
On Tue, 4 Apr 2023 at 11:18, Tom Lane  wrote:
>
> > * clean up permission checks after 599b33b94
>
> I believe that the actual bug fixes are in, and what's left is just a test
> case that people weren't very excited about adding.  So maybe this should
> get closed out as committed.

I'm not super convinced about this one. I'm not a big "all tests are
good tests" believer but this test seems like a pretty reasonable one.
Permissions checks and user mappings are user-visible behaviour that
are easy to overlook when making changes with unexpected side effects.

It seems like the test would be just as easy to commit as to not
commit and I don't see anything tricky about it that would necessitate
a more in depth review.

-- 
greg




Re: monitoring usage count distribution

2023-04-04 Thread Greg Stark
On Mon, 30 Jan 2023 at 18:31, Nathan Bossart  wrote:
>
> My colleague Jeremy Schneider (CC'd) was recently looking into usage count
> distributions for various workloads, and he mentioned that it would be nice
> to have an easy way to do $SUBJECT.  I've attached a patch that adds a
> pg_buffercache_usage_counts() function.  This function returns a row per
> possible usage count with some basic information about the corresponding
> buffers.
>
> postgres=# SELECT * FROM pg_buffercache_usage_counts();
>  usage_count | buffers | dirty | pinned
> -+-+---+
>0 |   0 | 0 |  0
>1 |1436 |   671 |  0
>2 | 102 |88 |  0
>3 |  23 |21 |  0
>4 |   9 | 7 |  0
>5 | 164 |   106 |  0
> (6 rows)
>
> This new function provides essentially the same information as
> pg_buffercache_summary(), but pg_buffercache_summary() only shows the
> average usage count for the buffers in use.  If there is interest in this
> idea, another approach to consider could be to alter
> pg_buffercache_summary() instead.

Tom expressed skepticism that there's wide interest here. The lack of
response seems to bear that out. But perhaps that's just because people
don't understand what the importance of this info is -- I certainly
don't :)

I feel like the original sin here is having the function return
aggregated data. If it returned the raw data then people could slice,
dice, and aggregate the data in any ways they want using SQL. And
perhaps people would come up with queries that have more readily
interpretable important information?

Obviously there are performance questions in that but I suspect they
might be solvable given how small the data for each buffer are.
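
In fact the existing pg_buffercache view is more or less that raw data
already, so something like this should reproduce the proposed output
(a sketch; unused buffers show up as a NULL usagecount group rather
than 0):

SELECT usagecount,
       count(*) AS buffers,
       count(*) FILTER (WHERE isdirty) AS dirty,
       count(*) FILTER (WHERE pinning_backends > 0) AS pinned
  FROM pg_buffercache
 GROUP BY usagecount
 ORDER BY usagecount;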

Just as a warning though -- if nobody was interested in this patch
please don't take my comments as a recommendation that you spend a lot
of time developing a more complex version in the same direction
without seeing if anyone agrees with my suggestion :)

-- 
greg




Re: Add "host" to startup packet

2023-04-02 Thread Greg Stark
On Sun, 2 Apr 2023 at 11:38, Tom Lane  wrote:
>
> Even if all that infrastructure sprang into existence, is this really any
> more useful than basing your switching on the host's resolved IP address?
> I'm doubtful that there's enough win there to justify pushing this rock
> to the top of the mountain.

Hm. I think it's going to turn out to be useful. Experience shows
that depending on the IP address often paints people into corners.
However I agree that we need to have a real use case in hand where
someone is actually going to do something with it.

My question is a bit different. How does this interact with TLS SNI.
Can you just use the SNI name given in the TLS handshake? Should the
server require them to match? Is there any value to having a separate
source for this info? Is something similar available in GSSAPI
authentication?

-- 
greg




Re: Experiments with Postgres and SSL

2023-03-31 Thread Greg Stark
And the cfbot wants a small tweak
From 3d0a502c25504da32b7a362831c700b4e891f79b Mon Sep 17 00:00:00 2001
From: Greg Stark 
Date: Fri, 31 Mar 2023 02:31:31 -0400
Subject: [PATCH v6 5/6] Allow pipelining data after ssl request

---
 src/backend/postmaster/postmaster.c | 54 ++---
 1 file changed, 42 insertions(+), 12 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 9b4b37b997..33b317f98b 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2001,13 +2001,14 @@ ProcessSSLStartup(Port *port)
 
 		if (port->raw_buf_remaining > 0)
 		{
-			/* This shouldn't be possible -- it would mean the client sent
-			 * encrypted data before we established a session key...
-			 */
-			elog(LOG, "Buffered unencrypted data remains after negotiating native SSL connection");
+			ereport(COMMERROR,
+	(errcode(ERRCODE_PROTOCOL_VIOLATION),
+	 errmsg("received unencrypted data after SSL request"),
+	 errdetail("This could be either a client-software bug or evidence of an attempted man-in-the-middle attack.")));
 			return STATUS_ERROR;
 		}
-		pfree(port->raw_buf);
+		if (port->raw_buf)
+			pfree(port->raw_buf);
 
 		if (port->ssl_alpn_protocol == NULL)
 		{
@@ -2158,15 +2159,44 @@ ProcessStartupPacket(Port *port, bool ssl_done, bool gss_done)
 		}
 
 #ifdef USE_SSL
-		if (SSLok == 'S' && secure_open_server(port) == -1)
-			return STATUS_ERROR;
+		if (SSLok == 'S')
+		{
+			/* push unencrypted buffered data back through SSL setup */
+			len = pq_buffer_has_data();
+			if (len > 0)
+			{
+buf = palloc(len);
+if (pq_getbytes(buf, len) == EOF)
+	return STATUS_ERROR; /* shouldn't be possible */
+port->raw_buf = buf;
+port->raw_buf_remaining = len;
+port->raw_buf_consumed = 0;
+			}
+			Assert(pq_buffer_has_data() == 0);
+
+			if (secure_open_server(port) == -1)
+return STATUS_ERROR;
+			
+			/*
+			 * At this point we should have no data already buffered.  If we do,
+			 * it was received before we performed the SSL handshake, so it wasn't
+			 * encrypted and indeed may have been injected by a man-in-the-middle.
+			 * We report this case to the client.
+			 */
+			if (port->raw_buf_remaining > 0)
+ereport(FATAL,
+		(errcode(ERRCODE_PROTOCOL_VIOLATION),
+		 errmsg("received unencrypted data after SSL request"),
+		 errdetail("This could be either a client-software bug or evidence of an attempted man-in-the-middle attack.")));
+			if (port->raw_buf)
+pfree(port->raw_buf);
+		}
 #endif
 
-		/*
-		 * At this point we should have no data already buffered.  If we do,
-		 * it was received before we performed the SSL handshake, so it wasn't
-		 * encrypted and indeed may have been injected by a man-in-the-middle.
-		 * We report this case to the client.
+		/* This can only really occur now if there was data pipelined after
+		 * the SSL Request but we have refused to do SSL. In that case we need
+		 * to give up because the client has presumably assumed the SSL
+		 * request would be accepted.
 		 */
 		if (pq_buffer_has_data())
 			ereport(FATAL,
-- 
2.40.0

From 5413f1a1ee897640bd3bb99bae226eec7e2e9f50 Mon Sep 17 00:00:00 2001
From: Greg Stark 
Date: Mon, 20 Mar 2023 14:09:30 -0400
Subject: [PATCH v6 4/6] Direct SSL connections ALPN support

---
 src/backend/libpq/be-secure-openssl.c | 66 +++
 src/backend/libpq/be-secure.c |  3 +
 src/backend/postmaster/postmaster.c   | 25 +++
 src/backend/utils/misc/guc_tables.c   |  9 +++
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/bin/psql/command.c|  7 +-
 src/include/libpq/libpq-be.h  |  1 +
 src/include/libpq/libpq.h |  1 +
 src/include/libpq/pqcomm.h| 19 ++
 src/interfaces/libpq/fe-connect.c |  5 ++
 src/interfaces/libpq/fe-secure-openssl.c  | 31 +
 src/interfaces/libpq/libpq-int.h  |  1 +
 12 files changed, 167 insertions(+), 2 deletions(-)

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 685aa2ed69..620ffafb0b 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -67,6 +67,12 @@ static int	ssl_external_passwd_cb(char *buf, int size, int rwflag, void *userdat
 static int	dummy_ssl_passwd_cb(char *buf, int size, int rwflag, void *userdata);
 static int	verify_cb(int ok, X509_STORE_CTX *ctx);
 static void info_cb(const SSL *ssl, int type, int args);
+static int  alpn_cb(SSL *ssl,
+	const unsigned char **out,
+	unsigned char *outlen,
+	const unsigned char *in,
+	unsigned int inlen,
+	void *userdata);
 static bool initialize_dh(SSL_CTX *context, bool isServerStart);
 static bool initialize_ecdh(SSL_CTX *con

Re: Experiments with Postgres and SSL

2023-03-31 Thread Greg Stark
On Mon, 20 Mar 2023 at 16:31, Greg Stark  wrote:
>
> Here's a first cut at ALPN support.
>
> Currently it's using a hard coded "Postgres/3.0" protocol

Apparently that is explicitly disrecommended by the IETF folk. They
want something like "TBD" so people don't start using a string until
it's been added to the registry. So I've changed this for now (to
"TBD-pgsql")

Ok, I think this has pretty much everything I was hoping to do.

The one thing I'm not sure of is that some codepaths in postmaster
have ereport(COMMERROR) followed by returning an error whereas other
codepaths just have ereport(FATAL). And I don't actually see much
logic in which one gets used where. (I get the principle behind
COMMERROR, it just seems like it doesn't really match the code.)
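
To make the comparison concrete, the two idioms look roughly like this
(just an illustration on my part, not code from the patch, and the
message text here is invented):

    /* log server-side only and let the caller return an error status */
    ereport(COMMERROR,
            (errcode(ERRCODE_PROTOCOL_VIOLATION),
             errmsg("unexpected message during protocol negotiation")));
    return STATUS_ERROR;

    /* versus: also report to the client and terminate the backend */
    ereport(FATAL,
            (errcode(ERRCODE_PROTOCOL_VIOLATION),
             errmsg("unexpected message during protocol negotiation")));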

I realized I had exactly the infrastructure needed to allow pipelining
the SSL ClientHello like Neon wanted to do so I added that too. It's
kind of redundant with direct SSL connections but seems like there may
be reasons to use that instead.



-- 
greg
From 083df15eff52f025064e2879a404270e405f7dde Mon Sep 17 00:00:00 2001
From: Greg Stark 
Date: Thu, 16 Mar 2023 11:55:16 -0400
Subject: [PATCH v5 2/6] Direct SSL connections client support

---
 src/interfaces/libpq/fe-connect.c | 91 +--
 src/interfaces/libpq/libpq-fe.h   |  1 +
 src/interfaces/libpq/libpq-int.h  |  3 +
 3 files changed, 89 insertions(+), 6 deletions(-)

diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index bb7347cb0c..7cd0eb261f 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -274,6 +274,10 @@ static const internalPQconninfoOption PQconninfoOptions[] = {
 		"SSL-Mode", "", 12,		/* sizeof("verify-full") == 12 */
 	offsetof(struct pg_conn, sslmode)},
 
+	{"sslnegotiation", "PGSSLNEGOTIATION", "postgres", NULL,
+		"SSL-Negotiation", "", 14,		/* strlen("requiredirect") == 14 */
+	offsetof(struct pg_conn, sslnegotiation)},
+
 	{"sslcompression", "PGSSLCOMPRESSION", "0", NULL,
 		"SSL-Compression", "", 1,
 	offsetof(struct pg_conn, sslcompression)},
@@ -1504,11 +1508,36 @@ connectOptions2(PGconn *conn)
 		}
 #endif
 	}
+
+	/*
+	 * validate sslnegotiation option, default is "postgres" for the postgres
+	 * style negotiated connection with an extra round trip but more options.
+	 */
+	if (conn->sslnegotiation)
+	{
+		if (strcmp(conn->sslnegotiation, "postgres") != 0
+			&& strcmp(conn->sslnegotiation, "direct") != 0
+			&& strcmp(conn->sslnegotiation, "requiredirect") != 0)
+		{
+			conn->status = CONNECTION_BAD;
+			libpq_append_conn_error(conn, "invalid %s value: \"%s\"",
+	"sslnegotiation", conn->sslnegotiation);
+			return false;
+		}
+
+#ifndef USE_SSL
+		if (conn->sslnegotiation[0] != 'p') {
+			conn->status = CONNECTION_BAD;
+			libpq_append_conn_error(conn, "sslnegotiation value \"%s\" invalid when SSL support is not compiled in",
+	conn->sslnegotiation);
+			return false;
+		}
+#endif
+	}
 	else
 	{
-		conn->sslmode = strdup(DefaultSSLMode);
-		if (!conn->sslmode)
-			goto oom_error;
+		libpq_append_conn_error(conn, "sslnegotiation missing?");
+		return false;
 	}
 
 	/*
@@ -1614,6 +1643,18 @@ connectOptions2(PGconn *conn)
 	conn->gssencmode);
 			return false;
 		}
+#endif
+#ifdef USE_SSL
+		/* GSS is incompatible with direct SSL connections so it requires the
+		 * default postgres style connection ssl negotiation  */
+		if (strcmp(conn->gssencmode, "require") == 0 &&
+			strcmp(conn->sslnegotiation, "postgres") != 0)
+		{
+			conn->status = CONNECTION_BAD;
+			libpq_append_conn_error(conn, "gssencmode value \"%s\" invalid when Direct SSL negotiation is enabled",
+	conn->gssencmode);
+			return false;
+		}
 #endif
 	}
 	else
@@ -2747,11 +2788,12 @@ keep_going:		/* We will come back to here until there is
 		/* initialize these values based on SSL mode */
 		conn->allow_ssl_try = (conn->sslmode[0] != 'd');	/* "disable" */
 		conn->wait_ssl_try = (conn->sslmode[0] == 'a'); /* "allow" */
+		/* direct ssl is incompatible with "allow" or "disabled" ssl */
+		conn->allow_direct_ssl_try = conn->allow_ssl_try && !conn->wait_ssl_try && (conn->sslnegotiation[0] != 'p');
 #endif
 #ifdef ENABLE_GSS
 		conn->try_gss = (conn->gssencmode[0] != 'd');	/* "disable" */
 #endif
-
 		reset_connection_state_machine = false;
 		need_new_connection = true;
 	}
@@ -3209,6 +3251,28 @@ keep_going:		/* We will come back to here until there is
 if (pqsecure_initialize(conn, false, true

Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

2023-03-23 Thread Greg Stark
On Thu, 23 Mar 2023 at 23:30, Bharath Rupireddy
 wrote:
>
> > + ereport(log_replication_commands ? LOG : DEBUG3,
> > + (errmsg("acquired physical replication slot \"%s\"",
> > + slotname)));

So this is just a bit of bike-shedding but I don't feel like these log
messages really meet the standard we set for our logging. Like what
did the acquiring? What does "acquired" actually mean for a
replication slot? Is there not any meta information about the
acquisition that can give more context to the reader to make this
message more meaningful?

I would expect a log message like this to say, I dunno, something like
"physical replication slot \"%s\" acquired by streaming TCP connection
to 192.168.0.1:999 at LSN ... with xxxMB of logs to read"
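
Sketched out, something along these lines (a sketch only -- whether the
slot struct and the client address are conveniently available at that
call site is an assumption on my part, not something from the patch):

    ereport(log_replication_commands ? LOG : DEBUG3,
            (errmsg("acquired physical replication slot \"%s\" for connection from %s, restart_lsn %X/%X",
                    slotname,
                    MyProcPort->remote_host,
                    LSN_FORMAT_ARGS(slot->data.restart_lsn))));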

I even would be wondering if the other end shouldn't also be logging a
corresponding log and we shouldn't be going out of our way to ensure
there's enough information to match them up and presenting them in a
way that makes that easy.

-- 
greg




Re: Commitfest 2023-03 starting tomorrow!

2023-03-23 Thread Greg Stark
So of the patches with no emails since August-December 2022:

* New hooks in the connection path

* Add log messages when replication slots become active and inactive
  - Peter Smith and Alvaro Herrera have picked up this one

* Remove dead macro exec_subplan_get_plan
  - Minor cleanup

* Consider parallel for LATERAL subqueries having LIMIT/OFFSET
 - No response since Sept 2022. There was quite a lot of discussion
and Tom Lane and Robert Haas expressed some safety concerns which were
responded to but I guess people have put in as much time as they can
afford on this. I'll mark this Returned with Feedback.

* pg_rewind WAL deletion pitfall
 - I think this is a bug fix to pg_rewind that maybe should be Ready
for Committer?

* Simplify find_my_exec by using realpath(3)
 - Tom Lane is author but I don't know if he intends to apply this in
this release

* Move backup-related code to xlogbackup.c/.h
  - It looks like neither Alvaro Herrera nor Michael Paquier are
particularly convinced this is an improvement and nobody has put more
time in this since last October. I'm inclined to mark this Rejected?

* Avoid hiding shared filesets in pg_ls_tmpdir (pg_ls_* functions for
showing metadata ...)
  - According to the internet "As part of their 39 month old
development and milestones, your patch should be able to see like an
adult (20/20 vision), be able to run, walk, hop and balance himself
while standing with one foot quite confidently." Can it do all those
things yet?

  - Should this be broken up into smaller CF entries so at least some
of them can be Ready for Committer and closed?

> * Fix bogus error emitted by pg_recvlogical when interrupted
  - Is this a minor cleanup?

> * Check consistency of GUC defaults between .sample.conf and 
> pg_settings.boot_val
  - It looks like this was pretty active until last October and might
have been ready to apply at least partially? But no further work or
review has happened since.

> * Code checks for App Devs, using new options for transaction behavior
  - This is an interesting set of features from Simon Riggs to handle
"dry-run" style SQL execution by changing the semantics of BEGIN and
END and COMMIT. It has feedback from Erik Rijkers and Dilip Kumar but
I don't think it's gotten any serious design discussion. I posted a
quick review myself just now but still the point remains.

I think features supporting a "dry-run" mode would be great but
judging by the lack of response this doesn't look like the version of
that people are interested in.

I'm inclined to mark this Rejected, even if that's only by default. If
someone is interested in running with this in the future maybe
Returned with Feedback would be better, even if there really wasn't
much feedback. In practice it amounts to the same thing, I think.

> * Lockless queue of waiters based on atomic operations for LWLock
 - Is this going in this CF? It looks like something we don't want to
lose though

> * Fix assertion failure with next_phase_at in snapbuild.c
  - It's a bug fix but it doesn't look like the bug has been entirely fixed?

> * Add sortsupport for range types and btree_gist
  - It doesn't look like anyone is interested in reviewing this
patch. It's been bouncing forward from CF to CF since last August. I'm
not sure what to do. Maybe we just have to say it's rejected for lack
of reviewers interested in/competent to review this area of the code.

> * asynchronous execution support for Custom Scan
  - This is a pretty small isolated feature.

> * CREATE INDEX CONCURRENTLY on partitioned table
  - I'm guessing this patch is too big and too late to go in this CF.
And it sounds like there's still work to be done? Should this be
marked RwF?

> * Partial aggregates push down

  - I'm not sure what the state of this is, it's had about a year and
a half of work and seems to have had real work going into it during
all that time. It's just a big change. Is it ready for commit or are
there still open questions? Is it for this release or next release?

> * Non-replayable WAL records through overflows and >MaxAllocSize lengths

  - Andres says it's a bug fix

-- 
greg




Re: Code checks for App Devs, using new options for transaction behavior

2023-03-23 Thread Greg Stark
On Thu, 27 Oct 2022 at 07:10, Simon Riggs  wrote:
>
> In the past, developers have wondered how we can provide "--dry-run"
> functionality

That would be awesome functionality, indeed. I have concerns about how
feasible it is in general, but I think providing features that allow
developers to build it for their use cases is a good approach. The
corner cases that might not be solvable in general might be tractable
for developers willing to constrain their development environment or
use information available outside Postgres.

But... I have concerns about some of the design here.

> * psql --parse-only
> Checks the syntax of all SQL in a script, but without actually
> executing it. This is very important in the early stages of complex
> migrations because we need to see if the code would generate syntax
> errors before we attempt to execute it. When there are many
> dependencies between objects, actual execution fails very quickly if
> we run in a single transaction, yet running outside of a transaction
> can leave a difficult cleanup task. Fixing errors iteratively is
> difficult when there are long chains of dependencies between objects,
> since there is no easy way to predict how long it will take to make
> everything work unless you understand how many syntax errors exist in
> the script.
> 001_psql_parse_only.v1.patch

This effectively enables \gdesc mode for every query. It needs docs
explaining what's actually going to happen and how to use it because
that wasn't super obvious to me even after reading the patch. I'm not
sure reusing DescribeQuery() and then returning early is the best
idea.

But more importantly it's only going to handle the simplest scripts
that don't do DDL that further statements will depend on. That at
least needs to be documented.


> * nested transactions = off (default) | all | on
> Handle nested BEGIN/COMMIT, which can cause chaos on failure. This is
> an important part of guaranteeing that everything that gets executed
> is part of a single atomic transaction, which can then be rolled back
> - this is a pre-requisite for the last feature.
> 002_nested_xacts.v7.patch
> The default behavior is unchanged (off)
> Setting "all" treats nested BEGIN/COMMIT as subtransactions, allowing
> some parts to fail without rolling back the outer transaction.
> Setting "outer" flattens nested BEGIN/COMMIT into one single outer
> transaction, so that any failure rolls back the entire transaction.

I think we've been burned pretty badly by GUCs that control SQL
semantics before. I think there was discussion at the time nested
transactions went in and there must have been a reason we did
SAVEPOINT rather than make nested BEGINs do things like this. But
regardless if we do want to change what nested BEGINs do I think we
have to decide what behaviour we want, think about the backwards
compatibility impacts, and make the change. We can't make it just for
some people some of the time based on a GUC. Doing that makes it
impossible to write scripts that work consistently.

I'm not clear what happens if you have this feature enabled and *also*
use SAVEPOINTs...

You say this is a prerequisite for 003 and I see how they're related
though I don't immediately see why it should be necessary to change
nested BEGIN behaviour to make that work.

> * rollback_on_commit = off (default) | on
> Force transactions to fail their final commit, ensuring that no
> lasting change is made when a script is tested. i.e. accept COMMIT,
> but do rollback instead.
> 003_rollback_on_commit.v1.patch

I suppose technically this is also a "semantics controlled by a GUC"
but I guess it's safe since you would only set this when you want this
debugging environment and then you really do want it to be global.

I'm not sure it's super safe though. Like, dblink connections can
break it, what happens if it gets turned off midway through a
transaction?

I wonder if this should be handled by the client itself the way
autocommit is. Like have an "autocommit" mode of "autorollback"
instead. That would mean having to add it to every client library of
course but then perhaps it would be able to change client behaviour in
ways that make sense at the same time.


-- 
greg




Re: Commitfest 2023-03 starting tomorrow!

2023-03-21 Thread Greg Stark
So a week later

 Status summary:           March 15    March 22
   Needs review:              152         128
   Waiting on Author:          42          36
   Ready for Committer:        39          32
   Committed:                  61          82
   Moved to next CF:            4          15
   Withdrawn:                  17          16 (?)
   Rejected:                    0           5
   Returned with Feedback:      4           5
 Total: 319.


These patches that are "Needs Review" and have received no comments at
all since before March 1st are now below. There are about 20 fewer
such patches than there were last week.

No emails since August-December 2022:

* New hooks in the connection path
* Add log messages when replication slots become active and inactive
* Remove dead macro exec_subplan_get_plan
* Consider parallel for LATERAL subqueries having LIMIT/OFFSET
* pg_rewind WAL deletion pitfall
* Simplify find_my_exec by using realpath(3)
* Move backup-related code to xlogbackup.c/.h
* Avoid hiding shared filesets in pg_ls_tmpdir (pg_ls_* functions for
showing metadata ...)
* Fix bogus error emitted by pg_recvlogical when interrupted
* Check consistency of GUC defaults between .sample.conf and
pg_settings.boot_val
* Code checks for App Devs, using new options for transaction behavior
* Lockless queue of waiters based on atomic operations for LWLock
* Fix assertion failure with next_phase_at in snapbuild.c
* Add sortsupport for range types and btree_gist
* asynchronous execution support for Custom Scan
* CREATE INDEX CONCURRENTLY on partitioned table
* Partial aggregates push down
* Non-replayable WAL records through overflows and >MaxAllocSize lengths

No emails since January 2023

* Enable jitlink as an alternative jit linker of legacy Rtdyld and add
riscv jitting support
* basebackup: support zstd long distance matching
* pgbench - adding pl/pgsql versions of tests
* Function to log backtrace of postgres processes
* More scalable multixacts buffers and locking
* COPY FROM enable FORCE_NULL/FORCE_NOT_NULL on all columns
* postgres_fdw: commit remote (sub)transactions in parallel during pre-commit
* Add semi-join pushdown to postgres_fdw
* Skip replicating the tables specified in except table option
* Post-special Page Storage TDE support
* Direct I/O (developer-only feature)
* Improve doc for autovacuum on partitioned tables
* Set arbitrary GUC options during initdb
* An attempt to avoid
locally-committed-but-not-replicated-to-standby-transactions in
synchronous replication
* Check lateral references within PHVs for memoize cache keys
* monitoring usage count distribution
* Reduce wakeup on idle for bgwriter & walwriter for >5s
* Report the query string that caused a memory error under Valgrind

No emails since February 2023

* New [relation] options engine
* possibility to take name, signature and oid of currently executed
function in GET DIAGNOSTICS statement
* Named Operators
* nbtree performance improvements through specialization on key shape
* Fix assertion failure in SnapBuildInitialSnapshot()
* Speed up releasing of locks
* Improve pg_bsd_indent's handling of multiline initialization expressions
* User functions for building SCRAM secrets
* Refactoring postgres_fdw/connection.c
* Add pg_stat_session
* Doc: Improve note about copying into postgres_fdw foreign tables in batch
* archive modules loose ends
* Fix dsa_free() to re-bin segment
* Reduce timing overhead of EXPLAIN ANALYZE using rdtsc
* clean up permission checks after 599b33b94
* Some revises in adding sorting path
* ResourceOwner refactoring
* Fix the description of GUC "max_locks_per_transaction" and
"max_pred_locks_per_transaction" in guc_table.c
* some namespace.c refactoring
* Add function to_oct
* Switching XLog source from archive to streaming when primary available
* Dynamic result sets from procedures
* BRIN - SK_SEARCHARRAY and scan key preprocessing
* Reuse Workers and Replication Slots during Logical Replication




Re: Kerberos delegation support in libpq and postgres_fdw

2023-03-21 Thread Greg Stark
The CFBot says there's a function be_gssapi_get_proxy() which is
undefined. Presumably this is a missing #ifdef or a definition that
should be outside an #ifdef.
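
The usual shape of that kind of fix in dblink.c would be something like
this (a sketch only -- be_gssapi_get_proxy() comes from the patch, and
ENABLE_GSS is my guess at the right guard macro):

    #ifdef ENABLE_GSS
        bool        gss_proxied = PQconnectionUsedGSSAPI(conn) &&
                                  be_gssapi_get_proxy(MyProcPort);
    #else
        bool        gss_proxied = false;
    #endif
        /* ... then use gss_proxied in the existing security check ... */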

[14:05:21.532] dblink.c: In function ‘dblink_security_check’:
[14:05:21.532] dblink.c:2606:38: error: implicit declaration of
function ‘be_gssapi_get_proxy’ [-Werror=implicit-function-declaration]
[14:05:21.532] 2606 | if (PQconnectionUsedGSSAPI(conn) &&
be_gssapi_get_proxy(MyProcPort))
[14:05:21.532] | ^~~
[14:05:21.532] cc1: all warnings being treated as errors

[13:56:28.789] dblink.c.obj : error LNK2019: unresolved external
symbol be_gssapi_get_proxy referenced in function dblink_connstr_check
[13:56:29.040] contrib\dblink\dblink.dll : fatal error LNK1120: 1
unresolved externals

On Mon, 20 Mar 2023 at 09:30, Stephen Frost  wrote:
>
> Greetings,
>
> * Stephen Frost (sfr...@snowman.net) wrote:
> > * Michael Paquier (mich...@paquier.xyz) wrote:
> > > On Mon, Sep 19, 2022 at 02:05:39PM -0700, Jacob Champion wrote:
> > > > It's not prevented, because a password is being used. In my tests I'm
> > > > connecting as an unprivileged user.
> > > >
> > > > You're claiming that the middlebox shouldn't be doing this. If this new
> > > > default behavior were the historical behavior, then I would have agreed.
> > > > But the cat's already out of the bag on that, right? It's safe today.
> > > > And if it's not safe today for some other reason, please share why, and
> > > > maybe I can work on a patch to try to prevent people from doing it.
> > >
> > > Please note that this has been marked as returned with feedback in the
> > > current CF, as this has remained unanswered for a bit more than three
> > > weeks.
> >
> > There's some ongoing discussion about how to handle outbound connections
> > from the server ending up picking up credentials from the server's
> > environment (that really shouldn't be allowed unless specifically asked
> > for..), that's ultimately an independent change from what this patch is
> > doing.
>
> That got committed, which is great, though it didn't go quite as far as
> I had been hoping regarding dealing with outbound connections from the
> server- perhaps we should make it clear at least for postgres_fdw that
> it might be good for administrators to explicitly say which options are
> allowed for a given user-map when it comes to how authentication is
> done to the remote server?  Seems like mostly a documentation
> improvement, I think?  Or should we have some special handling around
> that option for postgres_fdw/dblink?
>
> > Here's an updated version which does address Robert's concerns around
> > having this disabled by default and having options on both the server
> > and client side saying if it is to be enabled or not.  Also added to
> > pg_stat_gssapi a field that indicates if credentials were proxied or not
> > and made some other improvements and added additional regression tests
> > to test out various combinations.
>
> I've done some self-review and also reworked how the security checks are
> done to be sure that we're not ending up pulling credentials from the
> environment (with added regression tests to check for it too).  If
> there's remaining concerns around that, please let me know.  Of course,
> other review would be great also.  Presently though:
>
> - Rebased up to today
> - Requires explicitly being enabled on client and server
> - Authentication to a remote server via dblink or postgres_fdw with
>   GSSAPI requires that credentials were proxied by the client to the
>   server, except if the superuser set 'password_required' to false on
>   the postgres_fdw (which has lots of caveats around it in the
>   documentation because it's inherently un-safe to do).
> - Includes updated documentation
> - Quite a few additional regression tests to check for unrelated
>   credentials coming from the environment in either cases where
>   credentials have been proxied and in cases where they haven't.
> - Only changes to existing regression tests for dblink/postgres_fdw are
>   in the error message wording updates.
>
> Thanks!
>
> Stephen



-- 
greg




Re: Commitfest 2023-03 starting tomorrow!

2023-03-21 Thread Greg Stark
On Tue, 21 Mar 2023 at 05:59, Alvaro Herrera  wrote:
>
> On 2023-Mar-20, Thomas Munro wrote:
>
> This led me to suggesting that perhaps we need to be more lenient when
> it comes to new contributors.  As I said, for seasoned contributors,
> it's not a problem to keep up with our requirements, however silly they
> are.  But people who spend their evenings a whole week or month trying
> to understand how to patch for one thing that they want, to be received
> by six months of silence followed by a constant influx of "please rebase
> please rebase please rebase", no useful feedback, and termination with
> "eh, you haven't rebased for the 1001th time, your patch has been WoA
> for X days, we're setting it RwF, feel free to return next year" ...
> they are most certainly off-put and will *not* try again next year.

I feel like the "no useful feedback" is the real problem though. If
the patches had been reviewed in earlier commitfests the original
contributor would still have been around to finish it... Like, I think
what would actually solve this problem would be if we kept a "clean"
house where patches were committed within one or two commitfests
rather than dragged forward until the final commitfest.

I do agree though. It would be nice if it were easier for anyone to do
trivial merges and update the commitfest entry. That's the kind of
thing gitlab/github are better positioned to solve when they can have
integrated editors and built-in CI...

I haven't been marking patches RwF or moving them to the next
commitfest when the merge looked trivial. And in one case I actually
did the merge myself :) But
that only goes so far. If the merge actually requires understanding
the patch in depth then the counter-argument is that the committer
might be spending a lot of time on a patch that won't get committed
while others sit ignored entirely.

--
greg




misplaced GUC in pqcomm.h -- where to put actual common variable though...?

2023-03-21 Thread Greg Stark
So back in 2002 in 7.3 there was a commit 2c6b34d9598 which added a
GUC db_user_namespace which is stored in a variable Db_user_namespace.
All that seems fine except...

The variable this GUC is stored in is Db_user_namespace which... is
actually declared in pqcomm.h which is intended to be "Definitions
common to frontends and backends".

Afaics it's never actually defined in any FE code, neither libpq nor
any clients. I was a bit surprised this isn't producing a warning
about an extern declaration that's never defined but I guess that's
not actually that unusual.

The actual variable is defined in the backend in postmaster.c. I'm
guessing this declaration can just move to libpq/libpq.h which
(counterintuitively) is for the backend iiuc.
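
i.e. just relocating the extern, roughly like this (I haven't checked
whether it also wants PGDLLIMPORT):

    /* move out of src/include/libpq/pqcomm.h ... */
    extern bool Db_user_namespace;
    /* ... and into src/include/libpq/libpq.h with the other backend-only externs */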

I don't think this causes any actual problems aside from namespace
pollution but it confused me.  I found this because I was looking for
where to put the ALPN protocol version which (at least at the moment)
would be the same for the server and client. But as far as I can tell
it would be the only variable (other than the above) declared in both
and that means there's no particularly handy place to put the
definition.

--
greg




Re: Experiments with Postgres and SSL

2023-03-20 Thread Greg Stark
Sorry, checking the cfbot apparently I had a typo in the #ifndef USE_SSL case.
From 4b6e01c7f569a919d660cd80ce64cb913bc9f220 Mon Sep 17 00:00:00 2001
From: Greg Stark 
Date: Mon, 20 Mar 2023 14:09:30 -0400
Subject: [PATCH v4 4/4] alpn support

---
 src/backend/libpq/be-secure-openssl.c| 65 
 src/backend/libpq/be-secure.c|  3 ++
 src/backend/postmaster/postmaster.c  | 23 +
 src/bin/psql/command.c   |  7 ++-
 src/include/libpq/libpq-be.h |  1 +
 src/include/libpq/libpq.h|  1 +
 src/interfaces/libpq/fe-connect.c|  5 ++
 src/interfaces/libpq/fe-secure-openssl.c | 25 +
 src/interfaces/libpq/libpq-int.h |  1 +
 9 files changed, 129 insertions(+), 2 deletions(-)

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 685aa2ed69..034e1cf2ec 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -67,6 +67,12 @@ static int	ssl_external_passwd_cb(char *buf, int size, int rwflag, void *userdat
 static int	dummy_ssl_passwd_cb(char *buf, int size, int rwflag, void *userdata);
 static int	verify_cb(int ok, X509_STORE_CTX *ctx);
 static void info_cb(const SSL *ssl, int type, int args);
+static int  alpn_cb(SSL *ssl,
+	const unsigned char **out,
+	unsigned char *outlen,
+	const unsigned char *in,
+	unsigned int inlen,
+	void *userdata);
 static bool initialize_dh(SSL_CTX *context, bool isServerStart);
 static bool initialize_ecdh(SSL_CTX *context, bool isServerStart);
 static const char *SSLerrmessage(unsigned long ecode);
@@ -432,6 +438,11 @@ be_tls_open_server(Port *port)
 	/* set up debugging/info callback */
 	SSL_CTX_set_info_callback(SSL_context, info_cb);
 
+	if (ssl_enable_alpn) {
+		elog(WARNING, "Enabling ALPN Callback");
+		SSL_CTX_set_alpn_select_cb(SSL_context, alpn_cb, port);
+	}
+
 	if (!(port->ssl = SSL_new(SSL_context)))
 	{
 		ereport(COMMERROR,
@@ -1250,6 +1261,60 @@ info_cb(const SSL *ssl, int type, int args)
 	}
 }
 
+static unsigned char alpn_protos[] = {
+	12, 'P','o','s','t','g','r','e','s','/','3','.','0'
+};
+static unsigned int alpn_protos_len = sizeof(alpn_protos);
+
+/*
+ * Server callback for ALPN negotiation. We use the standard "helper"
+ * function even though currently we only accept one value. We store the
+ * negotiated protocol in Port->ssl_alpn_protocol and rely on higher level
+ * logic (in postmaster.c) to decide what to do with that info.
+ *
+ * XXX Not sure if it's kosher to use palloc and elog() inside OpenSSL
+ * callbacks. If we throw an error from here is there a risk of messing up
+ * OpenSSL data structures? It's possible it's ok because this is only called
+ * during connection negotiation which we don't try to recover from.
+ */
+static int  alpn_cb(SSL *ssl,
+	const unsigned char **out,
+	unsigned char *outlen,
+	const unsigned char *in,
+	unsigned int inlen,
+	void *userdata)
+{
+	/* why does OpenSSL provide a helper function that requires a nonconst
+	 * vector when the callback is declared to take a const vector? What are
+	 * we to do with that?*/
+	int retval;
+	Assert(userdata != NULL);
+	Assert(out != NULL);
+	Assert(outlen != NULL);
+	Assert(in != NULL);
+
+	retval = SSL_select_next_proto((unsigned char **)out, outlen,
+   alpn_protos, alpn_protos_len,
+   in, inlen);
+	if (*out == NULL || *outlen > alpn_protos_len || *outlen == 0)
+		elog(ERROR, "SSL_select_next_proto returned nonsensical results");
+
+	if (retval == OPENSSL_NPN_NEGOTIATED)
+	{
+		struct Port *port = (struct Port *)userdata;
+		
+		port->ssl_alpn_protocol = palloc0(*outlen+1);
+		/* SSL uses unsigned chars but Postgres uses host signedness of chars */
+		strncpy(port->ssl_alpn_protocol, (char*)*out, *outlen);
+		return SSL_TLSEXT_ERR_OK;
+	} else if (retval == OPENSSL_NPN_NO_OVERLAP) {
+		return SSL_TLSEXT_ERR_NOACK;
+	} else {
+		return SSL_TLSEXT_ERR_NOACK;
+	}
+}
+
+
 /*
  * Set DH parameters for generating ephemeral DH keys.  The
  * DH parameters can take a long time to compute, so they must be
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 1020b3adb0..79a61900ba 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -61,6 +61,9 @@ bool		SSLPreferServerCiphers;
 int			ssl_min_protocol_version = PG_TLS1_2_VERSION;
 int			ssl_max_protocol_version = PG_TLS_ANY;
 
+/* GUC variable: if false ignore ALPN negotiation */
+bool		ssl_enable_alpn = true;
+
 /*  */
 /*			 Procedures common to all secure sessions			*/
 /*  */
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index ec1d895a23..2640b69fed 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/po

Re: Experiments with Postgres and SSL

2023-03-20 Thread Greg Stark
Here's a first cut at ALPN support.

Currently it's using a hard coded "Postgres/3.0" protocol (hard coded
both in the client and the server...). And it's hard coded to be
required for direct connections and supported but not required for
regular connections.

IIRC I put a variable labeled a "GUC" but forgot to actually make it a
GUC. But I'm thinking of maybe removing that variable since I don't
see much of a use case for controlling this manually. I *think* ALPN
is supported by all the versions of OpenSSL we support.

The other patches are unchanged (modulo a free() that I missed in the
client before). They still have the semi-open issues I mentioned in
the previous email.




--
greg
From 32185020927824c4b57af900100a37f92ae6a040 Mon Sep 17 00:00:00 2001
From: Greg Stark 
Date: Mon, 20 Mar 2023 14:09:30 -0400
Subject: [PATCH v3 4/4] alpn support

---
 src/backend/libpq/be-secure-openssl.c| 65 
 src/backend/libpq/be-secure.c|  3 ++
 src/backend/postmaster/postmaster.c  | 23 +
 src/bin/psql/command.c   |  7 ++-
 src/include/libpq/libpq-be.h |  1 +
 src/include/libpq/libpq.h|  1 +
 src/interfaces/libpq/fe-connect.c|  5 ++
 src/interfaces/libpq/fe-secure-openssl.c | 25 +
 src/interfaces/libpq/libpq-int.h |  1 +
 9 files changed, 129 insertions(+), 2 deletions(-)

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 685aa2ed69..034e1cf2ec 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -67,6 +67,12 @@ static int	ssl_external_passwd_cb(char *buf, int size, int rwflag, void *userdat
 static int	dummy_ssl_passwd_cb(char *buf, int size, int rwflag, void *userdata);
 static int	verify_cb(int ok, X509_STORE_CTX *ctx);
 static void info_cb(const SSL *ssl, int type, int args);
+static int  alpn_cb(SSL *ssl,
+	const unsigned char **out,
+	unsigned char *outlen,
+	const unsigned char *in,
+	unsigned int inlen,
+	void *userdata);
 static bool initialize_dh(SSL_CTX *context, bool isServerStart);
 static bool initialize_ecdh(SSL_CTX *context, bool isServerStart);
 static const char *SSLerrmessage(unsigned long ecode);
@@ -432,6 +438,11 @@ be_tls_open_server(Port *port)
 	/* set up debugging/info callback */
 	SSL_CTX_set_info_callback(SSL_context, info_cb);
 
+	if (ssl_enable_alpn) {
+		elog(WARNING, "Enabling ALPN Callback");
+		SSL_CTX_set_alpn_select_cb(SSL_context, alpn_cb, port);
+	}
+
 	if (!(port->ssl = SSL_new(SSL_context)))
 	{
 		ereport(COMMERROR,
@@ -1250,6 +1261,60 @@ info_cb(const SSL *ssl, int type, int args)
 	}
 }
 
+static unsigned char alpn_protos[] = {
+	12, 'P','o','s','t','g','r','e','s','/','3','.','0'
+};
+static unsigned int alpn_protos_len = sizeof(alpn_protos);
+
+/*
+ * Server callback for ALPN negotiation. We use the standard "helper"
+ * function even though currently we only accept one value. We store the
+ * negotiated protocol in Port->ssl_alpn_protocol and rely on higher level
+ * logic (in postmaster.c) to decide what to do with that info.
+ *
+ * XXX Not sure if it's kosher to use palloc and elog() inside OpenSSL
+ * callbacks. If we throw an error from here is there a risk of messing up
+ * OpenSSL data structures? It's possible it's ok because this is only called
+ * during connection negotiation which we don't try to recover from.
+ */
+static int  alpn_cb(SSL *ssl,
+	const unsigned char **out,
+	unsigned char *outlen,
+	const unsigned char *in,
+	unsigned int inlen,
+	void *userdata)
+{
+	/* why does OpenSSL provide a helper function that requires a nonconst
+	 * vector when the callback is declared to take a const vector? What are
+	 * we to do with that?*/
+	int retval;
+	Assert(userdata != NULL);
+	Assert(out != NULL);
+	Assert(outlen != NULL);
+	Assert(in != NULL);
+
+	retval = SSL_select_next_proto((unsigned char **)out, outlen,
+   alpn_protos, alpn_protos_len,
+   in, inlen);
+	if (*out == NULL || *outlen > alpn_protos_len || *outlen == 0)
+		elog(ERROR, "SSL_select_next_proto returned nonsensical results");
+
+	if (retval == OPENSSL_NPN_NEGOTIATED)
+	{
+		struct Port *port = (struct Port *)userdata;
+		
+		port->ssl_alpn_protocol = palloc0(*outlen+1);
+		/* SSL uses unsigned chars but Postgres uses host signedness of chars */
+		strncpy(port->ssl_alpn_protocol, (char*)*out, *outlen);
+		return SSL_TLSEXT_ERR_OK;
+	} else if (retval == OPENSSL_NPN_NO_OVERLAP) {
+		return SSL_TLSEXT_ERR_NOACK;
+	} else {
+		return SSL_TLSEXT_ERR_NOACK;
+	}
+}
+
+
 /*
  * Set DH parameters for generating ephemeral DH keys.  The
  * DH parameters can take a long time to compute, so they must be
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 1020b3adb0..79a61900ba 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/li

Re: Commitfest 2023-03 starting tomorrow!

2023-03-20 Thread Greg Stark
On Mon, 20 Mar 2023 at 16:08, Thomas Munro  wrote:
>
> On Tue, Mar 21, 2023 at 3:15 AM Greg Stark  wrote:
> > The next level of this would be something like notifying the committer
> > with a list of patches in the CF that a commit broke. I don't
> > immediately see how to integrate that with our workflow but I have
> > seen something like this work well in a previous job. When committing
> > code you often went and updated other unrelated projects to adapt to
> > the new API (or could adjust the code you were committing to cause
> > less breakage).
>
> I've been hesitant to make it send email.  The most obvious message to
> send would be "hello, you posted a patch, but it fails on CI" to the
> submitter.

Yeah, even aside from flappers there's the problem that it's about as
common for real commits to break some test as it is for patches to
start failing tests. So often it's a real failure but it's nothing to
do with the patch, it has to do with the commit that went into git.

What I'm interested in is something like "hey, your commit caused 17
patches to start failing tests". And it doesn't necessarily have to be
an email. Having a historical page so when I look at a patch I can go
check, "hey did this start failing at the same time as 17 other
patches on the same commit?" would be the same question.


-- 
greg




Re: Make ON_ERROR_STOP stop on shell script failure

2023-03-20 Thread Greg Stark
So I took a look at this patch. The conflict is with 2fe3bdbd691
committed by Peter Eisentraut which added error checks for pipes.
Afaics the behaviour is now for pipe commands returning non-zero to
cause an error *always* regardless of the setting of ON_ERROR_STOP.

I'm not entirely sure that's sensible actually. If you write to a pipe
that ends in grep and it happens to produce no matching rows you may
actually be quite surprised when that causes your script to fail...

But if you remove that failing hunk the resulting patch does apply. I
don't see any tests so ... I don't know if the behaviour is still
sensible. A quick read gives me the impression that it's now actually
inconsistent in the other direction, where it sometimes stops more
often than the user might expect.

I also don't understand the difference between ON_ERROR_STOP_ON and
ON_ERROR_STOP_ALL. Nor why we would want ON_ERROR_STOP_SHELL which
stops only on shell errors, rather than, say, ON_ERROR_STOP_SQL to do
the converse which would at least match the historical behaviour?
From 1f0feb9daf106721cb7fcba39ef7d663178f8ed1 Mon Sep 17 00:00:00 2001
From: Greg Stark 
Date: Mon, 20 Mar 2023 14:25:22 -0400
Subject: [PATCH v5] on error stop for shell

---
 doc/src/sgml/ref/psql-ref.sgml |  7 +++-
 src/bin/psql/command.c | 20 ++--
 src/bin/psql/common.c  | 58 +++---
 src/bin/psql/psqlscanslash.l   | 29 +++--
 src/bin/psql/settings.h| 10 +-
 src/bin/psql/startup.c | 29 +
 6 files changed, 134 insertions(+), 19 deletions(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 7b8ae9fac3..856bb5716f 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -4194,7 +4194,12 @@ bar
 
 By default, command processing continues after an error.  When this
 variable is set to on, processing will instead stop
-immediately.  In interactive mode,
+immediately upon an error in an SQL query. When this variable is set to
+shell, a nonzero exit status of a shell command,
+which indicates failure, is interpreted as an error that stops the processing.
+When this variable is set to all, errors from both
+SQL queries and shell commands can stop the processing.
+In interactive mode,
 psql will return to the command prompt;
 otherwise, psql will exit, returning
 error code 3 to distinguish this case from fatal error
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 61ec049f05..9fbf2522ea 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -34,6 +34,7 @@
 #include "describe.h"
 #include "fe_utils/cancel.h"
 #include "fe_utils/print.h"
+#include "fe_utils/psqlscan_int.h"
 #include "fe_utils/string_utils.h"
 #include "help.h"
 #include "input.h"
@@ -2394,9 +2395,13 @@ exec_command_set(PsqlScanState scan_state, bool active_branch)
 			 */
 			char	   *newval;
 			char	   *opt;
+			PQExpBuffer output_buf = scan_state->output_buf;
 
 			opt = psql_scan_slash_option(scan_state,
 		 OT_NORMAL, NULL, false);
+			if (output_buf->len >= output_buf->maxlen
+&& (pset.on_error_stop == PSQL_ERROR_STOP_SHELL || pset.on_error_stop == PSQL_ERROR_STOP_ALL))
+success = false;
 			newval = pg_strdup(opt ? opt : "");
 			free(opt);
 
@@ -5041,10 +5046,19 @@ do_shell(const char *command)
 	else
 		result = system(command);
 
-	if (result == 127 || result == -1)
+	if (result != 0)
 	{
-		pg_log_error("\\!: failed");
-		return false;
+		if (result < 0)
+			pg_log_error("could not execute command: %m");
+		else
+		{
+			char *reason = wait_result_to_str(result);
+
+			pg_log_error("%s: %s", command, reason ? reason : "");
+			free(reason);
+		}
+		if (pset.on_error_stop == PSQL_ERROR_STOP_SHELL || pset.on_error_stop == PSQL_ERROR_STOP_ALL)
+			return false;
 	}
 	return true;
 }
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index f907f5d4e8..c87090a75d 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -94,6 +94,7 @@ setQFout(const char *fname)
 {
 	FILE	   *fout;
 	bool		is_pipe;
+	bool		status = true;
 
 	/* First make sure we can open the new output file/pipe */
 	if (!openQueryOutputFile(fname, , _pipe))
@@ -103,7 +104,24 @@ setQFout(const char *fname)
 	if (pset.queryFout && pset.queryFout != stdout && pset.queryFout != stderr)
 	{
 		if (pset.queryFoutPipe)
-			pclose(pset.queryFout);
+		{
+			int pclose_rc = pclose(pset.queryFout);
+			if (pclose_rc != 0)
+			{
+if (pclose_rc < 0)
+	pg_log_error("could not close pipe to external command: %m");
+else
+{
+	char *reason = wait_result_to_str(pclose_rc);
+
+	pg_log_error("command failure: %s",
+ reason ? 

Re: Commitfest 2023-03 starting tomorrow!

2023-03-20 Thread Greg Stark
The next level of this would be something like notifying the committer
with a list of patches in the CF that a commit broke. I don't
immediately see how to integrate that with our workflow but I have
seen something like this work well in a previous job. When committing
code you often went and updated other unrelated projects to adapt to
the new API (or could adjust the code you were committing to cause
less breakage).




Re: Commitfest 2023-03 starting tomorrow!

2023-03-17 Thread Greg Stark
On Fri, 17 Mar 2023 at 10:39, Tom Lane  wrote:
>
> You've listed a lot of small features here that still have time to
> get some love --- it's not like we're hard up against the end of the CF.
> If they'd been in Waiting on Author state for awhile, I'd agree with
> booting them, but not when they're in Needs Review.

Oh, that's exactly my intent -- when I listed them two days ago it was
a list of Waiting on Author patches without updates since March 1. But
I didn't recheck them this morning yet.

If they've gotten comments in the last two days or had their status
updated then great. It's also possible there are threads that aren't
attached to the commitfest or are attached to a related patch that I
may not be aware of.

-- 
greg




Re: Commitfest 2023-03 starting tomorrow!

2023-03-17 Thread Greg Stark
On Wed, 15 Mar 2023 at 14:29, Gregory Stark (as CFM)
 wrote:
>
> These patches that are "Needs Review" and have received no comments at
> all since before March 1st are these. If your patch is amongst this
> list I would suggest any of:
>
> 1) Move it yourself to the next CF (or withdraw it)
> 2) Post to the list with any pending questions asking for specific
> feedback -- it's much more likely to get feedback than just a generic
> "here's a patch plz review"...
> 3) Mark it Ready for Committer and possibly post explaining the
> resolution to any earlier questions to make it easier for a committer
> to understand the state
>
> If there's still no emails on these at some point I suppose it will
> make sense to move them out of the CF.

I'm going to go ahead and do this today. Any of these patches that are
"Waiting on Author" and haven't received any emails or status changes
since March 1 I'm going to move out of the commitfest(*). If you
really think your patch in this list is important to get committed
then please respond to the thread explaining any plans or feedback
needed.

It would be nice to actually do Returned With Feedback where
appropriate but there are too many to go through them thoroughly. I'll
only be able to do a quick review of each thread checking for
important bug fixes or obviously rejected patches.

(*) I reserve the right to skip and leave some patches where
appropriate. In particular I'll skip patches that are actually from
committers on the theory that they could just commit them when they
feel like it anyways. Some patches may be intentionally waiting until
the end of the release cycle to avoid conflicts too.

>  * ALTER TABLE SET ACCESS METHOD on partitioned tables
>  * New hooks in the connection path
>  * Add log messages when replication slots become active and inactive
>  * Avoid use deprecated Windows Memory API
>  * Remove dead macro exec_subplan_get_plan
>  * Consider parallel for LATERAL subqueries having LIMIT/OFFSET
>  * pg_rewind WAL deletion pitfall
>  * Simplify find_my_exec by using realpath(3)
>  * Move backup-related code to xlogbackup.c/.h
>  * Avoid hiding shared filesets in pg_ls_tmpdir (pg_ls_* functions for
> showing metadata ...)
>  * Fix bogus error emitted by pg_recvlogical when interrupted
>  * warn if GUC set to an invalid shared library
>  * Check consistency of GUC defaults between .sample.conf and
> pg_settings.boot_val
>  * Code checks for App Devs, using new options for transaction behavior
>  * Lockless queue of waiters based on atomic operations for LWLock
>  * Fix assertion failure with next_phase_at in snapbuild.c
>  * Add SPLIT PARTITION/MERGE PARTITIONS commands
>  * Add sortsupport for range types and btree_gist
>  * asynchronous execution support for Custom Scan
>  * Periodic burst growth of the checkpoint_req counter on replica.
>  * CREATE INDEX CONCURRENTLY on partitioned table
>  * Fix ParamPathInfo for union-all AppendPath
>  * Add OR REPLACE option for CREATE OPERATOR
>  * ALTER TABLE and CLUSTER fail to use a BulkInsertState for toast tables
>  * Partial aggregates push down
>  * Non-replayable WAL records through overflows and >MaxAllocSize lengths
>  * Enable jitlink as an alternative jit linker of legacy Rtdyld and
> add riscv jitting support
>  * Test for function error in logrep worker
>  * basebackup: support zstd long distance matching
>  * pgbench - adding pl/pgsql versions of tests
>  * Function to log backtrace of postgres processes
>  * More scalable multixacts buffers and locking
>  * Remove nonmeaningful prefixes in PgStat_* fields
>  * COPY FROM enable FORCE_NULL/FORCE_NOT_NULL on all columns
>  * postgres_fdw: commit remote (sub)transactions in parallel during pre-commit
>  * Add semi-join pushdown to postgres_fdw
>  * Skip replicating the tables specified in except table option
>  * Split index and table statistics into different types of stats
>  * Exclusion constraints on partitioned tables
>  * Post-special Page Storage TDE support
>  * Direct I/O (developer-only feature)
>  * Improve doc for autovacuum on partitioned tables
>  * Patch to implement missing join selectivity estimation for range types
>  * Clarify the behavior of the system when approaching XID wraparound
>  * Set arbitrary GUC options during initdb
>  * An attempt to avoid
> locally-committed-but-not-replicated-to-standby-transactions in
> synchronous replication
>  * Check lateral references within PHVs for memoize cache keys
>  * Add n_tup_newpage_upd to pg_stat table views
>  * monitoring usage count distribution
>  * Reduce wakeup on idle for bgwriter & walwriter for >5s
>  * Report the query string that caused a memory error under Valgrind
>  * New [relation] options engine
>  * Data is copied twice when specifying both child and parent table in
> publication
>  * possibility to take name, signature and oid of currently executed
> function in GET DIAGNOSTICS statement
>  * Named Operators
>  * nbtree performance improvements through specialization 

Re: Experiments with Postgres and SSL

2023-03-16 Thread Greg Stark
Here's an updated patch for direct SSL connections.

I've added libpq client support with a new connection parameter. This
allows testing it easily with psql. It's still a bit hard to see
what's going on though. I'm thinking it would be good to have libpq
keep a string which describes what negotiations were attempted and
failed and what was eventually accepted which psql could print with
the SSL message or expose some other way.

In the end I didn't see how adding an API for this really helped any
more than just saying the API is to stuff the unread data into the
Port structure. So I just documented that. If anyone has any better
idea...

I added documentation for the libpq connection setting.

One thing, I *think* it's ok to replace the send(2) call with
secure_write() in the negotiation. It does mean it's possible for the
connection to fail with FATAL at that point instead of COMMERROR but I
don't think that's a problem.

I haven't added tests. I'm not sure how to test this since to test it
properly means running the server with every permutation of ssl and
gssapi configurations.

Incidentally, some of the configuration combinations -- namely
sslnegotiation=direct with the default gssencmode and sslmode --
result in counter-intuitive behaviour. But I don't see a better option
that doesn't mean making the defaults less useful.
From b07e19223bee52b7bb9b50afea39e4baaa0a46f3 Mon Sep 17 00:00:00 2001
From: Greg Stark 
Date: Thu, 16 Mar 2023 15:10:15 -0400
Subject: [PATCH v2 3/3] Direct SSL connections documentation

---
 doc/src/sgml/libpq.sgml | 102 
 1 file changed, 93 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 3706d349ab..e2f0891ea5 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -1701,10 +1701,13 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
 Note that if GSSAPI encryption is possible,
 that will be used in preference to SSL
 encryption, regardless of the value of sslmode.
-To force use of SSL encryption in an
-environment that has working GSSAPI
-infrastructure (such as a Kerberos server), also
-set gssencmode to disable.
+To negotiate SSL encryption in an environment that
+has working GSSAPI infrastructure (such as a
+Kerberos server), also set gssencmode
+to disable.
+Use of non-default values of sslnegotiation can
+also cause SSL to be used instead of
+negotiating GSSAPI encryption.

   
  
@@ -1731,6 +1734,75 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   
  
 
+ 
+  sslnegotiation
+  
+   
+This option controls whether PostgreSQL
+will perform its protocol negotiation to request encryption from the
+server or will just directly make a standard SSL
+connection.  Traditional PostgreSQL
+protocol negotiation is the default and the most flexible with
+different server configurations. If the server is known to support
+direct SSL connections then the latter requires one
+fewer round trip reducing connection latency and also allows the use
+of protocol agnostic SSL network tools.
+   
+
+
+ 
+  postgres
+  
+   
+ perform PostgreSQL protocol
+ negotiation. This is the default if the option is not provided.
+   
+  
+ 
+
+ 
+  direct
+  
+   
+ first attempt to establish a standard SSL connection and if that
+ fails reconnect and perform the negotiation. This fallback
+ process adds significant latency if the initial SSL connection
+ fails.
+   
+  
+ 
+
+ 
+  requiredirect
+  
+   
+ attempt to establish a standard SSL connection and if that fails
+ return a connection failure immediately.
+   
+  
+ 
+
+
+   
+If sslmode set to disable
+or allow then sslnegotiation is
+ignored. If gssencmode is set
+to require then sslnegotiation
+must be the default postgres value.
+   
+
+   
+Moreover, note if gssencmode is set
+to prefer and sslnegotiation
+to direct then the effective preference will be
+direct SSL connections, followed by
+negotiated GSS connections, followed by
+negotiated SSL connections, possibly followed
+lastly by unencrypted connections.
+   
+  
+ 
+
  
   sslcompression
   
@@ -1884,11 +1956,13 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
 

 The Server Name Indication can be used by SSL-aware proxies to route
-connections without having to decrypt the SSL stream.  (Note that this
-requires a proxy

Default libpq connection parameter handling and copy-paste of apparently dead code for it?

2023-03-16 Thread Greg Stark
I notice a number of places in fe-connect.c have copied this idiom
where, if an option is present, they validate it against the legal
values and otherwise they strdup a default value. This strdup of the
default I think is being copied from sslmode's validation, which is a
bit special, but afaics the following still applies to it.

   /*
* validate channel_binding option
*/
   if (conn->channel_binding)
   {
   if (strcmp(conn->channel_binding, "disable") != 0
   && strcmp(conn->channel_binding, "prefer") != 0
   && strcmp(conn->channel_binding, "require") != 0)
   {
   conn->status = CONNECTION_BAD;
   libpq_append_conn_error(conn, "invalid %s value: \"%s\"",

"channel_binding", conn->channel_binding);
   return false;
   }
   }
   else
   {
   conn->channel_binding = strdup(DefaultChannelBinding);
   if (!conn->channel_binding)
   goto oom_error;
   }

AFAICS the else branch of this is entirely dead code. These options
cannot be NULL because the default option is present in the
PQconninfoOptions array as the "compiled in default value" which
conninfo_add_defaults() will strdup in for us.

Unless conninfo_array_parse() is passed use_defaults=false, in
which case no... But why does this parameter exist? This is a static
function with one call site, and the parameter is passed as true at
that call site.

So I think this is just dead code from some no-longer-extant code path
that keeps getting copied as new parameters are added.
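
With that dead branch gone, each of these would collapse to just the
validation, e.g. for channel_binding (a sketch based on the code quoted
above):

    /*
     * conninfo_add_defaults() has already strdup'd the compiled-in default
     * from the PQconninfoOptions array, so conn->channel_binding can't be
     * NULL here.
     */
    if (strcmp(conn->channel_binding, "disable") != 0
        && strcmp(conn->channel_binding, "prefer") != 0
        && strcmp(conn->channel_binding, "require") != 0)
    {
        conn->status = CONNECTION_BAD;
        libpq_append_conn_error(conn, "invalid %s value: \"%s\"",
                                "channel_binding", conn->channel_binding);
        return false;
    }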

As an aside conninfo_add_defaults doesn't put the default value there
if the option is an empty string. I think we should make it do that,
effectively forcing all options to treat empty strings as missing
options. Otherwise it's annoying to use environment variables when you
want to explicitly set a parameter to a default value since it's much
less convenient to "remove" an environment variable in a shell than
pass it as an empty string. And it would just be confusing to have
empty strings behave differently from omitted parameters.
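
Roughly, in conninfo_add_defaults() (a sketch -- I'm paraphrasing the
existing loop from memory and leaving out the OOM checks, so the
details may be off):

    for (option = options; option->keyword != NULL; option++)
    {
        /* treat an empty string exactly like an unset option */
        if (option->val != NULL && option->val[0] == '\0')
        {
            free(option->val);
            option->val = NULL;
        }
        if (option->val != NULL)
            continue;           /* real value came from conninfo/service file */

        /* likewise, skip empty environment variables */
        if (option->envvar != NULL)
        {
            const char *tmp = getenv(option->envvar);

            if (tmp != NULL && tmp[0] != '\0')
            {
                option->val = strdup(tmp);
                continue;
            }
        }

        /* ... fall through to the compiled-in default as today ... */
    }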


-- 
greg




Re: Sampling-based timing for EXPLAIN ANALYZE

2023-03-14 Thread Greg Stark
On Tue, 17 Jan 2023 at 14:52, Tomas Vondra
 wrote:
>
> On 1/17/23 19:46, Andres Freund wrote:
>
> > I think a "hybrid" explain mode might be worth thinking about. Use the
> > "current" sampling method for the first execution of a node, and for the 
> > first
> > few milliseconds of a query (or perhaps the first few timestamp
> > acquisitions). That provides an accurate explain analyze for short queries,
> > without a significant slowdown. Then switch to sampling, which provides 
> > decent
> > attribution for a bit longer running queries.
> >
>
> Yeah, this is essentially the sampling I imagined when I first read the
> subject of this thread. It samples which node executions to measure (and
> then measures those accurately), while these patches sample timestamps.

That sounds interesting. Fwiw my first thought would be to implement
it a bit differently. Always have a sampling timer running right from
the start, but also, if there are fewer than, say, 1000 samples for a
node, measure the actual start/finish time.

So for any given node, once you've hit enough samples to get a decent
estimate, you stop checking the time. That way any fast or rarely
called nodes still have accurate measurements even if they get few
samples, and any long or frequently called nodes stop getting
timestamps and just use timer counts.
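
Per node that might look something like this (all of the names here are
invented for illustration and none of this is from the posted patches;
it assumes the sampling interrupt handler bumps sample_hits for
whichever node is currently executing):

    typedef struct NodeTiming
    {
        uint64      sample_hits;    /* timer hits attributed to this node */
        uint64      exact_count;    /* executions that were timed exactly */
        instr_time  exact_total;    /* sum of the exactly-measured durations */
        instr_time  start;          /* zero when not timing this execution */
    } NodeTiming;

    #define EXACT_TIMING_CUTOFF 1000    /* stop timestamping once well-sampled */

    static void
    node_exec_start(NodeTiming *t)
    {
        if (t->sample_hits < EXACT_TIMING_CUTOFF)
            INSTR_TIME_SET_CURRENT(t->start);   /* still need exact timings */
        else
            INSTR_TIME_SET_ZERO(t->start);      /* rely on sampling alone */
    }

    static void
    node_exec_stop(NodeTiming *t)
    {
        if (!INSTR_TIME_IS_ZERO(t->start))
        {
            instr_time  end;

            INSTR_TIME_SET_CURRENT(end);
            INSTR_TIME_ACCUM_DIFF(t->exact_total, end, t->start);
            t->exact_count++;
        }
    }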


-- 
greg




Re: [EXTERNAL] Re: Add non-blocking version of PQcancel

2023-03-14 Thread Greg Stark
On Tue, 14 Mar 2023 at 13:59, Tom Lane  wrote:
>
> "Gregory Stark (as CFM)"  writes:
> > It looks like this needs a big rebase in fea-uth.c fe-auth-scram.c and
> > fe-connect.c. Every hunk is failing which perhaps means the code
> > you're patching has been moved or refactored?
>
> The cfbot is giving up after
> v14-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch fails,
> but that's been superseded (at least in part) by b6dfee28f.

Ah, same with Jelte Fennema's patch for load balancing in libpq.

-- 
greg




Re: [HACKERS] make async slave to wait for lsn to be replayed

2023-03-01 Thread Greg Stark
On Tue, 28 Feb 2023 at 05:13, Kartyshov Ivan  wrote:
>
> Below I provide the implementation of patches for the first three types.
> I propose to discuss this feature again/

Oof, that doesn't really work with the cfbot. It tries to apply all
three patches and of course the second and third fail to apply.

In any case this seems like a lot of effort to me. I would suggest you
just pick one avenue and provide that patch for discussion and just
ask whether people would prefer any of the alternative syntaxes.


Fwiw I prefer the functions approach. I do like me some nice syntax
but I don't see any particular advantage of the special syntax in this
case. They don't seem to provide any additional expressiveness.

That said, I'm not a fan of the specific function names. Remember that
we have polymorphic functions so you could probably just have an
option argument:

pg_lsn_wait('LSN', [timeout]) returns boolean

(just call it with a timeout of 0 to do a no-wait)
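
For what it's worth, server-side that could be as simple as polling the
replay position, roughly like this (purely a sketch, not taken from
Ivan's patches; the function name, the bigint millisecond timeout, and
the 10ms latch poll are all placeholder choices, and the usual includes
and error checks are omitted):

    PG_FUNCTION_INFO_V1(pg_lsn_wait);

    Datum
    pg_lsn_wait(PG_FUNCTION_ARGS)
    {
        XLogRecPtr  target = PG_GETARG_LSN(0);
        int64       timeout_ms = PG_GETARG_INT64(1);
        TimestampTz deadline = TimestampTzPlusMilliseconds(GetCurrentTimestamp(),
                                                           timeout_ms);

        for (;;)
        {
            /* done as soon as replay has reached the requested LSN */
            if (GetXLogReplayRecPtr(NULL) >= target)
                PG_RETURN_BOOL(true);
            /* timeout of 0 means "don't wait at all" */
            if (timeout_ms == 0 || GetCurrentTimestamp() >= deadline)
                PG_RETURN_BOOL(false);

            CHECK_FOR_INTERRUPTS();
            (void) WaitLatch(MyLatch,
                             WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                             10L, WAIT_EVENT_RECOVERY_PAUSE);
            ResetLatch(MyLatch);
        }
    }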

I'll set the patch to "Waiting on Author" for now. If you feel you're
still looking for more opinions from others maybe set it back to Needs
Review but honestly there are a lot of patches so you probably won't
see much this commitfest unless you have a patch that shows in
cfbot.cputube.org as applying and which looks ready to commit.

-- 
greg




Commitfest 2023-03 starting tomorrow!

2023-02-28 Thread Greg Stark
So I'm not sure if I'll be CFM this month but I'm assuming I will be
at this point

Regardless, Commitfest 2023-03 starts tomorrow!

So this is a good time to check your submitted patches and ensure
they're actually building and don't need a rebase. Take a look at
http://cfbot.cputube.org/ for example. I'll do an initial pass marking
anything red here as Waiting on Author

The next pass would be to grab any patches not marked Ready for
Committer and if they look like they'll need more than one round of
feedback and a couple weeks to polish they'll probably get bounced to
the next commitfest too. It sucks not getting feedback on your patches
for so long but there are really just sooo many patches and so few
eyeballs... It would be great if people could do initial reviews of
these patches before we bounce them because it really is discouraging
for developers to send patches and not get feedback. But realistically
it's going to happen to a lot of patches.


-- 
greg




Re: Experiments with Postgres and SSL

2023-02-28 Thread Greg Stark
On Wed, 22 Feb 2023 at 07:27, Heikki Linnakangas  wrote:
>
> On 20/01/2023 03:28, Jacob Champion wrote:
> > On Wed, Jan 18, 2023 at 7:16 PM Greg Stark  wrote:
> >> * "Service Mesh" type tools that hide multiple services behind a
> >> single host/port ("Service Mesh" is just a new buzzword for "proxy").
> >
> > If you want to multiplex protocols on a port, now is an excellent time
> > to require clients to use ALPN on implicit-TLS connections. (There are
> > no clients that can currently connect via implicit TLS, so you'll
> > never have another chance to force the issue without breaking
> > backwards compatibility.) That should hopefully make it harder to
> > ALPACA yourself or others [2].
>
> Good idea. Do we want to just require the protocol to be "postgres", or
> perhaps "postgres/3.0"? Need to register that with IANA, I guess.

I had never heard of this before, it does seem useful. But if I
understand it right it's entirely independent of this patch. We can
add it to all our Client/Server exchanges whether they're the initial
direct SSL connection or the STARTTLS negotiation?


> We implemented a protocol version negotiation mechanism in the libpq
> protocol itself, how would this interact with it? If it's just
> "postgres", then I guess we'd still negotiate the protocol version and
> list of extensions after the TLS handshake.
>
> >> Incidentally I find the logic in ProcessStartupPacket incredibly
> >> confusing. It took me a while before I realized it's using tail
> >> recursion to implement the startup logic. I think it would be way more
> >> straightforward and extensible if it used a much more common iterative
> >> style. I think it would make it possible to keep more state than just
> >> ssl_done and gss_done without changing the function signature every
> >> time for example.
> >
> > +1. The complexity of the startup logic, both client- and server-side,
> > is a big reason why I want implicit TLS in the first place. That way,
> > bugs in that code can't be exploited before the TLS handshake
> > completes.
>
> +1. We need to support explicit TLS for a long time, so we can't
> simplify by just removing it. But let's refactor the code somehow, to
> make it more clear.
>
> Looking at the patch, I think it accepts an SSLRequest packet even if
> implicit TLS has already been established. That's surely wrong, and
> shows how confusing the code is. (Or I'm reading it incorrectly, which
> also shows how confusing it is :-) )

I'll double check it but I think I tested that that wasn't the case. I
think it accepts the SSL request packet and sends back an N which the
client libpq just interprets as the server not supporting SSL and does
an unencrypted connection (which is tunneled over stunnel unbeknownst
to libpq).

I agree I would want to flatten this logic to an iterative approach
but having wrapped my head around it I'm not necessarily rushing
to do it now. The main advantage of flattening it would be to make it
easy to support other protocol types which I think could be really
interesting. It would be much clearer to document the state machine if
all the state is in one place and the code just loops through
processing startup packets and going to a new state until the
connection is established. That's true now but you have to understand
how the state is passed in the function parameters and notice that all
the recursion is tail recursive (I think). And extending that state
would require extending the function signature which would get awkward
quickly.
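
To illustrate, the flattened version I'm imagining is roughly this shape
(sketch only; the names are invented and none of this is the actual
postmaster code):

/* Sketch only: invented names, not the actual postmaster code. */
#include <stdbool.h>

typedef enum StartupPhase
{
    STARTUP_READ_PACKET,        /* read/peek the next startup-time packet */
    STARTUP_NEGOTIATE_SSL,      /* saw an SSLRequest, do the TLS handshake */
    STARTUP_NEGOTIATE_GSS,      /* saw a GSSENCRequest */
    STARTUP_DONE                /* real StartupPacket handled */
} StartupPhase;

typedef struct StartupState
{
    StartupPhase phase;
    bool        ssl_done;
    bool        gss_done;
    /* any new state just becomes another field, no signature changes */
} StartupState;

static void
ProcessStartup(StartupState *st)
{
    st->phase = STARTUP_READ_PACKET;
    while (st->phase != STARTUP_DONE)
    {
        switch (st->phase)
        {
            case STARTUP_NEGOTIATE_SSL:
                /* ... run the TLS handshake ... */
                st->ssl_done = true;
                st->phase = STARTUP_READ_PACKET;
                break;
            case STARTUP_NEGOTIATE_GSS:
                /* ... run the GSS handshake ... */
                st->gss_done = true;
                st->phase = STARTUP_READ_PACKET;
                break;
            case STARTUP_READ_PACKET:
            default:
                /* a real implementation would read the next packet and
                 * pick the next phase; the sketch just finishes here */
                st->phase = STARTUP_DONE;
                break;
        }
    }
}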

> Regarding Vladimir's comments on how clients can migrate to this, I
> don't have any great suggestions. To summarize, there are several options:
>
> - Add an "fast_tls" option that the user can enable if they know the
> server supports it
>
> - First connect in old-fashioned way, and remember the server version.
> Later, if you reconnect to the same server, use implicit TLS if the
> server version was high enough. This would be most useful for connection
> pools.

Vladimir pointed out that this doesn't necessarily work. The server
may be new enough to support it but it could be behind a proxy like
pgbouncer or something. The same would be true if the server reported
a "connection option" instead of depending on version.

> - Connect both ways at the same time, and continue with the fastest,
> i.e. "happy eyeballs"

That seems way too complex for us to bother with imho.

> - Try implicit TLS first, and fall back to explicit TLS if it fails.

> For libpq, we don't necessarily need to do anything right now. We can
> add the implicit TLS support in a later version. Not having libpq
> support makes i

Re: Commitfest Manager

2023-02-21 Thread Greg Stark
On Tue, 21 Feb 2023 at 12:29, Tom Lane  wrote:
>
> Greg Stark  writes:
> > Is anyone else itching to be CF manager for March? If anyone new wants
> > to try it out that would be good.
>
> > Assuming otherwise I'll volunteer.
>
> We've generally thought that the manager for the last CF of a cycle
> needs to be someone with experience.

Hm. Was that in response to the first paragraph or the second? :)

I have experience in years but only limited experience as a commitfest
manager. I'm still up for it if you think I'm ready :)

-- 
greg




Commitfest Manager

2023-02-21 Thread Greg Stark
Is anyone else itching to be CF manager for March? If anyone new wants
to try it out that would be good.

Assuming otherwise I'll volunteer.

-- 
greg




Re: Temporary tables versus wraparound... again

2023-02-04 Thread Greg Stark
I think that was spurious. It looked good when we looked at it yesterday.
The rest that failed seemed unrelated and was also failing on my SSL patch
too.

I talked to Andres about the possibility of torn reads in the pg_class
stats but those are all 4-byte columns so probably safe. And in any case
that's a pre-existing possibility, just made more likely (if it's possible at
all) by frequent truncates.

On Thu, Feb 2, 2023, 15:47 Alvaro Herrera  wrote:

> On 2022-Dec-13, Greg Stark wrote:
>
> > So here I've done it that way. It is a bit of an unfortunate layering
> > since it means the heapam_handler is doing the catalog change but it
> > does seem inconvenient to pass relfrozenxid etc back up and have the
> > caller make the changes when there are no other changes to make.
>
> Are you still at this?  CFbot says the meson tests failed last time for
> some reason:
> http://commitfest.cputube.org/greg-stark.html
>
> --
> Álvaro Herrera   48°01'N 7°57'E  —
> https://www.EnterpriseDB.com/
>


Re: Timeline ID hexadecimal format

2023-01-31 Thread Greg Stark
I actually find it kind of annoying that we use hex strings for a lot
of things where they don't add any value. Namely Transaction ID and
LSNs. As a result it's always a bit of a pain to ingest these in other
tools or do arithmetic on them. Neither is referring to memory or
anything where powers of 2 are significant so it really doesn't buy
anything in making them easier to interpret either.
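
For example, just to pull an LSN into some other tool and do arithmetic on
it you end up writing something like the following (the LSN values are made
up), where a plain decimal number would need nothing special:

#include <inttypes.h>
#include <stdio.h>

/* Parse "16/B374D848"-style LSN text into one number you can subtract. */
static int
parse_lsn(const char *str, uint64_t *lsn)
{
    unsigned int hi, lo;

    if (sscanf(str, "%X/%X", &hi, &lo) != 2)
        return -1;
    *lsn = ((uint64_t) hi << 32) | lo;
    return 0;
}

int
main(void)
{
    uint64_t    a, b;

    parse_lsn("16/B374D848", &a);
    parse_lsn("16/B3758000", &b);
    printf("%" PRIu64 " bytes of WAL between them\n", b - a);
    return 0;
}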

I don't see any advantage in converting every place where we refer to
timelines into hex and then having to refer to things like timeline
1A. It doesn't seem any more intuitive to someone understanding what's
going on than referring to timeline 26.

The fact that the *filename* has it encoded in hex is an
implementation detail and really gets exposed here because it's giving
you the underlying system error that caused the problem. The confusion
only arises when the two are juxtaposed. A hint or something just in
that case might be enough?




Re: Unicode grapheme clusters

2023-01-24 Thread Greg Stark
On Sat, 21 Jan 2023 at 13:17, Tom Lane  wrote:
>
> Probably our long-term answer is to avoid depending on wcwidth
> and use wcswidth instead.  But it's hard to get excited about
> doing the legwork for that until popular libc implementations
> get it right.

Here's an interesting blog post about trying to do this in Rust:

https://tomdebruijn.com/posts/rust-string-length-width-calculations/

TL;DR... Even counting the number of graphemes isn't enough because
terminals typically (but not always) display emoji graphemes using two
columns.

At the end of the day Unicode kind of assumes a variable-width display
where the rendering is handled by something that has access to the
actual font metrics. So anything trying to line things up in columns
in a way that works with any rendering system down the line using any
font is going to be making a best guess.

-- 
greg




Re: Unicode grapheme clusters

2023-01-21 Thread Greg Stark
On Fri, 20 Jan 2023 at 00:07, Pavel Stehule  wrote:
>
> I partially watch an progres in VTE - one of the widely used terminal libs, 
> and I am very sceptical so there will be support in the next two years.
>
> Maybe the new microsoft terminal will give this area a new dynamic, but 
> currently only few people on the planet are working on fixing or enhancing 
> terminal's technologies. Unfortunately there is too much historical balast.

Fwiw this isn't really about terminal emulators. psql is also used to
generate text files for reports or for display in various ways.

I think it's worth using whatever APIs we have available to implement
better alignment for grapheme clusters and just assume whatever will
eventually be used to display the output will display it "properly".

I do not think it's worth trying to implement this ourselves if the
libraries aren't there yet. And I don't think it's worth trying to
adapt to the current state of the current terminal. We don't know that
that's the only place the output will be viewed and it'll all be
wasted effort when the terminals eventually implement full support.

(If we were really crazy about this we could use terminal escape codes
to query the current cursor position after emitting multicharacter
graphemes. But as I said, I don't even think that would be useful,
even if there weren't other reasons it would be a bad idea)


-- 
greg




Re: Experiments with Postgres and SSL

2023-01-20 Thread Greg Stark
On Fri, 20 Jan 2023 at 01:41, Vladimir Sitnikov
 wrote:
>
> Do you think the server can de-support the old code path soon?

I don't have any intention to de-support anything. I really only
picture it being an option in environments where the client and server
are all part of a stack controlled by a single group. User tools and
general purpose tools are better served by our current more flexible
setup.

> Just wondering: do you consider back-porting the feature to all supported DB 
> versions?

I can't see that, no.

> > In practice though, by the time drivers support this it'll probably be
> > far enough in the future
>
> I think drivers release more often than the database, and we can get driver 
> support even before the database releases.
> I'm from pgjdbc Java driver team, and I think it is unfair to suggest that 
> "driver support is only far enough in the future".

Interesting. I didn't realize this would be so attractive to regular
driver authors. I did think of the Happy Eyeballs technique too but I
agree I wouldn't want to go that way either :)

I guess the server doesn't really have to do anything specific to do
what you want. You could just hard code that servers newer than a
specific version would have this support. Or it could be done with a
"protocol option" -- which wouldn't actually change any behaviour but
would be rejected if the server doesn't support "fast ssl" giving you
the feedback you expect without having much extra legacy complexity.

I guess a lot depends on the way the driver works and the way the
application is structured. Applications that make a single connection
or don't have shared state across connections wouldn't think this way.
And interfaces like libpq would normally just leave it up to the
application to make choices like this. But I guess JVM based
applications are more likely to have long-lived systems that make many
connections and also more likely to make it the driver's
responsibility to manage such things.



--
greg




Re: Unicode grapheme clusters

2023-01-19 Thread Greg Stark
This is how we've always documented it. Postgres treats code points as
"characters" not graphemes.

You don't need to go to anything as esoteric as emojis to see this either.
Accented characters like é have decomposed forms that are multiple code
points, and in some character sets some accented characters can only be
represented that way.

But I don't think there's any reason to consider changing the existing
functions. They have to be consistent with substr and the other string
manipulation functions.

We could add new functions to work with graphemes but it might bring more
pain keeping it up to date


Re: Experiments with Postgres and SSL

2023-01-19 Thread Greg Stark
On Thu, 19 Jan 2023 at 15:49, Vladimir Sitnikov
 wrote:
>
> What if the server that supports 'fast TLS' added an extra notification in 
> case client connects with a classic TLS?
> Then a capable client could remember host:port and try with newer TLS appoach 
> the next time it connects.
>
> It would be transparent to the clients, and the users won't need to configure 
> 'prefer classic or fast TLS'
> The old clients could discard the notification.

Hm. I hadn't really thought about the case of a new client connecting
to an old server. I don't think it's worth implementing a code path in
the server like this as it would then become cruft that would be hard
to ever get rid of.

I think you can do the same thing, more or less, in the client. Like
if the driver tries to connect via SSL and gets an error it remembers
that host/port and connects using negotiation in the future.

In practice though, by the time drivers support this it'll probably be
far enough in the future that they can just enable it and you can
disable it if you're connecting to an old server. The main benefit for
the near term is going to be clients that are specifically designed to
take advantage of it because it's necessary to enable the environment
they need -- like monitoring tools and proxies.

I've attached the POC. It's not near committable, mainly because of
the lack of any proper interface to the added fields in Port. I
actually had a whole API but ripped it out while debugging because it
wasn't working out.

But here's an example of psql connecting to the same server via
negotiated SSL or through stunnel where stunnel establishes the SSL
connection and psql is just doing plain text:

stark@hatter:~/src/postgresql$ ~/pgsql-sslhacked/bin/psql
'postgresql://localhost:9432/postgres'
psql (16devel)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384,
compression: off)
Type "help" for help.

postgres=# select * from pg_stat_ssl;
  pid  | ssl | version |         cipher         | bits | client_dn | client_serial | issuer_dn
-------+-----+---------+------------------------+------+-----------+---------------+-----------
 48771 | t   | TLSv1.3 | TLS_AES_256_GCM_SHA384 |  256 |           |               |
(1 row)

postgres=# \q
stark@hatter:~/src/postgresql$ ~/pgsql-sslhacked/bin/psql
'postgresql://localhost:8999/postgres'
psql (16devel)
Type "help" for help.

postgres=# select * from pg_stat_ssl;
  pid  | ssl | version |         cipher         | bits | client_dn | client_serial | issuer_dn
-------+-----+---------+------------------------+------+-----------+---------------+-----------
 48797 | t   | TLSv1.3 | TLS_AES_256_GCM_SHA384 |  256 |           |               |
(1 row)


-- 
greg
From 4508f872720a0977cf00041a865d76a4d5f77028 Mon Sep 17 00:00:00 2001
From: Greg Stark 
Date: Wed, 18 Jan 2023 15:34:34 -0500
Subject: [PATCH v1] Direct SSL Connections

---
 src/backend/libpq/be-secure.c   |  13 +++
 src/backend/libpq/pqcomm.c  |   9 +-
 src/backend/postmaster/postmaster.c | 140 ++--
 src/include/libpq/libpq-be.h|   3 +
 src/include/libpq/libpq.h   |   2 +-
 5 files changed, 134 insertions(+), 33 deletions(-)

diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index a0f7084018..39366d04dd 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -235,6 +235,19 @@ secure_raw_read(Port *port, void *ptr, size_t len)
 {
 	ssize_t		n;
 
+	/* XXX feed the raw_buf into SSL */
+	if (port->raw_buf_remaining > 0)
+	{
+		/* consume up to len bytes from the raw_buf */
+		if (len > port->raw_buf_remaining)
+			len = port->raw_buf_remaining;
+		Assert(port->raw_buf);
+		memcpy(ptr, port->raw_buf + port->raw_buf_consumed, len);
+		port->raw_buf_consumed += len;
+		port->raw_buf_remaining -= len;
+		return len;
+	}
+
 	/*
 	 * Try to read from the socket without blocking. If it succeeds we're
 	 * done, otherwise we'll wait for the socket using the latch mechanism.
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 864c9debe8..60fab6a52b 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -1119,13 +1119,16 @@ pq_discardbytes(size_t len)
 /* 
  *		pq_buffer_has_data		- is any buffered data available to read?
  *
- * This will *not* attempt to read more data.
+ * Actually returns the number of bytes in the buffer...
+ *
+ * This will *not* attempt to read more data. And reading up to that number of
+ * bytes should not cause reading any more data either.
  * 
  */
-bool
+size_t
 pq_buffer_has_data(void)
 {
-	return (PqRecvPointer < PqRecvLength);
+	return (PqRecvLength - PqRecvPointer);
 }
 
 
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 9cedc1b9f0..b1631e0830 100644
--- a/src/backend/postmaster/postmaster.c
+++ 

Re: Experiments with Postgres and SSL

2023-01-19 Thread Greg Stark
On Thu, 19 Jan 2023 at 00:45, Andrey Borodin  wrote:

> But..do we have to treat any unknown start sequence of bytes as a TLS
> connection? Or is there some definite subset of possible first bytes
> that clearly indicates that this is a TLS connection or not?

Absolutely not, there's only one MessageType that can initiate a
connection, ClientHello, so the initial byte has to be a specific
value. (0x16)

And to implement HTTP/WebSocket it would probably only peek
at the first byte and check for things like G(ET) and H(EAD) and so
on -- possibly only over SSL, but in theory it could be over any
connection if the request comes before the startup packet.
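
Concretely the dispatch only needs that one byte, something like this
(sketch only, not real code):

/* Sketch only: just the one-byte dispatch idea, not real code. */
typedef enum ConnectionKind
{
    CONN_TLS_DIRECT,            /* 0x16: TLS handshake record (ClientHello) */
    CONN_HTTP,                  /* 'G'(ET), 'H'(EAD), ... */
    CONN_POSTGRES               /* first length byte of a startup packet */
} ConnectionKind;

static ConnectionKind
classify_first_byte(unsigned char c)
{
    if (c == 0x16)
        return CONN_TLS_DIRECT;
    if (c == 'G' || c == 'H')
        return CONN_HTTP;

    /*
     * Anything else is taken to be the most significant byte of a startup
     * packet's length; real startup packets are far smaller than 16MB so
     * that byte is always zero.
     */
    return CONN_POSTGRES;
}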

Personally I'm motivated by wanting to implement status and monitoring
data for things like Prometheus and the like. For that it would just
be simple GET queries to recognize. But tunneling pg wire protocol
over websockets sounds cool but not really something I know a lot
about. I note that Neon is doing something similar with a proxy:
https://neon.tech/blog/serverless-driver-for-postgres


--
greg




Experiments with Postgres and SSL

2023-01-18 Thread Greg Stark
I had a conversation a while back with Heikki where he expressed that
it was annoying that we negotiate SSL/TLS the way we do since it
introduces an extra round trip. Aside from the performance
optimization I think accepting standard TLS connections would open the
door to a number of other opportunities that would be worth it on
their own.

So I took a look into what it would take to do and I think it would
actually be quite feasible. The first byte of a standard TLS
connection can't look anything like the first byte of any flavour of
Postgres startup packet because it would be the high order bits of the
length, so unless we start having multi-megabyte startup packets the two
can never be confused.

So I put together a POC patch and it's working quite well and didn't
require very much kludgery. Well, it required some but it's really not
bad. I do have a bug I'm still trying to work out and the code isn't
quite in committable form but I can send the POC patch.

Other things it would open the door to in order from least
controversial to most

* Hiding Postgres behind a standard SSL proxy terminating SSL without
implementing the Postgres protocol.

* "Service Mesh" type tools that hide multiple services behind a
single host/port ("Service Mesh" is just a new buzzword for "proxy").

* Browser-based protocol implementations using websockets for things
like pgadmin or other tools to connect directly to postgres using
Postgres wire protocol but using native SSL implementations.

* Postgres could even implement an HTTP based version of its protocol
and enable things like queries or browser based tools using straight
up HTTP requests so they don't need to use websockets.

* Postgres could implement other protocols to serve up data like
status queries or monitoring metrics, using HTTP based standard
protocols instead of using our own protocol.

Incidentally I find the logic in ProcessStartupPacket incredibly
confusing. It took me a while before I realized it's using tail
recursion to implement the startup logic. I think it would be way more
straightforward and extensible if it used a much more common iterative
style. I think it would make it possible to keep more state than just
ssl_done and gss_done without changing the function signature every
time for example.

--
greg




Re: Temporary tables versus wraparound... again

2022-12-14 Thread Greg Stark
> You do have to lock a table in order to update its pg_class row,
> though, whether the table is temporary or not. Otherwise, another
> session could drop it while you're doing something with it, after
> which bad things would happen.

I was responding to this from Andres:

> Is that actually true? Don't we skip some locking operations for temporary
> tables, which then also means catalog modifications cannot safely be done in
> other sessions?

I don't actually see this in the code but in any case we're not doing
any catalog modifications here. We're just inspecting values of
relfrozenxid and relminmxid in the struct returned from
SearchSysCache. Which I think is no different for temp tables than any
other table.




Re: Temporary tables versus wraparound... again

2022-12-14 Thread Greg Stark
On Sat, 5 Nov 2022 at 15:34, Tom Lane  wrote:
>
> Greg Stark  writes:
> > Simple Rebase
>
> I took a little bit of a look through these.
>
> * I find 0001 a bit scary, specifically that it's decided it's
> okay to apply extract_autovac_opts, pgstat_fetch_stat_tabentry_ext,
> and especially relation_needs_vacanalyze to another session's
> temp table.  How safe is that really?

So I don't see any evidence we skip any locking on pg_class when doing
updates on rows for temporary tables. It's a bit hard to tell because
there are several ways of checking if a table is temporary. Most
places just check relpersistence but there is also an accessor macro
RelationIsPermanent() as well as a relcache field rd_islocaltemp which
could be used. I'm still looking, but so far nearly all the checks I've
found just throw errors for temporary tables and none relate to any
operations on pg_class entries.

In any case we're already using the pg_class struct to look at
relpersistence itself. So... the danger to check for is something
we would already be at risk of. Namely that the pg_class row is
updated without any locking and then vacuumed away while we hold this
struct pointer and we're looking at fields that have since been
overwritten with other data from an unrelated row. But that would
require all kinds of rules to be broken and would be posing a risk for
anyone just running select * from pg_class. So I find it hard to
believe we would be doing this.

extract_autovac_opts looks at a variable sized field so concurrent
updates would be an issue, but obviously there are only mvcc updates
to this field so I don't see how it could be a problem.

pgstat_fetch_stat_tabentry I don't even see what the possible risks
would be. The words persistence and temporary don't appear in pgstat.c
(except "temporary statistics" in some log messages).

And then there's relation_needs_vacanalyze() and it looks at
relfrozenxid and relminmxid (and relname in some debug messages).
Those fields could get updated by a concurrent vacuum or -- after this
patch -- a truncate in an inplace_update. That seems to be the only
real risk here.

But this is not related to temporary tables at all. Any pg_class entry
can get in_place_update'd by plain old vacuum to update the
relfrozenxid and relminmxid. The in_place_update would take an
exclusive lock on the buffer but I think that doesn't actually protect
us since autovacuum would only have a pin? Or does the SysCache
protect us by copying out the whole row while it's locked? This is
worth answering but it's not an issue related to this patch or
temporary tables. Is autovacuum looking at relfrozenxid and relminmxid
in a way that's safely protected against a concurrent manual vacuum
issuing an in_place_update?

-- 
greg




Re: Temporary tables versus wraparound... again

2022-12-13 Thread Greg Stark
On Wed, 7 Dec 2022 at 22:02, Greg Stark  wrote:
> > Seems like this should just be in
> > heapam_relation_nontransactional_truncate()?

So here I've done it that way. It is a bit of an unfortunate layering
since it means the heapam_handler is doing the catalog change but it
does seem inconvenient to pass relfrozenxid etc back up and have the
caller make the changes when there are no other changes to make.

Also, I'm not sure what changed but maybe there was some other commits
in vacuum.c in the meantime. I remember being frustrated previously
trying to reuse that code but now it works fine. So I was able to
reduce the copy-pasted code significantly.

(The tests are probably not worth committing, they're just here for my
own testing to be sure it's doing anything)

-- 
greg
From f6f800819c497817c3f2957683fc521abe82c2b7 Mon Sep 17 00:00:00 2001
From: Greg Stark 
Date: Thu, 31 Mar 2022 15:49:19 -0400
Subject: [PATCH v8 2/3] Update relfrozenxid when truncating temp tables

Make ON COMMIT DELETE ROWS reset relfrozenxid and other table stats
to the same values used in initial table creation. Otherwise even
typical short-lived transactions in long-lived sessions using
temporary tables can easily cause them to reach transaction wraparound
and autovacuum cannot come to the rescue for temporary tables.

Also optimize the relminmxid used for temporary tables to be our
own oldest MultiXactId instead of the globally oldest one. This avoids
the expensive calculation of the latter on every transaction commit.

This code path is also used by truncation of tables created within the
same transaction.
---
 src/backend/access/heap/heapam_handler.c | 51 +++-
 src/backend/access/transam/multixact.c   | 15 +++
 src/backend/catalog/heap.c   | 14 ++-
 src/include/access/multixact.h   |  1 +
 4 files changed, 78 insertions(+), 3 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index ab1bcf3522..b01a25884f 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -33,6 +33,7 @@
 #include "catalog/storage.h"
 #include "catalog/storage_xlog.h"
 #include "commands/progress.h"
+#include "commands/vacuum.h"
 #include "executor/executor.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -587,9 +588,18 @@ heapam_relation_set_new_filelocator(Relation rel,
 	 * could reuse values from their local cache, so we are careful to
 	 * consider all currently running multis.
 	 *
-	 * XXX this could be refined further, but is it worth the hassle?
+	 * In the case of temporary tables we can refine this slightly and use
+	 * our own oldest visible MultiXactId. This is also cheaper to calculate
+	 * which is nice since temporary tables might be getting created often.
 	 */
-	*minmulti = GetOldestMultiXactId();
+	if (persistence == RELPERSISTENCE_TEMP)
+	{
+		*minmulti = GetOurOldestMultiXactId();
+	}
+	else
+	{
+		*minmulti = GetOldestMultiXactId();
+	}
 
 	srel = RelationCreateStorage(*newrlocator, persistence, true);
 
@@ -618,6 +628,43 @@ heapam_relation_set_new_filelocator(Relation rel,
 static void
 heapam_relation_nontransactional_truncate(Relation rel)
 {
+	/* This function from vacuum.c does a non-transactional update of the
+	 * catalog entry for this relation. That's ok because these values are
+	 * always safe regardless of whether we commit and in any case this is
+	 * either a temporary table or a filenode created in this transaction so
+	 * this tuple will be irrelevant if we do not commit. It's also important
+	 * to avoid lots of catalog bloat due to temporary tables being truncated
+	 * on every transaction commit.
+	 *
+	 * We set in_outer_transaction=false because that controls whether
+	 * vac_update_relstats updates other columns like relhasindex,
+	 * relhasrules, relhastriggers which is not changing here. This is a bit
+	 * of a hack, perhaps this parameter should change name.
+	 *
+	 * These parameters should stay in sync with
+	 * heapam_relation_set_new_filelocator() above and AddNewRelationTuple()
+	 * in heap.c. In theory this should probably return the relfrozenxid and
+	 * relminmxid and heap_truncate_one_rel() in heap.c should handle
+	 * num_tuples and num_pages but that would be slightly inconvenient and
+	 * require an api change.
+	 */
+
+	/* Ensure RecentXmin is valid -- it almost certainly is but regression
+	 * tests turned up an unlikely corner case where it might not be */
+	if (!FirstSnapshotSet)
+		(void)GetLatestSnapshot();
+	Assert(FirstSnapshotSet);
+	vac_update_relstats(rel,
+		0, /* num_pages */
+		-1, /* num_tuples */
+		0, /* num_all_visible_pages */
+		true, /* hasindex -- ignored due to in_outer_xact false */
+		RecentXmin, /* frozenxid */
+		RelationIsPermanent(rel) ? GetOldestMultiXactId() : GetOurOldestMultiXactId(),
+		NULL

Re: Temporary tables versus wraparound... again

2022-12-07 Thread Greg Stark
On Thu, 1 Dec 2022 at 14:18, Andres Freund  wrote:
>
> I find it problematic that ResetVacStats() bypasses tableam. Normal vacuums
> etc go through tableam but you put a ResetVacStats() besides each call to
> table_relation_nontransactional_truncate().  Seems like this should just be in
> heapam_relation_nontransactional_truncate()?

So this seems a bit weird. The only other part of the tableam that
touches freezexid and minmxid is table_relation_set_new_filelocator()
which returns them via references so that the callers
(heap.c:heap_create() and relcache.c:RelationSetNewRelfilenumber())
can set them themselves.

I can't really duplicate that layering without changing the API of
heapam_relation_nontransactional_truncate(). Which if it changes would
be quite annoying I think for external pluggable tableams.

But you're right that where I've put it will update relfrozenxid and
minmxid even for relations that have a tableam handler that returns
InvalidXid and doesn't need it. That does seem inappropriate.

I could put it directly in the heapam_handler but is that a layering
issue to be doing a catalog update from within the tableam_handler?
There are no other catalog updates in there.

On the other hand the existing callers of
*_nontransactional_truncate() don't have any existing catalog updates
they want to make so perhaps returning the xid values by reference was
just for convenience to avoid doing an extra update and isn't needed
here.

-- 
greg




Re: Temporary tables versus wraparound... again

2022-12-06 Thread Greg Stark
On Tue, 6 Dec 2022 at 13:59, Andres Freund  wrote:
>
> Hi,
>
> On 2022-12-06 13:47:39 -0500, Greg Stark wrote:
> > So I talked about this patch with Ronan Dunklau and he had a good
> > question Why are we maintaining relfrozenxid and relminmxid in
> > pg_class for temporary tables at all? Autovacuum can't use them and
> > other sessions won't care about them. The only session that might care
> > about them is the one attached to the temp schema.
>
> Uh, without relfrozenxid for temp tables we can end up truncating clog
> "ranges" away that are required to access the temp tables. So this would
> basically mean that temp tables can't be used reliably anymore.

True, we would have to have some other mechanism for exporting the
frozenxid that the session needs. Presumably that would be something
in PGPROC like the xmin and other numbers. It could be updated by
scanning our local hash table whenever a transaction starts. This also
probably is what would be needed for multixacts I guess?
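
Roughly what I'm picturing, purely as a sketch (the entry struct and the
PGPROC field are invented, not existing code; it assumes the usual backend
headers: utils/hsearch.h, storage/proc.h, access/transam.h):

/* Sketch only: TempRelFreezeEntry and MyProc->tempFrozenXid are invented. */
typedef struct TempRelFreezeEntry
{
    Oid             relid;      /* hash key: the temp table */
    TransactionId   frozenxid;  /* oldest unfrozen xid it could contain */
    MultiXactId     minmxid;    /* oldest multixact it could contain */
} TempRelFreezeEntry;

/*
 * At transaction start, scan the backend-local hash and advertise the
 * oldest frozenxid in our PGPROC entry, much like xmin is advertised,
 * so clog truncation knows not to discard what our temp tables need.
 */
static void
AdvertiseTempFreezeHorizon(HTAB *temp_freeze_hash)
{
    HASH_SEQ_STATUS seq;
    TempRelFreezeEntry *entry;
    TransactionId oldest = InvalidTransactionId;

    hash_seq_init(&seq, temp_freeze_hash);
    while ((entry = hash_seq_search(&seq)) != NULL)
    {
        if (!TransactionIdIsValid(oldest) ||
            TransactionIdPrecedes(entry->frozenxid, oldest))
            oldest = entry->frozenxid;
    }

    MyProc->tempFrozenXid = oldest;     /* invented field */
}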


-- 
greg




Re: Temporary tables versus wraparound... again

2022-12-06 Thread Greg Stark
So I talked about this patch with Ronan Dunklau and he had a good
question Why are we maintaining relfrozenxid and relminmxid in
pg_class for temporary tables at all? Autovacuum can't use them and
other sessions won't care about them. The only session that might care
about them is the one attached to the temp schema.

So we could replace relfrozenxid and relminmxid for temp tables with a
local hash table that can be updated on every truncate easily and
efficiently.

If a temp table actually wraps around the only session that runs into
problems is the one attached to that temp schema. It can throw local
session errors and doesn't need to interfere with the rest of the
cluster in any way. It could even start running vacuums though I'm not
convinced that's a great solution.

At least I think so. I'm pretty sure about relfrozenxid but as always
I don't really know how relminmxid works. I think we only need to
worry about multixacts for subtransactions, all of which are in the
same transaction  -- does that even work that way?

But this is really attractive since it means no catalog updates just
for temp tables on every transaction and no wraparound cluster
problems even if you have on-commit-preserve-rows tables. It really
shouldn't be possible for a regular user to cause the whole cluster to
run into problems just by creating a temp table and keeping a
connection around a while.




Re: Temporary tables versus wraparound... again

2022-12-02 Thread Greg Stark
On Thu, 1 Dec 2022 at 14:18, Andres Freund  wrote:
>
> Hi,
>
> On 2022-12-01 11:13:01 -0500, Greg Stark wrote:
> > On Sat, 5 Nov 2022 at 11:34, Tom Lane  wrote:
> >
> > > * I find 0001 a bit scary, specifically that it's decided it's
> > > okay to apply extract_autovac_opts, pgstat_fetch_stat_tabentry_ext,
> > > and especially relation_needs_vacanalyze to another session's
> > > temp table.  How safe is that really?
> >
> > I  can look a bit more closely but none of them are doing any thing
> > with the table itself, just the catalog entries which afaik have
> > always been fair game for other sessions. So I'm not really clear what
> > kind of unsafeness you're asking about.
>
> Is that actually true? Don't we skip some locking operations for temporary
> tables, which then also means catalog modifications cannot safely be done in
> other sessions?

This code isn't doing any catalog modifications from other sessions.
The code Tom's referring to is just autovacuum looking at relfrozenxid
and relminmxid and printing warnings if they're approaching the
wraparound limits that would otherwise trigger an anti-wraparound
vacuum.

> I find it problematic that ResetVacStats() bypasses tableam. Normal vacuums
> etc go through tableam but you put a ResetVacStats() besides each call to
> table_relation_nontransactional_truncate().  Seems like this should just be in
> heapam_relation_nontransactional_truncate()?

Ok. I think this patch actually predated the tableam (by a lot -- I've had
others actually approach me about whether there's a good solution
because it's been biting them too) and I didn't see that in the merge
forward.

> Is it a good idea to use heap_inplace_update() in ResetVacStats()?

This is a good question. I had the impression it was actually the
right thing to do and there's actually been bugs in the past caused by
*not* using heap_inplace_update() so I think it's actually important
to get this right.

I don't see any documentation explaining what the rules are for when
inplace edits are safe or unsafe or indeed when they're necessary for
correctness. So I searched back through the archives and checked when
it came up.

It seems there are a few issues:

a) Nontransactional operations like vacuum have to use it because they
don't have a transaction. Presumably this is why vacuum normally uses
inplace_update for these stats.

b) in the past SnapshotNow scans would behave incorrectly if we do
normal mvcc updates on rows without exclusive locks protecting against
concurrent scans. I'm not sure if this is still a factor these days
with the types of scans that still exist.

c) There are some constraints having to do with logical replication
that I didn't understand. I hope they don't relate to frozenxid but I
don't know

d) There were also some issues having to do with SInval messages but I
think they were additional constraints that inplace updates needed to
be concerned about.

These truncates are done at end of transaction but precommit so the
transaction is still alive and there obviously should be no concurrent
scans on temporary tables so I think it should be safe to do a regular
mvcc update. Is it a good idea to bloat the catalog though? If you
have many temporary tables and don't actually touch more than a few of
them in your transaction that could be a lot of new tuple inserts on
every commit.

Actually it's only sort of true -- if no persistent xid is created
then we would be creating one just for this. But that shouldn't happen
because we only truncate if the transaction ever "touched" a temporary
table. It occurs to me it could still be kind of a problem if you have
a temporary table that you use once and then your session stays alive
for a long time without using temporary tables. Then it won't be
truncated and the frozenxid won't be advanced :(

It's kind of annoying that we have to put RecentXmin and
Get{Our,}OldestMultiXactId() in the table when truncating and then
keep advancing them even if there's no data in the table. Ideally
wouldn't it be better to be able to have Invalid{Sub,}Xid there and
only initialize it when a first insert is made?

-- 
greg




Re: Temporary tables versus wraparound... again

2022-12-01 Thread Greg Stark
On Sat, 5 Nov 2022 at 11:34, Tom Lane  wrote:
>
> Greg Stark  writes:
> > Simple Rebase
>
> I took a little bit of a look through these.
>
> * I find 0001 a bit scary, specifically that it's decided it's
> okay to apply extract_autovac_opts, pgstat_fetch_stat_tabentry_ext,
> and especially relation_needs_vacanalyze to another session's
> temp table.  How safe is that really?

I  can look a bit more closely but none of them are doing any thing
with the table itself, just the catalog entries which afaik have
always been fair game for other sessions. So I'm not really clear what
kind of unsafeness you're asking about.

> * Don't see much point in renaming checkTempNamespaceStatus.
> That doesn't make it not an ABI break.  If we want to back-patch
> this we'll have to do something different than what you did here.

Well it's an ABI break but at least it's an ABI break that gives a
build-time error or shared library loading error rather than one that
just crashes or writes to random memory at runtime.

> * In 0002, I don't especially approve of what you've done with
> the relminmxid calculation --- that seems to move it out of
> "pure bug fix" territory and into "hmm, I wonder if this
> creates new bugs" territory.

Hm. Ok, I can separate that into a separate patch. I admit I have a
lot of trouble remembering how multixactids work.


>  Also, skipping that update
> for non-temp tables immediately falsifies ResetVacStats'
> claimed charter of "resetting to the same values used when
> creating tables".  Surely GetOldestMultiXactId isn't *that*
> expensive, especially compared to the costs of file truncation.
> I think you should just do GetOldestMultiXactId straight up,
> and maybe submit a separate performance-improvement patch
> to make it do the other thing (in both places) for temp tables.

Hm. The feedback I got earlier was that it was quite expensive. That
said, I think the concern was about the temp tables where the truncate
was happening on every transaction commit when they're used. For
regular truncates I'm not sure how important optimizing it is.

> * I wonder if this is the best place for ResetVacStats --- it
> doesn't seem to be very close to the code it needs to stay in
> sync with.  If there's no better place, perhaps adding cross-
> reference comments in both directions would be advisable.

I'll look at that. I think relfrozenxid and relminmxid are touched
in a lot of places so it may be tilting at windmills to try to
centralize the code working with them at this point...

> * 0003 says it's running temp.sql by itself to avoid interference
> from other sessions, but sadly that cannot avoid interference
> from background autovacuum/autoanalyze.  I seriously doubt this
> patch would survive contact with the buildfarm.  Do we actually
> need a new test case?  It's not like the code won't get exercised
> without it --- we have plenty of temp table truncations, surely.

No I don't think we do. I kept it in a separate commit so it could be
dropped when committing.

But just having truncate working isn't really good enough either. An
early version of the patch had a bug that meant it didn't run at all
so truncate worked fine but relfrozenxid never got reset.

In thinking about whether we could have a basic test that temp tables
are getting reset at all it occurs to me that there's still a gap
here:

You can have a session attached to a temp namespace that never
actually uses the temp tables. That would prevent autovacuum from
dropping them and still never reset their vacuum stats. :( Offhand I
think PreCommit_on_commit_actions() could occasionally truncate all ON
COMMIT DELETE ROWS tables even if they haven't been touched in this
transaction.


--
greg




Re: Patch: Global Unique Index

2022-11-30 Thread Greg Stark
On Tue, 29 Nov 2022 at 21:16, Tom Lane  wrote:
>
> I actually think that that problem should be soluble with a
> slightly different approach.  The thing that feels insoluble
> is that you can't do this without acquiring sufficient locks
> to prevent addition of new partitions while the insertion is
> in progress.  That will be expensive in itself, and it will
> turn ATTACH PARTITION into a performance disaster.

I think there's a lot of room to manoeuvre here. This is a new feature
that doesn't need to be 100% complete or satisfy any existing
standard. There are lots of options for compromises that leave room
for future improvements.

1) We could just say sure ATTACH is slow if you're attaching a
non-empty partition
2) We could invent a concept like convalidated and let people attach a
partition without validating the uniqueness and then validate it later
concurrently
3) We could say ATTACH doesn't work now and come up with a better
strategy in the future

Also, don't I vaguely recall something in exclusion constraints about
having some kind of in-memory "intent" list where you declared that
you're about to insert a value, you validate it doesn't violate the
constraint and then you're free to insert it because anyone else will
see your intent in memory? There might be a need for some kind of
global object that only holds inserted keys long enough that other
sessions are guaranteed to see the key in the correct index. And that
could maybe even be in memory rather than on disk.

This isn't a simple project but I don't think it's impossible as long
as we keep an open mind about the requirements.



--
greg




Re: Add 64-bit XIDs into PostgreSQL 15

2022-11-29 Thread Greg Stark
On Mon, 28 Nov 2022 at 16:53, Peter Geoghegan  wrote:
>

> Imagine if we actually had 64-bit XIDs -- let's assume for a moment
> that it's a done deal. This raises a somewhat awkward question: do you
> just let the system get further and further behind on freezing,
> forever? We can all agree that 2 billion XIDs is very likely the wrong
> time to start refusing new XIDs -- because it just isn't designed with
> debt in mind. But what's the right time, if any? How much debt is too
> much?

My first thought was... why not? Just let the system get further and
further behind on freezing. Where's the harm?

Picture an insert-only database that is receiving data very quickly,
never having data deleted or modified. Vacuum takes several days to
complete and the system wraps the 32-bit xid space several times a day.

The DBA asks you: why are they even bothering running vacuum? They have
plenty of storage for clog, and latency on selects is not a pain point --
not compared to running multi-day vacuums that impact insert times.

That isn't far off the scenario where I've seen wraparound being a
pain btw. Anti-wraparound vacuum took about 2 days and was kicking off
pretty much as soon as the previous one finished. For a table that was
mostly read-only.

Of course to make the judgement the DBA needs to have good ways to
measure the space usage of clog, and the overhead caused by clog
lookups that could be avoided. Then they can judge for themselves how
much freezing is appropriate.

-- 
greg




Re: Patch: Global Unique Index

2022-11-29 Thread Greg Stark
On Fri, 25 Nov 2022 at 20:03, David Zhang  wrote:
>
> Hi Bruce,
>
> Thank you for helping review the patches in such detail.
>
> On 2022-11-25 9:48 a.m., Bruce Momjian wrote:
>
> Looking at the patch, I am unclear how the the patch prevents concurrent
> duplicate value insertion during the partitioned index checking.  I am
> actually not sure how that can be done without locking all indexes or
> inserting placeholder entries in all indexes.  (Yeah, that sounds bad,
> unless I am missing something.)
>
> For the uniqueness check cross all partitions, we tried to follow the 
> implementation of uniqueness check on a single partition, and added a loop to 
> check uniqueness on other partitions after the index tuple has been inserted 
> to current index partition but before this index tuple has been made visible. 
> The uniqueness check will wait `XactLockTableWait` if there is a valid 
> transaction in process, and performs the uniqueness check again after the 
> in-process transaction finished.

I think this is the key issue to discuss. The rest is all UX
bikeshedding (which is pretty important in this case) but this is the
core uniqueness implementation.

If I understand correctly you're going to insert into the local index
for the partition using the normal btree uniqueness implementation.
Then while holding an exclusive lock on the index do lookups on every
partition for the new key. Effectively serializing inserts to the
table?

I think the precedent here are "exclusion constraints" which are
documented in two places in the manual:
https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-EXCLUSION
https://www.postgresql.org/docs/current/sql-createtable.html#SQL-CREATETABLE-EXCLUDE

These also work by doing lookups for violating entries and don't
depend on any special index machinery like btree uniqueness. But I
don't think they need to entirely serialize inserts either so it may
be worth trying to figure out how they manage this to avoid imposing
that overhead.

There's a comment in src/backend/executor/execIndexing.c near the top
about them but I'm not sure it covers all the magic needed for them to
work...

-- 
greg




Re: Introduce a new view for checkpointer related stats

2022-11-29 Thread Greg Stark
On Mon, 28 Nov 2022 at 13:00, Robert Haas  wrote:
>
> On Tue, Nov 22, 2022 at 3:53 PM Andres Freund  wrote:

> I vote to just remove them. I think that most people won't update
> their queries until they are forced to do so.  I don't think it
> matters very much when we force them to do that.

I would tend to agree.

If we wanted to have a deprecation period I think the smooth way to do
it would be to introduce two new functions/views with the new split.
Then mark the entire old view as deprecated. That way there isn't a
mix of new and old columns in the same view/function.

I don't think it's really necessary to do that here but there'll
probably be instances where it would be worth doing.

-- 
greg




Re: postgres_fdw binary protocol support

2022-11-23 Thread Greg Stark
On Tue, 22 Nov 2022 at 08:17, Ashutosh Bapat
 wrote:
>
> AFAIU, binary compatibility of two postgresql servers depends upon the
> binary compatibility of the platforms on which they run.

No, libpq binary mode is not architecture-specific. I think you're
thinking of on-disk binary compatibility. But libpq binary mode is
just a binary network representation of the data instead of an ascii
representation. It should be faster and more efficient but it still
goes through the binary input/output functions (which are actually named
send/recv rather than input/output).

I actually wonder if having this would be a good way to get some code
coverage of the binary input/output functions which I suspect is sadly
lacking now. It wouldn't necessarily test that they're doing what
they're supposed to... but at least they would be getting run which I
don't think they are currently?

-- 
greg




Re: pgsql: Prevent instability in contrib/pageinspect's regression test.

2022-11-23 Thread Greg Stark
On Mon, 21 Nov 2022 at 15:01, Andres Freund  wrote:
>
> It's somewhat sad to add this restriction - I've used get_raw_page() (+
> other functions) to scan a whole database for a bug. IIRC that actually
> did end up using parallelism, albeit likely not very efficiently.
>
> Don't really have a better idea though.

Given how specific the use case is here a simple solution would be to
just have a dedicated get_raw_temp_page() and restrict get_raw_page()
to persistent tables.

I suppose slightly gilding it would be to make a get_raw_page_temp()
and get_raw_page_persistent() and then you could have get_raw_page()
call the appropriate one. They would be parallel restricted except
for get_raw_page_persistent(), and if you explicitly called that one you
could get parallel scans; otherwise you wouldn't.

-- 
greg




Re: [PoC] configurable out of disk space elog level

2022-11-22 Thread Greg Stark
On Thu, 17 Nov 2022 at 14:56, Robert Haas  wrote:
>
> Having a switch for one particular kind of error (out of many that
> could possibly occur) that triggers one particular coping strategy
> (out of many that could possibly be used) seems far too specific a
> thing to add as a core feature. And even if we had something much more
> general, I'm not sure why that should go into the database rather than
> being implemented outside it. After all, nothing at all prevents the
> user from scanning the database logs for "out of space" errors and
> shutting down the database if any are found. While you're at it, you
> could make your monitoring script also check the free space on the
> relevant partition using statfs() and page somebody if the utilization
> goes above 95% or whatever threshold you like, which would probably
> avoid service outages much more effectively than $SUBJECT.

I have often thought we report a lot of errors to the user as
transaction errors that database admins are often going to feel they
would rather treat as system-wide errors. Often the error the user
sees seems like a very low-level error, with no context, that they can't
do anything about. This seems to be an example of that.

I don't really have a good solution for it but I do think most users
would rather deal with these errors at a higher level than individual
queries from individual clients. Out of disk space, hardware errors,
out of memory, etc they would rather handle in one centralized place
as a global condition. You can work around that with
middleware/libraries/monitoring but it's kind of working around the
database giving you the information at the wrong time and place for
your needs.




Re: How to *really* quit psql?

2022-11-19 Thread Greg Stark
On Sat, 19 Nov 2022 at 14:10, Tom Lane  wrote:
> Under what circumstances would it be appropriate for a script to take
> it on itself to decide that?  It has no way of knowing what the next -f
> option is or what the user intended.

Presumably when they're written by the same person so the script does
effectively know what the "user" intended because it's written by the
same user.

Off the top of my head I could imagine someone writing something like
report-error-and-exit.sql and wanting to be able to use \i
report-error-and-exit.sql to ensure all scripts report their errors
using some common log file or something.

Not saying that's the only or best way to do that though. And there is
the risk that scripts would start using this functionality
inappropriately which would mean, for example, getting an install
script for something and then not being able to use it within another
script safely :(

--
greg




Re: Allow single table VACUUM in transaction block

2022-11-16 Thread Greg Stark
I think the idea of being able to request an autovacuum worker for a
specific table is actually very good. I think it's what most users
actually want when they are running vacuum. In fact in previous jobs
people have built infrastructure that basically duplicates autovacuum
just so they could do this.

However I'm not a fan of commands that sometimes do one thing and
sometimes magically do something very different. I don't like the idea
that the same vacuum command would sometimes run in-process and
sometimes do this out of process request. And the rules for when it
does which are fairly complex to explain -- it runs in process unless
you're in a transaction when it runs out of process unless it's a
temporary table ...

I think this requesting autovacuum worker should be a distinct
command. Or at least an explicit option to vacuum.

Also, I was confused reading the thread above about mention of VACUUM
FULL. I don't understand why it's relevant at all. We certainly can't
force VACUUM FULL when it wasn't requested on potentially large
tables.




Re: PATCH: Using BRIN indexes for sorted output

2022-11-16 Thread Greg Stark
Fwiw tuplesort does do something like what you want for the top-k
case. At least it used to, last I looked -- not sure if it went out
with the tapesort ...

For top-k it inserts new tuples into the heap data structure and then
pops the top element out of the heap. That keeps a fixed number of
elements in the heap. It's always inserting and removing at the same
time. I don't think it would be very hard to add a tuplesort interface
to access that behaviour.

For something like BRIN you would sort the ranges by minvalue then
insert all the tuples for each range. Before inserting tuples for a
new range you would first pop out all the tuples that are < the
minvalue for the new range.
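
In rough code terms I'm thinking of something like this (sketch only --
BrinRangeInfo and the heap_* helpers are made up, not an existing
tuplesort API):

/* Sketch only: BrinRangeInfo and the heap_* helpers are made up. */
static void
emit_sorted_by_brin(BrinRangeInfo *ranges, int nranges, TupleHeap *heap)
{
    /* process ranges in increasing order of their minimum value */
    qsort(ranges, nranges, sizeof(BrinRangeInfo), cmp_range_minval);

    for (int i = 0; i < nranges; i++)
    {
        /*
         * Tuples already in the heap that sort below this range's minimum
         * cannot be beaten by anything we have yet to read (every later
         * range has an equal or larger minimum), so emit them now.
         */
        while (!heap_is_empty(heap) &&
               heap_min_value(heap) < ranges[i].minval)
            emit_tuple(heap_pop_min(heap));

        /* then pull all the tuples from this range's blocks into the heap */
        load_range_tuples(heap, &ranges[i]);
    }

    /* drain whatever remains */
    while (!heap_is_empty(heap))
        emit_tuple(heap_pop_min(heap));
}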

I'm not sure how you handle degenerate BRIN indexes that behave
terribly. Like, if many BRIN ranges covered the entire key range.
Perhaps there would be a clever way to spill the overflow and switch
to quicksort for the spilled tuples without wasting lots of work
already done and without being too inefficient.




Re: About displaying NestLoopParam

2022-11-16 Thread Greg Stark
So I guess I don't have much to add since I don't really understand
the Param infrastructure, certainly not any better than you seem to.

I do note that the code in question was added in this commit in 2010.
That predates the addition of LATERAL in 2013. I suppose those
comments may be talking about InitPlans for things like constant
subqueries that have been pulled up to InitPlans in queries like:

explain verbose select * from x join y on (x.i=y.j) where y.j+1=(select 5) ;

Which your patch doesn't eliminate the $0 in. I don't know if the code
you're removing is just for efficiency -- to avoid trawling through
nodes of the plan that can't be relevant -- or for correctness.

Fwiw your patch applied for me and built without warnings and seems to
work for all the queries I've thrown at it so far. That's hardly an
exhaustive test of course.

commit 1cc29fe7c60ba643c114979dbe588d3a38005449
Author: Tom Lane 
Date:   Tue Jul 13 20:57:19 2010 +

Teach EXPLAIN to print PARAM_EXEC Params as the referenced expressions,
rather than just $N.  This brings the display of nestloop-inner-indexscan
plans back to where it's been, and incidentally improves the display of
SubPlan parameters as well.  In passing, simplify the EXPLAIN code by
having it deal primarily in the PlanState tree rather than separately
searching Plan and PlanState trees.  This is noticeably cleaner for
subplans, and about a wash elsewhere.

One small difference from previous behavior is that EXPLAIN will no longer
qualify local variable references in inner-indexscan plan nodes, since it
no longer sees such nodes as possibly referencing multiple tables.  Vars
referenced through PARAM_EXEC Params are still forcibly qualified, though,
so I don't think the display is any more confusing than before.  Adjust a
couple of examples in the documentation to match this behavior.

On Tue, 20 Sept 2022 at 05:00, Richard Guo  wrote:
>
>
> On Fri, Sep 16, 2022 at 5:59 PM Richard Guo  wrote:
>>
>> I'm not saying this is a bug, but just curious why param 0 cannot be
>> displayed as the referenced expression. And I find the reason is that in
>> function find_param_referent(), we have the 'in_same_plan_level' flag
>> controlling that if we have emerged from a subplan, i.e. not the same
>> plan level any more, we would not look further for the matching
>> NestLoopParam. Param 0 suits this situation.
>>
>> And there is a comment there also saying,
>>
>> /*
>>  * NestLoops transmit params to their inner child only; also, once
>>  * we've crawled up out of a subplan, this couldn't possibly be
>>  * the right match.
>>  */
>
>
> After thinking of this for more time, I still don't see the reason why
> we cannot display NestLoopParam after we've emerged from a subplan.
>
> It seems these params are from parameterized subqueryscan and their
> values are supplied by an upper nestloop. These params should have been
> processed in process_subquery_nestloop_params() that we just add the
> PlannerParamItem entries to root->curOuterParams, in the form of
> NestLoopParam, using the same PARAM_EXEC slots.
>
> So I propose the patch attached to remove the 'in_same_plan_level' flag
> so that we can display NestLoopParam across subplan. Please correct me
> if I'm wrong.
>
> Thanks
> Richard



-- 
greg




Re: Add palloc_aligned() to allow arbitrary power of 2 memory alignment

2022-11-15 Thread Greg Stark
So I think it's kind of cute that you've implemented these as agnostic
wrappers that work with any allocator ... but why?

I would have expected the functionality to just be added directly to
the allocator, to explicitly request whole aligned pages, which IIRC
it's already capable of doing but just doesn't have any way to
explicitly request.

DirectIO doesn't really need a wide variety of allocation sizes or
alignments; it's always going to be the physical block size, which
apparently can be as low as 512 bytes, but I'm guessing we're always
going to be using 4kB alignment and multiples of 8kB allocations.
Wouldn't just having a pool of 8kB pages all aligned on 4kB or 8kB
boundaries be simpler and more efficient than working around
misaligned pointers and having all these branches and arithmetic
happening?
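
(For illustration, the arithmetic I mean looks roughly like this --
the names are invented here, not the actual patch's API: over-allocate,
round the pointer up, and carry the raw pointer around so the chunk
can eventually be freed.)

#include <stdint.h>
#include <stdlib.h>

/*
 * Return "size" usable bytes aligned to "alignto" (a power of two) by
 * over-allocating and rounding up.  The raw pointer is handed back
 * separately because that, not the aligned pointer, is what must
 * eventually be freed.
 */
static void *
alloc_aligned(size_t size, size_t alignto, void **raw_out)
{
	char	   *raw = malloc(size + alignto - 1);
	uintptr_t	aligned;

	if (raw == NULL)
		return NULL;
	aligned = ((uintptr_t) raw + alignto - 1) & ~((uintptr_t) alignto - 1);
	*raw_out = raw;
	return (void *) aligned;
}

A pool of pages that are already 4kB- or 8kB-aligned would need none
of that rounding and no extra bookkeeping pointer.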




Re: Temporary tables versus wraparound... again

2022-11-02 Thread Greg Stark
On Sun, 8 Nov 2020 at 18:19, Greg Stark  wrote:
>
> We had an outage caused by transaction wraparound. And yes, one of the
> first things I did on this site was check that we didn't have any
> databases that were in danger of wraparound.

Fwiw this patch has been in "Ready for Committer" state since April
and has been moved forward three times, including missing the release.
It's a pretty short patch and fixes a problem that caused an outage
for $previous_employer, and I've had private discussions with other
people who have been struggling with the same issue. Personally I
consider it pretty close to a bug fix and worth backpatching. I think
it's pretty annoying to have put out a release without this fix.

-- 
greg




Re: Commit fest 2022-11

2022-11-02 Thread Greg Stark
On Tue, 1 Nov 2022 at 06:56, Michael Paquier  wrote:

> Two people showing up to help is really great, thanks!  I'll be around
> as well this month, so I'll do my share of patches, as usual.

Fwiw I can help as well -- starting next week. I can't do much this week though.

I would suggest starting with the cfbot to mark anything that isn't
applying cleanly and passing tests (and looking for more than design
feedback) as Waiting on Author and reminding the author that it's
commitfest time and a good time to bring the patch into a clean state.

-- 
greg




Re: [Commitfest 2022-09] Date is Over.

2022-10-06 Thread Greg Stark
Fwiw I'm going through some patches looking for patches to review, and
I'm finding that the patches I'm seeing actually did get reviews, some
of them months ago.

If there was any substantial feedback since the last patch was posted I
would say you should change the status to Waiting on Author when moving it
forward rather than leaving it as Needs Review.

Ideally there should be very few patches moved to the next commitfest as
Needs Review. Only patches that have not been getting attention and
where the author is blocked waiting on feedback.


Re: Pruning never visible changes

2022-09-18 Thread Greg Stark
On Fri, 16 Sept 2022 at 10:27, Tom Lane  wrote:
>
> Simon Riggs  writes:
> > A user asked me whether we prune never visible changes, such as
> > BEGIN;
> > INSERT...
> > UPDATE.. (same row)
> > COMMIT;
>
> Didn't we just have this discussion in another thread?

Well... not "just" :)

commit 44e4bbf75d56e643b6afefd5cdcffccb68cce414
Author: Tom Lane 
Date:   Fri Apr 29 16:29:42 2011 -0400

Remove special case for xmin == xmax in HeapTupleSatisfiesVacuum().

VACUUM was willing to remove a committed-dead tuple immediately if it was
deleted by the same transaction that inserted it.  The idea is that such a
tuple could never have been visible to any other transaction, so we don't
need to keep it around to satisfy MVCC snapshots.  However, there was
already an exception for tuples that are part of an update chain, and this
exception created a problem: we might remove TOAST tuples (which are never
part of an update chain) while their parent tuple stayed around (if it was
part of an update chain).  This didn't pose a problem for most things,
since the parent tuple is indeed dead: no snapshot will ever consider it
visible.  But MVCC-safe CLUSTER had a problem, since it will try to copy
RECENTLY_DEAD tuples to the new table.  It then has to copy their TOAST
data too, and would fail if VACUUM had already removed the toast tuples.

Easiest fix is to get rid of the special case for xmin == xmax.  This may
delay reclaiming dead space for a little bit in some cases, but it's by far
the most reliable way to fix the issue.

Per bug #5998 from Mark Reid.  Back-patch to 8.3, which is the oldest
version with MVCC-safe CLUSTER.




Re: Tracking last scan time

2022-08-23 Thread Greg Stark
On Tue, 23 Aug 2022 at 11:00, Dave Page  wrote:
>
> Often it is beneficial to review one's schema with a view to removing indexes 
> (and sometimes tables) that are no longer required. It's very difficult to 
> understand when that is the case by looking at the number of scans of a 
> relation as, for example, an index may be used infrequently but may be 
> critical in those times when it is used.

I think this is easy to answer in a prometheus/datadog/etc world since
you can consult the history of the count to see when it was last
incremented. (Or do effectively that continuously.)

I guess that just reinforces the idea that it should be optional.
Perhaps there's room for some sort of general feature for controlling
various time series aggregates like max(), min(), sum() or, uhm,
timeoflastchange() on whatever stats you want. That would let us
remove a bunch of stuff from pg_stat_statements and let users turn on
just the ones they want. And also let users enable things like time of
last rollback or conflict etc. But that's just something to think
about down the road.

> The attached patch against HEAD adds optional tracking of the last scan time 
> for relations. It updates pg_stat_*_tables with new last_seq_scan and 
> last_idx_scan columns, and pg_stat_*_indexes with a last_idx_scan column to 
> help with this.
>
> Due to the use of gettimeofday(), those values are only maintained if a new 
> GUC, track_scans, is set to on. By default, it is off.

Bikeshedding warning -- "track_scans" could equally apply to almost
any stats about scans. I think the really relevant thing here is the
times, not the scans. I think the GUC should be "track_scan_times". Or
could that still be confused with scan durations? Maybe
"track_scan_timestamps"?

You could maybe make the gettimeofday cheaper by doing it less often.
Like, skipping the update if the old timestamp is newer than 1s before
the transaction start time (I think that's available for free if some
other GUC is enabled, but I don't recall). Or isn't this callback
normally happening after transaction end? So xactStopTimestamp might
be available already?
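
(Roughly what I have in mind, as a sketch only -- the helper and the
last_scan field here are stand-ins for whatever the patch actually
maintains, not its real names:)

#include "postgres.h"
#include "access/xact.h"
#include "utils/timestamp.h"

/*
 * Only pay for a fresh gettimeofday() when the stored timestamp is
 * more than a second older than the transaction stop time, which is
 * already available essentially for free when stats are flushed at
 * end of transaction.
 */
static inline void
maybe_update_last_scan(TimestampTz *last_scan)
{
	TimestampTz cutoff = GetCurrentTransactionStopTimestamp() - USECS_PER_SEC;

	if (*last_scan < cutoff)
		*last_scan = GetCurrentTimestamp();
}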


-- 
greg




Re: identifying the backend that owns a temporary schema

2022-08-23 Thread Greg Stark
Having this function would be great (I admit I never responded because
I never figured out if your suggestion was right or not:). But should
it also be added to the pg_stat_activity view? Perhaps even just in
the SQL view using the function.

Alternatively, should pg_stat_activity show the actual temp schema name
instead of the id? I don't recall if it's visible outside the backend
but if it is, could pg_stat_activity show whether the temp schema is
actually attached or not?




Re: shared-memory based stats collector - v70

2022-08-18 Thread Greg Stark
On Thu, 18 Aug 2022 at 02:27, Drouvot, Bertrand  wrote:
>
> What I had in mind is to provide an API to retrieve stats for those that
> would need to connect to each database individually otherwise.
>
> That's why I focused on PGSTAT_KIND_RELATION that has
> PgStat_KindInfo.accessed_across_databases set to false.
>
> I think that another candidate could also be PGSTAT_KIND_FUNCTION.

And indexes of course. It's a bit frustrating since without the
catalog you won't know what table the index actually is for... But
they're pretty important stats.


On that note though... What do you think about having the capability
to add other stats kinds to the stats infrastructure? It would make a
lot of sense for pg_stat_statements to add its entries here instead of
having to reimplement a lot of the same magic. And I have in mind an
extension that allows adding other stats and it would be nice to avoid
having to reimplement any of this.

To do that I guess more of the code needs to be moved to be
table-driven from the kind structs, either with callbacks or with
other metadata. So the kind record could contain the tupledesc and the
code to construct the returned tuple, so that these functions could
return any custom entry as well as the standard entries.
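
(Purely as a sketch of the shape I mean -- none of these fields exist
in the real kind structs today, and the names are made up:)

#include "postgres.h"
#include "access/htup.h"
#include "access/tupdesc.h"

/* Hypothetical, extension-registrable kind record. */
typedef struct StatsKindExample
{
	const char *name;			/* "relation", "function", "my_ext_stat", ... */
	size_t		shared_data_len;	/* size of the stats entry body */
	bool		accessed_across_databases;

	/* describe the columns this kind exposes */
	TupleDesc	(*build_tupledesc) (void);

	/* turn one entry body into an output tuple for that descriptor */
	HeapTuple	(*entry_to_tuple) (TupleDesc desc, const void *entry_body);
} StatsKindExample;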

-- 
greg




Re: shared-memory based stats collector - v70

2022-08-17 Thread Greg Stark
On Tue, 16 Aug 2022 at 08:49, Drouvot, Bertrand  wrote:
>
>
> +   if (p->key.kind != PGSTAT_KIND_RELATION)
> +   continue;

Hm. So presumably this needs to be extended. Either to let the caller
decide which types of stats to return or to somehow return all the
stats intermixed. In my monitoring code I did the latter because I
didn't think going through the hash table repeatedly would be very
efficient. But it's definitely a pretty awkward API since I need a
switch statement that explicitly lists each case and casts the result.
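
(For what it's worth, the dispatch ends up looking something like
this, as a sketch; the key fields are passed in however the real API
ends up exposing them, and the casts are the awkward part:)

#include "postgres.h"
#include "pgstat.h"

static void
dispatch_entry(PgStat_Kind kind, Oid dboid, Oid objoid, const void *body)
{
	switch (kind)
	{
		case PGSTAT_KIND_DATABASE:
			{
				const PgStat_StatDBEntry *db = (const PgStat_StatDBEntry *) body;

				/* format db->... counters for database "dboid" here */
				(void) db;
				break;
			}
		case PGSTAT_KIND_RELATION:
			{
				const PgStat_StatTabEntry *tab = (const PgStat_StatTabEntry *) body;

				/* format tab->... counters for relation "objoid" here */
				(void) tab;
				break;
			}
		case PGSTAT_KIND_FUNCTION:
			{
				const PgStat_StatFuncEntry *func = (const PgStat_StatFuncEntry *) body;

				/* format func->... counters for function "objoid" here */
				(void) func;
				break;
			}
		default:
			/* kinds this exporter doesn't report on */
			break;
	}
}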

> > 2) When I did the function attached above I tried to avoid returning
> > the whole set and make it possible to process them as they arrive.
>
> Is it the way it has been done? (did not look at your function yet)

I did it with callbacks. It was quick and easy and convenient for my
use case. But in general I don't really like callbacks and would think
some kind of iterator style api would be nicer.

I am handling the stats entries as they turn up. I'm constructing the
text output for each in a callback and buffering up the whole http
response in a string buffer.

I think that's ok, but if I wanted to avoid buffering it up and do
network i/o then I would think the thing to do would be to build the
list of entry keys and then loop over that list, doing a hash lookup
for each one, generating the response for each and writing it out to
the network. That way there wouldn't be anything locked, not even
the hash table, while doing network i/o. It would mean a lot of
traffic on the hash table though.

> > -- on that note I wonder if I've done something
> > wrong because I noted a few records with InvalidOid where I didn't
> > expect it.
>
> It looks like that InvalidOid for the dbid means that the entry is for a
> shared relation.

Ah yes. I had actually found that but forgotten it.

There's also a database entry with dboid=InvalidOid which is
apparently where background workers with no database attached report
stats.

> I've in mind to add some filtering on the dbid (I think it could be
> useful for monitoring tool with a persistent connection to one database
> but that wants to pull the stats database per database).
>
> I don't think a look up through the local cache will work if the
> entry/key is related to another database the API is launched from.

Isn't there also a local hash table used to find the entries to reduce
traffic on the shared hash table? Even if you don't take a snapshot
does it get entered there? There are definitely still parts of this
where I'm working from a pretty vague understanding :/

-- 
greg




Re: shared-memory based stats collector - v70

2022-08-15 Thread Greg Stark
On Thu, 11 Aug 2022 at 02:11, Drouvot, Bertrand  wrote:
>
> As Andres was not -1 about that idea (as it should be low cost to add
> and maintain) as long as somebody cares enough to write something: then
> I'll give it a try and submit a patch for it.

I agree it would be a useful feature. I think there may be things to
talk about here though.

1) Are you planning to go through the local hash table and
LocalSnapshot and obey the consistency mode? I was thinking a flag
passed to build_snapshot to request global mode might be sufficient
instead of a completely separate function.

2) When I did the function attached above I tried to avoid returning
the whole set and make it possible to process them as they arrive. I
actually was hoping to get to the point where I could start shipping
out network data as they arrive and not even buffer up the response,
but I think I need to be careful about hash table locking then.

3) The key difference here is that we're returning whatever stats are
in the hash table rather than using the catalog to drive a list of id
numbers to look up. I guess the API should make it clear this is what
is being returned -- on that note I wonder if I've done something
wrong because I noted a few records with InvalidOid where I didn't
expect it.

4) I'm currently looping over the hash table returning the records all
intermixed. Some users will probably want to do things like "return
all Relation records for all databases" or "return all Index records
for database id xxx". So some form of filtering may be best or perhaps
a way to retrieve just the keys so they can then be looked up one by
one (through the local cache?).

5) On that note I'm not clear how the local cache will interact with
these cross-database lookups. That should probably be documented...

-- 
greg




Re: shared-memory based stats collector - v70

2022-08-10 Thread Greg Stark
On Wed, 10 Aug 2022 at 04:05, Drouvot, Bertrand  wrote:
>
> Hi,
>
> On 8/9/22 6:00 PM, Greg Stark wrote:
> > On Tue, 9 Aug 2022 at 06:19, Drouvot, Bertrand  wrote:
> >>
> >> What do you think about adding a function in core PG to provide such
> >> functionality? (means being able to retrieve all the stats (+ eventually
> >> add some filtering) without the need to connect to each database).
> > I'm working on it myself too. I'll post a patch for discussion in a bit.
>
> Great! Thank you!

So I was adding the code to pgstat.c because I had thought there were
some data types I needed and/or static functions I needed. However you
and Andres encouraged me to check again now. And indeed I was able,
after fixing a couple things, to make the code work entirely
externally.

This is definitely not polished and there are a couple of obvious things
missing. But at the risk of embarrassment I've attached my WIP. Please
be gentle :) I'll post the github link in a bit when I've fixed up
some meta info.

I'm definitely not wedded to the idea of using callbacks; it was just
the most convenient way to get started, especially when I was putting
the main loop in pgstat.c.  Ideally I do want to keep open the
possibility of streaming the results out without buffering the whole
set in memory.

> Out of curiosity, would you be also interested by such a feature for
> previous versions (that will not get the patch in) ?

I always had trouble understanding the existing stats code so I was
hoping the new code would make it easier. It seems to have worked, but
it's possible I'm wrong and it was always possible, and the problem was
always just me :)


-- 
greg
/*-
 *
 * telemetry.c
 *
 * Most of this code was copied from pg_prewarm.c as a template. 
 *
 *
 *-
 */

#include "postgres.h"

#include 
#include 
#include 

#include "access/relation.h"
#include "access/xact.h"
#include "catalog/pg_class.h"
#include "catalog/pg_type.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "postmaster/bgworker.h"
#include "postmaster/interrupt.h"
#include "storage/buf_internals.h"
#include "storage/dsm.h"
#include "storage/ipc.h"
#include "storage/latch.h"
#include "storage/lwlock.h"
#include "storage/proc.h"
#include "storage/procsignal.h"
#include "storage/shmem.h"
#include "storage/smgr.h"
#include "tcop/tcopprot.h"
#include "utils/acl.h"
#include "utils/datetime.h"
#include "utils/guc.h"
#include "utils/memutils.h"
#include "utils/rel.h"
#include "utils/resowner.h"


#include "telemetry.h"
#include "telemetry_pgstat.h"

PG_MODULE_MAGIC;


/* We should already have included what we need to get uint64_t size_t
   fd_set socklen_t and struct sockaddr so don't let microhttpd
   include extra stuff for them */
#define MHD_PLATFORM_H
#include <microhttpd.h>

/* MHD_USE_EPOLL */
/* MHD_USE_AUTO */
/* MHD_OPTION_CONNECTION_LIMIT */
/* MHD_OPTION_CONNECTION_TIMEOUT */
/* MHD_OPTION_EXTERNAL_LOGGER */



/* Background worker harness */
void		_PG_init(void);

/* Actual internal background worker main entry point */
static void telemetry_start_worker(unsigned short port);

/* GUC variables. */
static int telemetry_port = 9187; /* TCP port to listen on for metrics */
static char *telemetry_listen_addresses; /* TCP listen addresses */
static bool telemetry = true; /* start worker automatically on default port */


static enum MHD_Result
telemetry_handler(void *cls,
		  struct MHD_Connection *connection,
		  const char *url,
		  const char *method,
		  const char *version,
		  const char *upload_data,
		  size_t *upload_data_size,
		  void **con_cls);

/*
 * Module load callback.
 */
void
_PG_init(void)
{
DefineCustomIntVariable("telemetry.port",
			"TCP Port to serve metrics on by default",
			NULL,
			&telemetry_port,
			9187, 1, 65535,
			PGC_SIGHUP,
			0, /* flags */
			NULL, NULL, NULL /* hooks */
			);


DefineCustomStringVariable("telemetry.listen_addresses",
			   "TCP Listen Addresses to serve metrics on by default",
			   NULL,
			   &telemetry_listen_addresses,
			   "*",
			   PGC_SIGHUP,
			   GUC_LIST_INPUT, /* flags */
			   NULL, NULL, NULL /* hooks */
			   );

if (!process_shared_preload_libraries_in_progress)
	return;

/* can't define PGC_POSTMASTER variable after startup */
DefineCustomBoolVariable("telemetry.start_server",
			 "Starts the telemetry worker on startup.",
			 NULL,
			 &telemetry,
			 true,

Re: shared-memory based stats collector - v70

2022-08-10 Thread Greg Stark
One thing that's bugging me is that the names we use for these stats
are *all* over the place.

The names go through three different stages

pgstat structs  ->  pgstatfunc tupledescs  ->  pg_stat_* views

(Followed by a fourth stage where pg_exporter or whatever renames them
for the monitoring software)

And for some reason both transitions (plus the exporter) felt the need
to fiddle with the names or values. And not in any sort of even
vaguely consistent way. So there are three (or four) different sets of
names for the same metrics :(

e.g.

* Some of the struct elements have abbreviated words which are
expanded in the tupledesc names or the view columns -- some have long
names which get abbreviated.

* Some struct members have n_ prefixes (presumably to avoid C keywords
or other namespace issues?) and then lose them at one of the other
stages. But then the relation stats do not have n_ prefixes and then
the pg_stat view *adds* n_ prefixes in the SQL view!

* Some columns are added together in the SQL view which seems like
gratuitously hiding information from the user. The pg_stat_*_tables
view actually looks up information from the index stats and combines
them to get idx_scan and idx_tup_fetch.

* The pg_stat_bgwriter view returns data from two different fixed
entries, the checkpointer and the bgwriter; is there a reason those
are kept separately but then reported as if they're one thing?


Some of the simpler renaming could be transparently fixed by making
the internal stats match the public-facing names. But for many of them
I think the internal names are better. And the cases where the views
aggregate data in a way that loses information are not something I
want to reproduce.

I had intended to use the internal names directly, reasoning that
transparency and consistency are the direction to be headed. But in
some cases I think the current public names are the better choice -- I
certainly don't want to remove n_* prefixes from some names but then
add them to different names! And some of the cases where the data is
combined or modified do seem like they would be missed.

-- 
greg



