Re: Possibility to disable `ALTER SYSTEM`

2023-09-08 Thread Álvaro Hernández



On 7/9/23 21:51, Gabriele Bartolini wrote:

Hi everyone,

I would like to propose a patch that allows administrators to disable 
`ALTER SYSTEM` via either a runt-time option to pass to the Postgres 
server process at startup (e.g. `--disable-alter-system=true`, false 
by default) or a new GUC (or even both), without changing the current 
default method of the server.


The main reason is that this would help improve the “security by 
default” posture of Postgres in a Kubernetes/Cloud Native environment 
- and, in general, in any environment on VMs/bare metal behind a 
configuration management system in which changes should only be made 
in a declarative way and versioned like Ansible Tower, to cite one.


Below you find some background information and the longer story behind 
this proposal.


Sticking to the Kubernetes use case, I am primarily speaking on behalf 
of the CloudNativePG open source operator (cloudnative-pg.io 
, of which I am one of the maintainers). 
However, I am sure that this option could benefit any operator for 
Postgres - an operator is the most common and recommended way to run a 
complex application like a PostgreSQL database management system 
inside Kubernetes.


In this case, the state of a PostgreSQL cluster (for example its 
number of replicas, configuration, storage, etc.) is defined in a 
Custom Resource Definition in the form of configuration, typically 
YAML, and the operator works with Kubernetes to ensure that, at any 
moment, the requested Postgres cluster matches the observed one. This 
is a very basic example in CloudNativePG: 
https://cloudnative-pg.io/documentation/current/samples/cluster-example.yaml


As I was mentioning above, in a Cloud Native environment it is 
expected that workloads are secure by default. Without going into much 
detail, many decisions have been made in that direction by operators 
for Postgres, including CloudNativePG. The goal of this proposal is to 
provide a way to ensure that changes to the PostgreSQL configuration 
in a Kubernetes controlled Postgres cluster are allowed only through 
the Kubernetes API.


Basically, if you want to change an option for PostgreSQL, you need to 
change the desired state in the Kubernetes resource, then Kubernetes 
will converge (through the operator). In simple words, it’s like 
empowering the operator to impersonate the PostgreSQL superuser.




    Coming from a similar background to Gabriele's, I support this 
proposal.


    In StackGres (https://stackgres.io) we also allow users to manage 
postgresql.conf's configuration declaratively. We have a CRD (Custom 
Resource Definition) that precisely defines and controls how a 
postgresql.conf configuration looks like (see 
https://stackgres.io/doc/latest/reference/crd/sgpgconfig/). This 
configuration, once created by the user, is strongly validated by 
StackGres (parameters are valid for the given major version, values are 
within the ranges and appropriate types) and then periodically applied 
to the database if there's any drift between that user-declared 
(desired) state and current system status.


    Similarly, we also have some parameters which the user is not 
allowed to change 
(https://gitlab.com/ongresinc/stackgres/-/blob/main/stackgres-k8s/src/operator/src/main/resources/postgresql-blocklist.properties). 
If the user is allowed to use ALTER SYSTEM and modify some of these 
parameters, significant breakage can happen in the cluster, as the 
operator may become "confused" and manual operation may be required, 
breaking many of the user's expectations of stability and how the system 
works and heals automatically.


    Sure, as mentioned elsewhere in the thread, a "malicious" user can 
still use other mechanisms such as COPY or untrusted PLs to still 
overwrite the configuration. But both attempts are obviously conscious 
attempts to break the system (and if so, it's all yours to break it). 
But ALTER SYSTEM may be an *unintended* way to break it, causing a bad 
user's experience. This may be defined more of a way to avoid users 
shooting themselves in the feet, inadvertedly.


    There's apparently some opposition to implementing this. But given 
that there's also interest in having it, I'd like to know what the 
negative effects of implementing such a startup configuration property 
would be, so that advantages can be compared with the disadvantages.


    All that being said, the behavior to prevent ALTER SYSTEM can also 
be easily implemented as an extension. Actually some colleagues wrote 
one with a similar scope 
(https://gitlab.com/ongresinc/extensions/noset), and I believe it could 
be the base for a similar extension focused on preventing ALTER SYSTEM.



    Regards,

    Álvaro

--

Alvaro Hernandez


---
OnGres


Re: CDC feature request

2021-03-18 Thread Álvaro Hernández


On 18/3/21 14:03, Stepan Yankevych wrote:
>
> Hi All!
>
>  
>
> Hopefully I’m using correct mail list  
>
> If not please show me right direction 
>
>  
>
> I’m quite struggling without native Change Data Capture feature in
> PostgreSQL.
>
>  
>
> That would be great to implement it, possibly in not so complicated way.
>
>  
>
> Can Logical replication be a little be modified or reused to do not
> replicate data into destination table as is but to insert each change
> into “change table” (like in oracle 11 CDC)?
>
> That change table should have at lease few additional columns
>
>   * Operation (I/D/U)
>   * txid
>   * Commit_time_stamp
>
>  
>
> Thanks!
>
>  
>
> *Stepan Yankevych*
>

    Hi Stepan.

    I would recommend you to check https://debezium.io/, it stores every
change in Kafka with detailed metadata, and you can later transform
and/or inject it into any destination, with great level of flexibility,
using any of the database connectors available.

   
    Álvaro

-- 

Alvaro Hernandez


---
OnGres



Re: PROXY protocol support

2021-03-05 Thread Álvaro Hernández



On 5/3/21 10:03, Magnus Hagander wrote:
> On Fri, Mar 5, 2021 at 1:33 AM Álvaro Hernández  wrote:
>>
>>
>> On 5/3/21 0:21, Jacob Champion wrote:
>>> On Thu, 2021-03-04 at 21:45 +0100, Magnus Hagander wrote:
>>>> On Thu, Mar 4, 2021 at 9:07 PM Jacob Champion  wrote:
>>>>> Idle thought I had while setting up a local test rig: Are there any
>>>>> compelling cases for allowing PROXY packets to arrive over Unix
>>>>> sockets? (By which I mean, the proxy is running on the same machine as
>>>>> Postgres, and connects to it using the .s.PGSQL socket file instead of
>>>>> TCP.) Are there cases where you want some other software to interact
>>>>> with the TCP stack instead of Postgres, but it'd still be nice to have
>>>>> the original connection information available?
>>>> I'm uncertain what that usecase would be for something like haproxy,
>>>> tbh. It can't do connection pooling, so adding it on the same machine
>>>> as postgres itself wouldn't really add anything, I think?
>>> Yeah, I wasn't thinking HAproxy so much as some unspecified software
>>> appliance that's performing Some Task before allowing a TCP client to
>>> speak to Postgres. But it'd be better to hear from someone that has an
>>> actual use case, instead of me spitballing.
>> Here's a use case: Envoy's Postgres filter (see [1], [2]). Right now
>> is able to capture protocol-level metrics and send them to a metrics
>> collector (eg. Prometheus) while proxying the traffic. More capabilities
>> are being added as of today, and will eventually manage HBA too. It
>> would greatly benefit from this proposal, since it proxies the traffic
>> with, obviously, its IP, not the client's. It may be used (we do)
>> locally fronting Postgres, via UDS (so it can be easily trusted).
> Yeah, Envoy is definitely a great example of a usecase for the proxy
> protocol in general.

    Actually Envoy already implements the Proxy protocol:
https://www.envoyproxy.io/docs/envoy/latest/configuration/listeners/listener_filters/proxy_protocol.html
But I believe it would need some further cooperation with the Postgres
filter, unless they can be chained directly. Still, Postgres needs to
understand it, which is what your patch would add (thanks!).

>
> Specifically about the Unix socket though -- doesn't envoy normally
> run on a different instance (or in a different container at least),
> thus normally uses tcp between envoy and postgres? Or would it be a
> reasonable usecase that you ran it locally on the postgres server,
> having it speak IP to the clients but unix sockets to the postgres
> backend? I guess maybe it is outside of the containerized world?
>

    This is exactly the architecture we use at StackGres [1][2]. We use
Envoy as a sidecar (so it runs on the same pod, server as Postgres) and
connects via UDS. But then exposes the connection to the outside clients
via TCP/IP. So in my opinion it is quite applicable to the container
world :)


    Álvaro


[1] https://stackgres.io
[2]
https://stackgres.io/doc/latest/intro/architecture/#stackgres-pod-architecture-diagram

-- 

Alvaro Hernandez


---
OnGres






Re: PROXY protocol support

2021-03-04 Thread Álvaro Hernández



On 5/3/21 0:21, Jacob Champion wrote:
> On Thu, 2021-03-04 at 21:45 +0100, Magnus Hagander wrote:
>> On Thu, Mar 4, 2021 at 9:07 PM Jacob Champion  wrote:
>>> Idle thought I had while setting up a local test rig: Are there any
>>> compelling cases for allowing PROXY packets to arrive over Unix
>>> sockets? (By which I mean, the proxy is running on the same machine as
>>> Postgres, and connects to it using the .s.PGSQL socket file instead of
>>> TCP.) Are there cases where you want some other software to interact
>>> with the TCP stack instead of Postgres, but it'd still be nice to have
>>> the original connection information available?
>> I'm uncertain what that usecase would be for something like haproxy,
>> tbh. It can't do connection pooling, so adding it on the same machine
>> as postgres itself wouldn't really add anything, I think?
> Yeah, I wasn't thinking HAproxy so much as some unspecified software
> appliance that's performing Some Task before allowing a TCP client to
> speak to Postgres. But it'd be better to hear from someone that has an
> actual use case, instead of me spitballing.

    Here's a use case: Envoy's Postgres filter (see [1], [2]). Right now
is able to capture protocol-level metrics and send them to a metrics
collector (eg. Prometheus) while proxying the traffic. More capabilities
are being added as of today, and will eventually manage HBA too. It
would greatly benefit from this proposal, since it proxies the traffic
with, obviously, its IP, not the client's. It may be used (we do)
locally fronting Postgres, via UDS (so it can be easily trusted).


    Álvaro


[1]
https://www.envoyproxy.io/docs/envoy/latest/configuration/listeners/network_filters/postgres_proxy_filter
[2]
https://www.cncf.io/blog/2020/08/13/envoy-1-15-introduces-a-new-postgres-extension-with-monitoring-support/

-- 

Alvaro Hernandez


---
OnGres






Re: Extensibility of the PostgreSQL wire protocol

2021-02-19 Thread Álvaro Hernández



On 19/2/21 19:30, Jan Wieck wrote:
> [...]
>
> I also am not sure if building a connection pool into a background
> worker or postmaster is a good idea to begin with. One of the
> important features of a pool is to be able to suspend traffic and make
> the server completely idle to for example be able to restart the
> postmaster without forcibly disconnecting all clients. A pool built
> into a background worker cannot do that.
>
>

    In my opinion, there are different reasons to use a connection pool,
that lead to different placements of that connection pool on the
architecture of the system. The ability of a pool to suspend (pause)
traffic and apply live re-configurations is a very important one to
implement high availability practices, transparent scaling, and others.
But these poolers belong to middleware layers (as in different processes
in different servers), where these pausing operations make complete sense.

    Connection poolers fronting the database have other specific
missions, namely to control the fan-in of connections to the database.
These connection poolers make sense being as close to the database as
possible (ideally: embedded) but don't need to perform pause operations
here.


    Álvaro


-- 

Alvaro Hernandez


---
OnGres






Re: Extensibility of the PostgreSQL wire protocol

2021-02-19 Thread Álvaro Hernández



On 19/2/21 14:48, Heikki Linnakangas wrote:
> [...]
>
> I can see value in supporting different protocols. I don't like the
> approach discussed in this thread, however.
>
> For example, there has been discussion elsewhere about integrating
> connection pooling into the server itself. For that, you want to have
> a custom process that listens for incoming connections, and launches
> backends independently of the incoming connections. These hooks would
> not help with that.
>
> Similarly, if you want to integrate a web server into the database
> server, you probably also want some kind of connection pooling. A
> one-to-one relationship between HTTP connections and backend processes
> doesn't seem nice.

    While I'm far from an HTTP/2 expert and I may be wrong, from all I
know HTTP/2 allows to create full-duplex, multiplexed, asynchronous
channels. So multiple connections can be funneled through a single
connection. This decreases the need and aim for connection pooling. It
doesn't eliminate it completely, as you may have the channel busy if a
single tenant is sending a lot of data; and you could not have more than
one concurrent action from a single tenant. OTOH, given these HTTP/2
properties, a non-pooled HTTP/2 endpoint may provide already significant
benefits due to the multiplexing capabilities.

    So definitely we don't need to consider an HTTP endpoint would
entail a 1:1 mapping between connections and backend processes.


    Álvaro

-- 

Alvaro Hernandez


---
OnGres






Re: PostgreSQL <-> Babelfish integration

2021-02-12 Thread Álvaro Hernández



On 12/2/21 19:44, Peter Geoghegan wrote:
> On Fri, Feb 12, 2021 at 10:26 AM Álvaro Hernández  wrote:
>> As I stated in the mentioned post, I believe Babelfish is a very
>> welcomed addition to the PostgreSQL ecosystem. It allows PostgreSQL to
>> reach other users, other use cases, other markets; something which in my
>> opinion PostgreSQL really needs to extend its reach, to become a more
>> relevant player in the database market. The potential is there,
>> specially given all the extensibility points that PostgreSQL already
>> has, which are unparalleled in the industry.
> Let's assume for the sake of argument that your analysis of the
> benefits is 100% correct -- let's take it for granted that Babelfish
> is manna from heaven. It's still not clear that it's worth embracing
> Babelfish in the way that you have advocated.
>
> We simply don't know what the costs are. Because there is no source
> code available. Maybe that will change tomorrow or next week, but as
> of this moment there is simply nothing substantive to evaluate.

    I'm sure if we embrace an open and honest conversation, we will be
able to figure out what the integration costs are even before the source
code gets published. As I said, this goes beyond the very technical
detail of source code integration. Waiting until the source code is
published is a bit chicken-and-egg (as I presume the source will morph
towards convergence if there's work that may be started, even if it is
just for example for protocol extensibility).

    I'm sure this can be also discussed at an architectural level,
getting an analysis of what parts of PostgreSQL would be changed, what
extension mechanisms are required, what is the volume of the project,
and many others.

    Álvaro


-- 

Alvaro Hernandez


---
OnGres





PostgreSQL <-> Babelfish integration

2021-02-12 Thread Álvaro Hernández


    I would like to share my thoughts in the list about the potential
PostgreSQL <-> Babelfish integration. There is already a thread about
protocol hooks [1], but I'd like to offer my PoV from a higher level
perspective and keep that thread for the technical aspects of the
protocol hooks. This is also a follow-up on a public blog post I
recently published [2], and the feedback I received to bring the topic
to the ML.

    As I stated in the mentioned post, I believe Babelfish is a very
welcomed addition to the PostgreSQL ecosystem. It allows PostgreSQL to
reach other users, other use cases, other markets; something which in my
opinion PostgreSQL really needs to extend its reach, to become a more
relevant player in the database market. The potential is there,
specially given all the extensibility points that PostgreSQL already
has, which are unparalleled in the industry.

    I believe we should engage in a conversation, with AWS included,
about how we can possibly benefit from this integration. It must be
symbiotic, both "parties" should win with it, otherwise it won't work.
But I believe it can definitely be a win-win situation. There has been
some concerns that this may be for Amazon's own benefit, and would
suppose an increased maintenance burden for the PostgreSQL Community. I
believe that analysis is not including the many benefits that such a
compatibility for PostgreSQL would bring in many fronts. And possibly,
the changes required to core, are beneficial for other areas of
PostgreSQL. Several have already pointed out in the extensibility hooks
thread that this could allow for new protocols into PostgreSQL,
including the much desired v4 or an HTTP one. I can only strongly second
that, and we should also analyze it from this perspective.

    There is also a risk factor that I believe needs to be factored into
the analysis, and is what is the risk of not doing anything. From my
understanding, it is very clear that AWS wants to treat Babelfish as a
kind of development branch, waiting for inclusion into mainline. But I
also believe, if this branch sits forever not merged, at some point, may
be under the risk of having its own life, becoming a fork. And if it
does, it may become our "MariaDB". I would not like this to happen.

    I'm happy to contribute what I can to this discussion: if we want
Babelfish to be integrated, how, analyze pros and cons, etc. I see this
as an incredible gift that, if managed properly, not only will make
PostgreSQL much better in use-cases that cannot access now; but may also
boost PostgreSQL's extensibility even further, and maybe even spark
development of some projects (like v4 or HTTP protocol) that have been
longer dismissed because there were (logically) too many requisites for
any v3 replacement, that made its replacement extremely hard.

    But of course, these are just the humble 2 cents of a casual
-hackers reader.


    Álvaro


[1]
https://www.postgresql.org/message-id/CAGBW59d5SjLyJLt-jwNv%2BoP6esbD8SCB%3D%3D%3D11WVe5%3DdOHLQ5wQ%40mail.gmail.com
[2] https://postgresql.fund/blog/babelfish-the-elephant-in-the-room/

-- 

Alvaro Hernandez


---
OnGres