Re: Primary keys and composite unique keys(basic question)

2021-04-05 Thread Merlin Moncure
On Mon, Apr 5, 2021 at 9:37 PM Rob Sargent  wrote:
>
> > It's a small thing, but UUIDs are absolutely not memorizable by
> > humans; they have zero semantic value.  Sequential numeric identifiers
> > are generally easier to transpose and the value gives some clues to
> > its age (of course, in security contexts this can be a downside).
>
> I take the above as a definite plus.  Spent too much of my life correcting
> others’ use of “remembered” IDs that just happened to perfectly match the
> wrong thing.
>
> > Performance-wise, UUIDs are absolutely horrible for data at scale as
> > Tom rightly points out.  Everything is randomized, just awful.  There
> > are some alternate implementations of UUID that mitigate this but I've
> > never seen them used in the wild in actual code.
>
> That B-trees have been optimized to handle serial ints might be considered
> a reaction to that popular (and distasteful) choice.  Perhaps there should be
> a ‘non-optimized’ option.

It's not just the B-tree, but the heap as well.  For large tables, you
are pretty much guaranteed to read a page for each record you want to
load via the key, regardless of the pattern of access.  It's incredibly
wasteful regardless of the speed of the underlying storage fabric.
Very few developers actually understand this.

If computers were infinitely fast this wouldn't matter, but they aren't :-).
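
The heap effect described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a measurement of PostgreSQL: it assumes rows are stored in the heap in insertion order, a made-up figure of 50 rows per page, and random 128-bit integers standing in for UUIDs. It counts how many distinct heap pages hold a run of 100 keys that are adjacent in key (index) order.

```python
import random

ROWS_PER_PAGE = 50  # assumed figure for illustration, not taken from the thread


def heap_pages_touched(keys_in_insert_order, batch):
    """Count distinct heap pages holding `batch` keys adjacent in key order."""
    # Assumption: heap position follows insertion order.
    page_of = {k: i // ROWS_PER_PAGE for i, k in enumerate(keys_in_insert_order)}
    # An index scan visits keys in sorted order; take a run from the middle.
    ordered = sorted(keys_in_insert_order)
    start = len(ordered) // 2
    wanted = ordered[start:start + batch]
    return len({page_of[k] for k in wanted})


def compare(n_rows=100_000, batch=100, seed=0):
    """Return (pages touched with serial keys, pages touched with random keys)."""
    rng = random.Random(seed)
    serial_keys = list(range(n_rows))
    # Random 128-bit integers stand in for randomly generated UUIDs.
    random_keys = [rng.getrandbits(128) for _ in range(n_rows)]
    return (heap_pages_touched(serial_keys, batch),
            heap_pages_touched(random_keys, batch))
```

Calling `compare()` returns a pair of page counts. With serial keys the 100 rows sit on a couple of neighbouring pages, while with random keys nearly every row lands on a different page, which is the "one page read per record" pattern.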

merlin




Re: Primary keys and composite unique keys(basic question)

2021-04-05 Thread Rob Sargent


> 
> It's a small thing, but UUIDs are absolutely not memorizable by
> humans; they have zero semantic value.  Sequential numeric identifiers
> are generally easier to transpose and the value gives some clues to
> its age (of course, in security contexts this can be a downside).
> 
I take the above as a definite plus.  Spent too much of my life correcting
others’ use of “remembered” IDs that just happened to perfectly match the
wrong thing.

> Performance-wise, UUIDs are absolutely horrible for data at scale as
> Tom rightly points out.  Everything is randomized, just awful.  There
> are some alternate implementations of UUID that mitigate this but I've
> never seen them used in the wild in actual code.
>

That B-trees have been optimized to handle serial ints might be considered a
reaction to that popular (and distasteful) choice.  Perhaps there should be a
‘non-optimized’ option.




Re: Primary keys and composite unique keys(basic question)

2021-04-05 Thread Merlin Moncure
On Fri, Apr 2, 2021 at 3:40 AM Laurenz Albe  wrote:
>
> On Thu, 2021-04-01 at 21:28 -0500, Merlin Moncure wrote:
> > I would never use UUIDs for keys though.
>
> That makes me curious for your reasons.
>
> I see the following disadvantages:
>
> - A UUID requires twice as much storage space as a bigint.
>
> - B-tree indexes are space optimized for inserting at the
>   rightmost leaf page, but UUIDs are random.
>
> - UUIDs are more expensive to generate.
>
> On the other hand, many processes trying to insert into
> the same index page might lead to contention.
>
> Is there anything I have missed?

It's a small thing, but UUIDs are absolutely not memorizable by
humans; they have zero semantic value.  Sequential numeric identifiers
are generally easier to transpose and the value gives some clues to
its age (of course, in security contexts this can be a downside).

Performance-wise, UUIDs are absolutely horrible for data at scale as
Tom rightly points out.  Everything is randomized, just awful.  There
are some alternate implementations of UUID that mitigate this but I've
never seen them used in the wild in actual code.

merlin




Re: Debugging leaking memory in Postgresql 13.2/Postgis 3.1

2021-04-05 Thread Tom Lane
Stephan Knauss  writes:
> On 31.03.2021 20:24, Tom Lane wrote:
>> Based on nearby threads, it occurs to me to ask whether you have JIT
>> enabled, and if so whether turning it off helps.  There seems to be
>> a known leak of the code fragments generated by that in some cases.

> That's it!

> You mentioned that this seems to be known. Do you have pointers to the 
> relevant bug-tracker/thread? I would like to follow up on this.

According to the v14 open issues page [1], there are a couple of older
threads.  I just added this one.

regards, tom lane

[1] https://wiki.postgresql.org/wiki/PostgreSQL_14_Open_Items




Re: Primary keys and composite unique keys(basic question)

2021-04-05 Thread Merlin Moncure
On Thu, Apr 1, 2021 at 10:26 PM Rob Sargent  wrote:
>
> On 4/1/21 8:28 PM, Merlin Moncure wrote:
> >
> > This is one of the great debates in computer science and it is not
> > settled.  There are various tradeoffs around using a composite key
> > derived from the data (aka natural key) vs generated identifiers. It's
> > a complex topic with many facets: performance, organization,
> > validation, and correctness are all relevant considerations.  I would
> > never use UUIDs for keys though.
> >
> > merlin
> >
> >
> And, pray tell, for what exactly would you use universally unique
> identifiers?

I don't disagree that UUIDs are an OK choice in that scenario.  I'll
tell you what though, that scenario comes up fairly rarely.  However,
there are a couple of alternatives if you're curious.

*) Generate ids from a generator service.  This pattern is fairly
common. It has some downsides (mainly slower, more complicated
inserts) but works well in other ways.  You can mitigate the
performance downsides by allocating identifiers in blocks.

*) Use sequences, but with a sequence id added as a composite or
masked into the integer. This works pretty well in practice.
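
The two alternatives above might be sketched roughly as follows. Everything here is hypothetical and for illustration only: `BlockAllocator` stands in for a central generator service, `LocalIdSource` for a client that hands out ids from its current block, and `SOURCE_BITS = 10` is an arbitrary choice for how many low bits carry the id of the originating sequence. Reading "sequence id" as an identifier of the source sequence is an interpretation, not something the message spells out.

```python
import threading

SOURCE_BITS = 10                      # assumed width of the source-id field
SOURCE_MASK = (1 << SOURCE_BITS) - 1


class BlockAllocator:
    """Hypothetical stand-in for a central id generator service."""

    def __init__(self, block_size=1000):
        self._block_size = block_size
        self._next_start = 1
        self._lock = threading.Lock()

    def allocate_block(self):
        # One call to the "service" reserves a whole block of ids.
        with self._lock:
            start = self._next_start
            self._next_start += self._block_size
        return range(start, start + self._block_size)


class LocalIdSource:
    """Hands out ids locally, contacting the allocator once per block."""

    def __init__(self, allocator):
        self._allocator = allocator
        self._block = iter(())

    def next_id(self):
        try:
            return next(self._block)
        except StopIteration:
            # Current block exhausted (or none yet): fetch a new one.
            self._block = iter(self._allocator.allocate_block())
            return next(self._block)


def compose_id(source_id, seq_value):
    """Mask a source id into the low bits of a sequence value."""
    if not 0 <= source_id <= SOURCE_MASK:
        raise ValueError("source_id does not fit in SOURCE_BITS")
    return (seq_value << SOURCE_BITS) | source_id


def split_id(value):
    """Recover (source_id, seq_value) from a composed id."""
    return value & SOURCE_MASK, value >> SOURCE_BITS
```

Allocating in blocks means most inserts never wait on the service. The cost is gaps in the id space when a client stops before using up its block.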

merlin




Re: Is replacing transactions with CTE a good idea?

2021-04-05 Thread Bruce Momjian
On Mon, Apr  5, 2021 at 02:32:36PM -0400, Dave Cramer wrote:
> On Mon, 5 Apr 2021 at 14:18, Bruce Momjian  wrote:
> I think we are in agreement. My point was that WITH queries don't change the
> isolation semantics. 

My point is that when you combine individual queries in a single WITH
query, those queries run together with snapshot behavior as if they were
in a repeatable-read multi-statement transaction.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Is replacing transactions with CTE a good idea?

2021-04-05 Thread Dave Cramer
On Mon, 5 Apr 2021 at 14:18, Bruce Momjian  wrote:

> On Sun, Apr  4, 2021 at 10:02:20AM -0400, Dave Cramer wrote:
> > On Sun, 4 Apr 2021 at 09:12, Bruce Momjian  wrote:
> > > OK, that makes sense, but I think it is wrong minded to think that this
> > > absolves one of taking isolation into account.
> > >
> > > When you make the first read you will still have to deal with all of the
> > > isolation issues
> >
> > I have no idea what you are saying above.  Why is a SELECT-only CTE not
> > the same as a repeatable-read SELECT-only multi-statement transaction?
> > Are you saying that a SELECT in a CTE doesn't do SELECT FOR UPDATE?
> >
> >
> > No, but where is this documented ?
>
> Well, every query runs with a single snapshot, even WITH queries.  We do
> document how non-SELECT WITH visibility is handled:
>
> https://www.postgresql.org/docs/13/sql-select.html
>
> The primary query and the WITH queries are all (notionally) executed at
> the same time. This implies that the effects of a data-modifying
> statement in WITH cannot be seen from other parts of the query, other
> than by reading its RETURNING output. If two such data-modifying
> statements attempt to modify the same row, the results are unspecified.
>
> A key property of WITH queries is that they are normally evaluated only
> once per execution of the primary query, even if the primary query
> refers to them more than once. In particular, data-modifying statements
> are guaranteed to be executed once and only once, regardless of whether
> the primary query reads all or any of their output.
>
I think we are in agreement. My point was that WITH queries don't change
the isolation semantics.

I was pretty sure we didn't do a SELECT FOR UPDATE which would imply a lock.


Dave Cramer
www.postgres.rocks


Re: Is replacing transactions with CTE a good idea?

2021-04-05 Thread Bruce Momjian
On Sun, Apr  4, 2021 at 10:02:20AM -0400, Dave Cramer wrote:
> On Sun, 4 Apr 2021 at 09:12, Bruce Momjian  wrote:
> > OK, that makes sense, but I think it is wrong minded to think that this
> > absolves one of taking isolation into account.
> >
> > When you make the first read you will still have to deal with all of the
> > isolation issues 
> 
> I have no idea what you are saying above.  Why is a SELECT-only CTE not
> the same as a repeatable-read SELECT-only multi-statement transaction?
> Are you saying that a SELECT in a CTE doesn't do SELECT FOR UPDATE? 
> 
> 
> No, but where is this documented ?

Well, every query runs with a single snapshot, even WITH queries.  We do
document how non-SELECT WITH visibility is handled:

https://www.postgresql.org/docs/13/sql-select.html

The primary query and the WITH queries are all (notionally) executed at
the same time. This implies that the effects of a data-modifying
statement in WITH cannot be seen from other parts of the query, other
than by reading its RETURNING output. If two such data-modifying
statements attempt to modify the same row, the results are unspecified.

A key property of WITH queries is that they are normally evaluated only
once per execution of the primary query, even if the primary query
refers to them more than once. In particular, data-modifying statements
are guaranteed to be executed once and only once, regardless of whether
the primary query reads all or any of their output.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: How to deny access to Postgres when connected from host/non-local

2021-04-05 Thread A. Reichstadt
Thanks, works.

> On Apr 3, 2021, at 11:02, Joe Conway  wrote:
> 
> On 4/2/21 7:06 PM, A. Reichstadt wrote:
>> Hello,
>> I am trying to deny access to all databases on my server if the user "postgres"
>> tries to connect from a non-local host. Here is what I did in pg_hba.conf:
>> # TYPE  DATABASE        USER            ADDRESS                 METHOD
>> # "local" is for Unix domain socket connections only
>> local   all             all                                     md5
>> # IPv4 local connections:
>> host    all             all             127.0.0.1/32            md5
>> # IPv6 local connections:
>> host    all             all             ::1/128                 md5
>> # Allow replication connections from localhost, by a user with the
>> # replication privilege.
>> local   replication     all                                     md5
>> host    replication     all             127.0.0.1/32            md5
>> host    replication     all             ::1/128                 md5
>> host    all             all             0.0.0.0/0               md5
>> local   all             postgres                                trust
>> host    all             postgres        0.0.0.0/0               reject
>> But it continues to allow postgres to connect from anywhere, both through
>> pgAdmin and as a direct connection to port 5432. I also relaunched the
>> server. This is version 12.
>> What else do I have to do?
>> Thanks for any help.
> 
> See:
> https://www.postgresql.org/docs/13/auth-pg-hba-conf.html
> 
> In particular:
> 
>  "Each record specifies a connection type, a client IP
>   address range (if relevant for the connection type),
>   a database name, a user name, and the authentication
>   method to be used for connections matching these
>   parameters. The first record with a matching
>   connection type, client address, requested database,
>   and user name is used to perform authentication."
> 
> So your reject line is never being reached.
> 
> HTH,
> 
> Joe
> 
> -- 
> Crunchy Data - http://crunchydata.com
> PostgreSQL Support for Secure Enterprises
> Consulting, Training, & Open Source Development
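
The first-match rule Joe quotes can be sketched as a small simulation. This is a simplified model, not PostgreSQL's actual matching code: it only handles the `all` keyword and exact names, leaves out the replication lines, and the `REORDERED` list is just one possible ordering that lets the reject line be reached. It is an assumption for illustration, not a fix stated in the thread.

```python
import ipaddress


def first_match(rules, conn_type, database, user, address=None):
    """Return the auth method of the first rule matching the connection."""
    for rtype, rdb, ruser, raddr, method in rules:
        if rtype != conn_type:
            continue
        if rdb != "all" and rdb != database:
            continue
        if ruser != "all" and ruser != user:
            continue
        if rtype == "host":
            # An IPv4 address is never inside an IPv6 network (and vice versa).
            if ipaddress.ip_address(address) not in ipaddress.ip_network(raddr):
                continue
        return method  # first matching record wins; later records are ignored
    return None


# Rule order as posted (replication lines omitted for brevity).
ORIGINAL = [
    ("local", "all", "all", None, "md5"),
    ("host", "all", "all", "127.0.0.1/32", "md5"),
    ("host", "all", "all", "::1/128", "md5"),
    ("host", "all", "all", "0.0.0.0/0", "md5"),
    ("local", "all", "postgres", None, "trust"),
    ("host", "all", "postgres", "0.0.0.0/0", "reject"),
]

# Hypothetical reordering: the reject line now precedes the catch-all.
REORDERED = [
    ("local", "all", "all", None, "md5"),
    ("host", "all", "all", "127.0.0.1/32", "md5"),
    ("host", "all", "all", "::1/128", "md5"),
    ("host", "all", "postgres", "0.0.0.0/0", "reject"),
    ("host", "all", "all", "0.0.0.0/0", "md5"),
]
```

With `ORIGINAL`, a remote connection as postgres is matched by the catch-all `0.0.0.0/0 md5` line before the reject line is ever examined. With `REORDERED`, the same connection hits `reject`, while loopback connections and other users still get `md5`.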




Re: Debugging leaking memory in Postgresql 13.2/Postgis 3.1

2021-04-05 Thread Stephan Knauss

Hello Tom,

On 31.03.2021 20:24, Tom Lane wrote:

> Based on nearby threads, it occurs to me to ask whether you have JIT
> enabled, and if so whether turning it off helps.  There seems to be
> a known leak of the code fragments generated by that in some cases.


That's it!

I am quite surprised that a feature which is on by default can produce
such a massive leak and go more or less undetected.


A single backend was leaking 250MB/hour; with my multiple connections it
was 2GB. But that is exactly what happened.


Running SET jit = off immediately stopped the leak.


You mentioned that this seems to be known. Do you have pointers to the 
relevant bug-tracker/thread? I would like to follow up on this.


I have not measured the impact of jit, but in theory it could bring 
larger performance benefits. So having it enabled sounds like a good 
idea, once it stops leaking.



I tried running Valgrind on postgres but did not have much success with it.
Processes seemed to terminate quite frequently. My last use of Valgrind
was a while ago, and my use case back then was probably much simpler.


Is it known which queries are leading to a leak? I still have the 
recording of mine, including explain. Would it help to narrow it further 
down to single queries which leak? Or is the JIT re-creating optimized 
code for each slightly modified one without freeing the old ones? So 
re-running the same query would not leak?


https://downloads.osm-tools.org/postgresql-2021-04-03_183913.csv.gz


Stephan