Re: how to check if the license is expired.

2024-03-31 Thread Peter J. Holzer
On 2024-03-30 09:23:15 -0400, Ron Johnson wrote:
> On Sat, Mar 30, 2024 at 9:15 AM 黄宁  wrote:
> 
> I want to develop a postgresql paid extension, then there is a local
> license file, how do I check if the license file is expired, check it once
> at each api execution, will that affect the performance of the api, is
> there any other way?
> 
> 
> What you're really asking is "how do I read a file from an extension?".
>  

We often chide users for falling into the "XY problem"[1] trap, so I think
it's nice that 黄宁 asks about the bigger picture.

I can't help with the extension (never wrote one), but a few thoughts:

Basically I see three ways to get at the license information:

* from a file (as mentioned)
* from a database table
* over the network (e.g. from a license server)

On my (not very fast) laptop I can open and read a small text file in
about 25 µs. Selecting one row from a small database table takes about 100
µs, which is quite a bit slower but I tested that from an external
process. A stored procedure would be faster than that and possibly even
faster than the file access. A query over the network is unlikely to be
faster.

Plus of course you need to check the contents, which likely involves
some cryptographic operation. Checking a 2048 bit RSA signature takes
about 30 µs on my laptop, most other algorithms are slower (unless you
go with a simple HMAC which wouldn't be secure).
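
To make the crypto step concrete, here is a minimal sketch of what such a
check could look like, assuming the license is a small payload file with a
detached RSA signature and using Python's "cryptography" package (file names
and key format are my assumptions, not something from the original question):

from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding
from cryptography.exceptions import InvalidSignature

def license_signature_ok(pub_path="license.pub",
                         payload_path="license.json",
                         sig_path="license.sig"):
    # load the vendor's public key, the license payload and the detached signature
    with open(pub_path, "rb") as f:
        pubkey = serialization.load_pem_public_key(f.read())
    with open(payload_path, "rb") as f:
        payload = f.read()
    with open(sig_path, "rb") as f:
        signature = f.read()
    try:
        # raises InvalidSignature if payload or signature were tampered with
        pubkey.verify(signature, payload, padding.PKCS1v15(), hashes.SHA256())
        return True
    except InvalidSignature:
        return False

In an actual extension this would of course be C code, but the steps (read
payload, read signature, verify against an embedded public key) are the same.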

So realistically we are somewhere in the 50 to 200 µs range.

Is this an acceptable performance penalty per API call? If not, is it
really necessary to check this on every call? Maybe it can be done just
once per session or once per hour.
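
Something like this time-based caching would do (a rough sketch;
verify_license_file is a hypothetical helper standing in for whatever the
extension actually does, and the once-per-hour policy is just an example):

import time

CACHE_TTL = 3600            # assumed policy: re-check at most once per hour
_last_check = 0.0
_last_result = False

def license_valid() -> bool:
    global _last_check, _last_result
    now = time.monotonic()
    if _last_check == 0.0 or now - _last_check > CACHE_TTL:
        _last_result = verify_license_file("/etc/myext/license")  # hypothetical helper
        _last_check = now
    return _last_result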

hp


[1] You have problem X and think that Y is part of the solution. So you
ask how to achieve Y. However, Z would be better than Y for solving
X, but nobody can tell you because they don't know about X.



Re: Is this a buggy behavior?

2024-03-24 Thread Peter J. Holzer
On 2024-03-24 11:23:22 -0700, David G. Johnston wrote:
> On Sun, Mar 24, 2024 at 11:14 AM Peter J. Holzer  wrote:
> It doesn't. Your statement
> 
> > CREATE TABLE test1
> > (
> > c1 numeric   NULL ,
> > c2 varchar(36)  NOT NULL ,
> > CONSTRAINT test1_PK PRIMARY KEY (c1,c2)
> > ) ;
> 
> creates the table with both columns being defined as NOT NULL:
> 
> 
> 
> The request is a warning when defining a multi-column table constraint primary
> key if any of the columns comprising said PK are not defined already to be NOT
> NULL.
> 
> Personally, green field at least, I find that to be reasonable.

Frankly, I don't. I see no reason why I should have to declare a column
in a PK explicitly as NOT NULL.

Something like

CREATE TABLE test1
(
c1 numeric,
c2 varchar(36),
CONSTRAINT test1_PK PRIMARY KEY (c1,c2)
) ;

or

create table test2 (
id serial primary key,
...
);

should be totally fine. An explicit NOT NULL here is just noise and
doesn't add value.

I have some sympathy for the idea that an explicit NULL in a column
definition should cause a warning if the resulting column would not in
fact be nullable. But since writing NULL is otherwise exactly equivalent
to writing nothing, even that seems a bit inconsistent and might be
more confusing than helpful. In any case it seems like a very low-value
change to me which should only be done if it's very little effort
(which apparently it isn't).

hp




Re: Is this a buggy behavior?

2024-03-24 Thread Peter J. Holzer
On 2024-03-24 21:05:04 +0530, sud wrote:
> Do you specifically mean that 'null'  keyword is just not making any sense 
> here
> in postgres. But even if that is the case , i tried inserting nothing (hoping
> "nothing" is "null" in true sense),

This is a strange hope.

> but then too it failed in the first statement while inserting which is
> fine as per the PK. 
> 
> But don't you think,in the first place it shouldn't have been allowed to 
> create
> the table with one of the composite PK columns being defined as NULL.

It doesn't. Your statement

> CREATE TABLE test1
> (
> c1 numeric   NULL ,
> c2 varchar(36)  NOT NULL ,
> CONSTRAINT test1_PK PRIMARY KEY (c1,c2)
> ) ;

creates the table with both columns being defined as NOT NULL:

hjp=> CREATE TABLE test1
(
c1 numeric   NULL ,
c2 varchar(36)  NOT NULL ,
CONSTRAINT test1_PK PRIMARY KEY (c1,c2)
) ;
CREATE TABLE
Time: 16.815 ms
hjp=> \d test1
 Table "hjp.test1"
╔════════╤════════════════════════╤═══════════╤══════════╤═════════╗
║ Column │          Type          │ Collation │ Nullable │ Default ║
╟────────┼────────────────────────┼───────────┼──────────┼─────────╢
║ c1     │ numeric                │           │ not null │         ║
║ c2     │ character varying(36)  │           │ not null │         ║
╚════════╧════════════════════════╧═══════════╧══════════╧═════════╝
Indexes:
"test1_pk" PRIMARY KEY, btree (c1, c2)


> And then , while inserting the null record, it should say that the PK
> constraint is violated but not the "not null constraint" violated.

That may just be an artifact of the implementation. You can check
whether a value to be inserted is null or not without searching the
table, so that is done first. Only then does the index have to be
checked for a possible duplicate value, so that's done later.

But as a user I actually prefer it that way. The more precisely the
database can tell me why the insert failed, the better.

hp



Re: PostgreSQL as advanced job queuing system

2024-03-22 Thread Peter J. Holzer
ops=1) ║
║   ->  Index Only Scan using jobs_queue_id_id_idx on jobs  (cost=0.43..25.68 rows=1000 width=8) (actual time=0.011..0.119 rows=964 loops=28) ║
║         Index Cond: (queue_id = queues.id) ║
║         Heap Fetches: 0 ║
║ Planning Time: 0.362 ms ║
║ Execution Time: 9.158 ms ║
╚═╝
(13 rows)

hjp=> \d queues
 Table "public.queues"
╔══════════╤═════════╤═══════════╤══════════╤═════════╗
║  Column  │  Type   │ Collation │ Nullable │ Default ║
╟──────────┼─────────┼───────────┼──────────┼─────────╢
║ id       │ integer │           │ not null │         ║
║ priority │ integer │           │ not null │ 0       ║
╚══════════╧═════════╧═══════════╧══════════╧═════════╝
Indexes:
"queues_pkey" PRIMARY KEY, btree (id)
"queues_priority_idx" btree (priority)
Referenced by:
TABLE "jobs" CONSTRAINT "jobs_queue_id_fkey" FOREIGN KEY (queue_id) 
REFERENCES queues(id)

hjp=> \d jobs
  Table "public.jobs"
╔══════════╤═════════╤═══════════╤══════════╤══════════════════════════════════╗
║  Column  │  Type   │ Collation │ Nullable │             Default              ║
╟──────────┼─────────┼───────────┼──────────┼──────────────────────────────────╢
║ id       │ integer │           │ not null │ nextval('jobs_id_seq'::regclass) ║
║ queue_id │ integer │           │ not null │                                  ║
╚══════════╧═════════╧═══════════╧══════════╧══════════════════════════════════╝
Indexes:
"jobs_pkey" PRIMARY KEY, btree (id)
    "jobs_queue_id_id_idx" btree (queue_id, id)
"jobs_queue_id_idx" btree (queue_id)
Foreign-key constraints:
"jobs_queue_id_fkey" FOREIGN KEY (queue_id) REFERENCES queues(id)


If you do have very few very long queues it might be faster to query
each queue separately.

hp



Re: Postgresql docker health check

2024-03-14 Thread Peter J. Holzer
On 2024-03-12 15:43:13 +0100, Amna Abdul Rehman wrote:
> When my application runs overnight, I receive the error message
> 'FATAL: Sorry, too many clients already.' To address this, I increased
> the number of max_connections to 3000, 5000, and also 1, but I
> still receive the error. Now, I'm considering implementing a health
> check. Specifically, I'd like to automatically restart the PostgreSQL
> container if this error occurs.

While implementing "emergency brakes" like this is generally a good
idea, you should investigate why you have so many connections. If you
have way more connections than you can reasonably expect, something is
wrong, and it is better to fix the root cause than to just hit
everything over the head with a hammer periodically.
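
As a starting point for that investigation, something like this shows who is
holding all those connections (a sketch using psycopg2, connecting the same
way as the benchmark script further down this digest):

import psycopg2

conn = psycopg2.connect("")   # assumes a suitable connection string / environment
cur = conn.cursor()
# group the open connections by user, application, client and state
cur.execute("""
    select usename, application_name, client_addr, state, count(*)
    from pg_stat_activity
    group by usename, application_name, client_addr, state
    order by count(*) desc
""")
for row in cur.fetchall():
    print(row)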

        hp



Re: creating a subset DB efficiently ?

2024-03-09 Thread Peter J. Holzer
On 2024-03-08 11:22:17 -0500, David Gauthier wrote:
> Related question...
> The "delete from par_tbl_a where project <> 'a' " is taking forever.

Have you checked whether all the FK columns have an index? Otherwise
checking for references means a full table scan which may take a long
time.

    hp



Re: Thoughts on user-defined types for talk at Postgres conference?

2024-03-07 Thread Peter J. Holzer
On 2024-03-06 12:45:33 -0500, Stephen Frost wrote:
> =*# select pg_column_size(t2.*) from t2;
>  pg_column_size 
> 
>  53
> (1 row)
> 
> That's an additional 21 bytes, which is really quite a lot.  What's
> included in those 21 bytes are
[...]
> the type information (typmod if there is one and the OID of the
> composite type),

Is it necessary to store this in every row? Can a column contain
different composite types?

    hp




Re: Orphan files filling root partition after crash

2024-03-03 Thread Peter J. Holzer
On 2024-02-28 20:30:35 +0100, Dimitrios Apostolou wrote:
> On Wed, 28 Feb 2024, Laurenz Albe wrote:
> > On Wed, 2024-02-28 at 12:16 +0100, Dimitrios Apostolou wrote:
> > > So what is the moral of the story? How to guard against this?
> > 
> > Monitor disk usage ...
> 
> It happened *fast*. And it was quite a big suprise coming
> from "just" a disk-full situation.

Been there.

To prevent this in the future I wrote a small script to monitor disk
space (on multiple hosts and multiple file systems) every few seconds
and invoke another script (which just terminates all database
connections - a bit drastic but effective) if free space runs low:
https://github.com/hjp/platzangst
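
The gist of it is very simple; a stripped-down sketch (the mount points,
threshold and emergency script here are placeholders - the real thing is in
the repository above):

import shutil
import subprocess
import time

FILESYSTEMS = ["/", "/var/lib/postgresql"]   # assumed mount points to watch
MIN_FREE = 5 * 1024**3                       # assumed threshold: 5 GB

while True:
    for fs in FILESYSTEMS:
        if shutil.disk_usage(fs).free < MIN_FREE:
            # drastic but effective: run a script which terminates all DB connections
            subprocess.run(["/usr/local/sbin/emergency_brake", fs])  # hypothetical script
    time.sleep(5)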

hp



Re: Postgres 16 missing from apt repo?

2024-02-24 Thread Peter J. Holzer
On 2024-02-22 15:18:37 -0800, Tamal Saha wrote:
> Thanks for the response. I am trying to build the postgres:16.1 docker image
> and it fails. This is the docker file (I am not associated with Docker Inc.).
> https://github.com/docker-library/postgres/blob/
> d416768b1a7f03919b9cf0fef6adc9dcad937888/16/bookworm/Dockerfile#L93

I don't know about docker images. My answer is for Debian or Ubuntu
systems in general.


> I get the following error
> 
> 6.976 Package postgresql-16 is not available, but is referred to by another
> package.
> 6.976 This may mean that the package is missing, has been obsoleted, or
> 6.976 is only available from another source
> 6.976
> 6.980 E: Version '16.1-1.pgdg120+1' for 'postgresql-16' was not found
> 
> How may I fix this?

The current version is 16.2, not 16.1. You appear to have cached an
obsolete index.

Use "apt update" to update the index.

hp



Re: How to do faster DML

2024-02-17 Thread Peter J. Holzer
On 2024-02-16 01:34:01 +0100, Peter J. Holzer wrote:
> On 2024-02-15 16:51:56 -0700, David G. Johnston wrote:
> > What you see with an exact type is what you get, which allows for 
> > implementing
> > equality, unlike inexact which requires epsilon checking.
> 
> You can check binary fp values for equality. If they are equal, they
> will compare equal. If they aren't, they won't.
> 
> What you can't expect is that the laws of commutativity, associativity,
> etc. hold. If you compute a value in two different ways which should be
> equivalent mathematically (e.g. a*b/c vs. a/c*b), the result may be
> different and an equality test may fail. But that is also the case for
> numeric (and of course integer).

To illustrate that point:

hjp=> create table t_n (a numeric, b numeric, c numeric);
CREATE TABLE

hjp=> insert into t_n values(47, 52, 10);
INSERT 0 1

-- the numbers are not specially chosen. I just invoked
-- select (random()*100)::int;
-- three times, and they were the ones that came up.

hjp=> select *, a / b * c, a * c / b, a / b * c = a * c / b from t_n;
╔════╤════╤════╤════════════════════════╤════════════════════╤══════════╗
║ a  │ b  │ c  │        ?column?        │      ?column?      │ ?column? ║
╟────┼────┼────┼────────────────────────┼────────────────────┼──────────╢
║ 47 │ 52 │ 10 │ 9.03846153846153846150 │ 9.0384615384615385 │ f        ║
╚════╧════╧════╧════════════════════════╧════════════════════╧══════════╝
(1 row)

So with type numeric two expressions which should be equal
mathematically, aren't in fact equal.

Now let's try the same thing with binary floating point:

hjp=> create table t_f (a float8, b float8, c float8);
CREATE TABLE

hjp=> insert into t_f values(47, 52, 10);
INSERT 0 1

hjp=> select *, a / b * c, a * c / b, a / b * c = a * c / b from t_f;
╔════╤════╤════╤═══════════════════╤═══════════════════╤══════════╗
║ a  │ b  │ c  │     ?column?      │     ?column?      │ ?column? ║
╟────┼────┼────┼───────────────────┼───────────────────┼──────────╢
║ 47 │ 52 │ 10 │ 9.038461538461538 │ 9.038461538461538 │ t        ║
╚════╧════╧════╧═══════════════════╧═══════════════════╧══════════╝
(1 row)

Now they are indeed equal. This is *not* guaranteed and I got a bit
lucky here, but the fact that I got lucky on the first try shows that
"float bad, numeric good" is not backed up by reality.

hp



Re: How to do faster DML

2024-02-16 Thread Peter J. Holzer
On 2024-02-16 12:10:20 +0530, veem v wrote:
> 
> On Fri, 16 Feb 2024 at 06:04, Peter J. Holzer  wrote:
> 
> On 2024-02-15 16:51:56 -0700, David G. Johnston wrote:
> > On Thu, Feb 15, 2024 at 4:31 PM Peter J. Holzer  
> wrote:
> >     On 2024-02-14 22:55:01 -0700, David G. Johnston wrote:
> >     > On Tuesday, February 13, 2024, veem v  wrote:
> >     >
> >     >     float data types rather than numeric. This will give better
> >     >     performance.
> >     >
> >     >
> >     > Only use an inexact floating-point data type if you truly
> >     > understand what you are getting yourself into.  Quickly getting
> >     > the wrong answer isn’t tolerable solution.
> >
> >     Do NOT assume that a decimal type (even if it can grow to 
> ridiculuous
> >     lengths like PostgreSQL's numeric) is exact in the mathematical
> sense.
> >     It isn't. It cannot represent almost all real numbers
> >
> >
> 
> 
> Thank You.
> 
> So it looks like the use cases where we don't need precision or decimal point
> values to be stored in postgres , integer data type is the way to go without a
> doubt.
> 
> However in cases of precision is required, as you all mentioned there are
> certain issues(rounding error etc) with "Float" data type and considering a
> normal developers usage point of view, it should be the Numeric type which we
> should use.

You misunderstood. My point was that these rounding errors also happen
with numeric, and if you want to avoid or minimize them you have to
understand what you are doing. For a hilarious example of what happens
if you don't understand that, see
https://en.wikipedia.org/wiki/Vancouver_Stock_Exchange#Rounding_errors_on_its_Index_price

I basically see two reasons to use numeric:

* Your numbers are amounts of money. Accountants are neither mathematicians
  nor engineers, and numeric mimics the way they think. So the results
  will be wrong in the correct way ;-)
* You need lots (more than 15 or 18) digits.

For anything else there is a good chance that float8 or int8 is a better
choice, because those types behave much more consistently.

Of course there are valid reasons to use other types (including numeric),
but the point is that each type has real pros and cons, and false
arguments like "numeric is an exact type and float isn't" are not
helpful. That said, "I don't understand binary numbers" might be a valid
reason.

hp



Re: How to do faster DML

2024-02-15 Thread Peter J. Holzer
On 2024-02-15 16:51:56 -0700, David G. Johnston wrote:
> On Thu, Feb 15, 2024 at 4:31 PM Peter J. Holzer  wrote:
> On 2024-02-14 22:55:01 -0700, David G. Johnston wrote:
> > On Tuesday, February 13, 2024, veem v  wrote:
> >
> >     float data types rather than numeric. This will give better
> > performance.
> >
> >
> > Only use an inexact floating-point data type if you truly
> > understand what you are getting yourself into.  Quickly getting
> > the wrong answer isn’t tolerable solution.
> 
> Do NOT assume that a decimal type (even if it can grow to ridiculuous
> lengths like PostgreSQL's numeric) is exact in the mathematical sense.
> It isn't. It cannot represent almost all real numbers
> 
> 
> That is an unusual definition for exact, I wouldn't have considered the
> requirement to represent all real numbers to be included in it.
> 
> What you see with an exact type is what you get, which allows for implementing
> equality, unlike inexact which requires epsilon checking.

You can check binary fp values for equality. If they are equal, they
will compare equal. If they aren't, they won't.

What you can't expect is that the laws of commutativity, associativity,
etc. hold. If you compute a value in two different ways which should be
equivalent mathematically (e.g. a*b/c vs. a/c*b), the result may be
different and an equality test may fail. But that is also the case for
numeric (and of course integer). You might get around that by epsilon
checking, but whether that's the right thing to do depends on your
application.

And most importantly - this is what gets most people on the "oh noes,
binary fp is inexact" bandwagon - decimal fractions (1/10, 1/100, ...) are
not exactly representable in binary, just like 1/3, 1/7, 1/11, ... aren't
exactly representable in decimal. People are used to the latter, but not
to the former. But mathematically, that doesn't really make a difference.

hp



Re: Using a Conversion Table

2024-02-15 Thread Peter J. Holzer
On 2024-02-14 10:02:37 -0500, Greg Sabino Mullane wrote:
>     "Fiscal year" double precision,
> 
> This column is an INTEGER in your other table, so your schema is not even
> internally consistent! Try to use TEXT, INT, DATE and TIMESTAMPTZ whenever
> possible,

While using double precision for fiscal year is rather grotesque
overkill (smallint would be sufficient) it isn't wrong: Any value you
could conceivably want to store for a fiscal year fits nicely (with lots
of room to spare) into a double precision.

I agree that consistency would be nice, though.

    hp



Re: PostgreSQL DB in prod, test, debug

2024-02-15 Thread Peter J. Holzer
On 2024-02-14 11:25:48 +0100, Daniel Gustafsson wrote:
> > On 14 Feb 2024, at 10:59, Simon Connah  
> > wrote:
> 
> > This is probably a stupid question so I apologies in advance.
> 
> There is no such thing.
> 
> > What I think is the best way to do this is to do a pg_dump of the
> > database (using the --schema-only flag) and then load it into a test
> > only database that gets created at the start of the unit tests and
> > destroyed at the end. The automated tests will insert, update,
> > delete and select data to test if it all still works.
> 
> If the source of truth for your schema is the database, then sure.  If the
> source of truth is a .sql file in your source code repository then you should
> use that.

I sort of do both for one of my projects:

I originally created the SQL script by running pg_dump on a manually
constructed test database (I do have code to create the empty schema,
but I had to insert the test data manually). That script went into the
git repo.

The test files all contain a fixture which drops and recreates the test
database using that sql file.
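
A minimal sketch of such a fixture (assuming pytest and the standard
PostgreSQL command line tools; the database and file names are made up):

import subprocess
import pytest

TEST_DB = "myapp_test"        # assumed test database name
DUMP_FILE = "testdata.sql"    # the pg_dump output kept in git

@pytest.fixture
def test_db():
    # drop and recreate the test database from the committed dump
    subprocess.run(["dropdb", "--if-exists", TEST_DB], check=True)
    subprocess.run(["createdb", TEST_DB], check=True)
    subprocess.run(["psql", "-q", "-X", "-d", TEST_DB, "-f", DUMP_FILE], check=True)
    yield TEST_DB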

When I add a new test case I try to make do with the existing test data.

When I need additional data for a new test case, I create a new pristine
test database using the sql file, add the new data, and then create a
new sql file using pg_dump which is then committed with the test cases.

Same for migrations: If I need to migrate the schema, I run the
migration on the test database, then dump and commit it.

This project is small enough (86 tests in 10 files) that all test cases
can use the same test data. However, I could easily use different test
data for different tests.

hp



Re: How to do faster DML

2024-02-15 Thread Peter J. Holzer
On 2024-02-14 22:55:01 -0700, David G. Johnston wrote:
> On Tuesday, February 13, 2024, veem v  wrote:
> 
> float data types rather than numeric. This will give better performance.
> 
> 
>  Only use an inexact floating-point data type if you truly understand what you
> are getting yourself into.  Quickly getting the wrong answer isn’t tolerable
> solution.

Do NOT assume that a decimal type (even if it can grow to ridiculous
lengths like PostgreSQL's numeric) is exact in the mathematical sense.
It isn't. It cannot represent almost all real numbers. No pi or e of
course, but also no seemingly simple fractions like 1/3 or 1/7.

Unless you never divide anything, you will need to be aware of the
rounding behaviour, just as you have to with binary floating point
types. And if you use a finite precision you will also have to deal with
rounding on multiplication (and possibly even addition and subtraction,
if you use different precisions).
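
A two-line illustration of that point (Python's Decimal behaves like an
arbitrary-but-finite precision decimal type here):

from decimal import Decimal

# decimal arithmetic rounds too: 1/3 is not exactly representable
print(Decimal(1) / Decimal(3) * Decimal(3) == Decimal(1))  # False
# the same computation in binary floating point happens to round back exactly
print(1 / 3 * 3 == 1.0)                                    # True

The point is not that binary is better, only that neither representation is
"exact" once division enters the picture.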

Almost 40 years ago, our numerical methods professor started his first
lecture with the sentence "You can use a computer for anything - except
computing". He spent the rest of the semester proving himself wrong,
of course, but computing correctly is hard - and choosing a data type
which more closely mimics the way we learn to compute in primary school
doesn't necessarily make it easier. Mostly it just makes it harder to
spot the errors ;-).

    hp



Re: How should we design our tables and indexes

2024-02-13 Thread Peter J. Holzer
On 2024-02-12 11:46:35 -0500, Greg Sabino Mullane wrote:
> If PR_ID is a must in the Join criteria between these table tables table1,
> table2 in all the queries, then is  it advisable to have a composite index
> like (pr_id, mid), (pr_id,cid) etc rather than having index on individual
> columns?
> 
> 
> No - individual indexes are better, and Postgres has no problem combining them
> when needed.

I'm a bit unsure if I should mention this as veem probably benefits more
from hard and simple rules than more nuanced answers, but that really
depends on the type of query.

For some kinds of queries a composite index can be dramatically faster.
While Postgres can combine indexes that means scanning both indexes and
combining the result, which may need a lot more disk I/O than scanning a
composite index. Indeed, in the cases where a composite index would be
useful but doesn't exist, PostgreSQL usually just chooses the best of
the single column indexes and ignores the rest.

That said, my rule of thumb is to create just single column indexes at
first and only create composite indexes if they are necessary.
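
If you want to see the difference for your own data, a quick way is to look
at the plan before and after adding the composite index, e.g. (a sketch,
assuming a table t(pr_id, mid) with enough rows for the planner to care):

import psycopg2

conn = psycopg2.connect("")
cur = conn.cursor()

def show_plan(cur, sql):
    cur.execute("explain analyze " + sql)
    for (line,) in cur.fetchall():
        print(line)

q = "select * from t where pr_id = 42 and mid = 7"

cur.execute("create index if not exists t_pr_id_idx on t (pr_id)")
cur.execute("create index if not exists t_mid_idx on t (mid)")
show_plan(cur, q)     # planner combines the single-column indexes or picks one

cur.execute("create index if not exists t_pr_id_mid_idx on t (pr_id, mid)")
show_plan(cur, q)     # now usually a single scan of the composite index
conn.commit()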

    hp




Re: How to do faster DML

2024-02-13 Thread Peter J. Holzer
On 2024-02-13 01:53:25 +0530, veem v wrote:
> On Mon, 12 Feb 2024 at 03:40, Peter J. Holzer  wrote:
> 
> The fixed width types are those that the CPU can directly process:
> Integers with 16, 32 and 64 bits, floating point numbers with 32 and 64
> bits. The CPU can read and write them with a single memory access, it
> can do arithmetic with a single instruction, etc.
> 
> Number/Numeric are not native types on any CPU. To read them the CPU
> needs several memory accesses (probably one per byte unless you get
> really clever) and then it can't do any calculations with them
> directly, instead it has run a subroutine which does operations on
> little chunks and then puts those chunks together again - basically the
> same as you do when you're doing long addition or multiplication on
> paper. So that's not very efficient.
> 
> 
> So it looks like the fixed length data type(like integer, float) should be the
> first choice while choosing the data type of the attributes wherever possible,
> as these are native types. (Like choosing "Integer/float" over "Numeric",
> "Char" over "Varchar" etc). 

Please do not conflate "char(n)" with native machine types like int or
float. These are very different things. A char(n) is a string of fixed but
arbitrary length. This is not something a CPU can process in a single
instruction. It has to go over it character by character.

There is almost never a reason to use char(n). Just use varchar(n) or in
the case of PostgreSQL just varchar or text.

> However I do see even in Oracle databases, we have Integer type too,

Not really. INTEGER is just an alias for NUMBER(38) in Oracle (see for
example
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlqr/Data-Types.html).
It's not the same as an INTEGER in PostgreSQL.

hp



Re: How to do faster DML

2024-02-13 Thread Peter J. Holzer
On 2024-02-12 11:28:41 -0500, Greg Sabino Mullane wrote:
> On Mon, Feb 12, 2024 at 1:50 AM veem v  wrote:
> 
> So we were thinking, adding many column to a table should be fine in
> postgres (as here we have a use case in which total number of columns may
> go till ~500+). But then,  considering the access of columns towards the
> end of a row is going to add more time as compared to the column which is
> at the start of the row. As you mentioned, accessing 100th column may add 
> 4
> to 5 times more as compared to the access of first column. So , is it
> advisable here to go for similar approach of breaking the table into two ,
> if the total number of column reaches certain number/threshold for a 
> table?
> 
> 
> I'm not sure of what Peter was testing exactly to get those 4-5x figures,

Sorry, I should have included my benchmark code (it's short enough - see
below).

What I was actually timing was

select count(*) from t_postgresql_column_bench where v{i} = 'a'

for various i.

> but I presume that is column access time,

That was the goal. Of course there is always some overhead but I figured
that by counting rows where a column has a constant value the overhead
is minimal or at least constant.

> which would not mean a direct effect on your total query time of 4-5x.

Right. In any real application the column access time is only a part of
the total processing time and probably a small part, so the effect on
total processing time is correspondingly smaller.

hp


#!/usr/bin/python3

import random
import time
import psycopg2

n_cols = 100
n_rows = 10

db = psycopg2.connect("")
csr = db.cursor()

csr.execute("drop table if exists t_postgresql_column_bench")
q = "create table t_postgresql_column_bench ("
q += ", ".join(f"v{i} text" for i in range(n_cols))
q += ")"
csr.execute(q)
q = "insert into t_postgresql_column_bench values("
q += ", ".join("%s" for i in range(n_cols))
q += ")"

for j in range(n_rows):
    v = [ chr(random.randint(96+1, 96+26)) for i in range(n_cols)]
    csr.execute(q, v)
db.commit()

for i in range(n_cols):
    q = f"select count(*) from t_postgresql_column_bench where v{i} = 'a'"
    t0 = time.clock_gettime(time.CLOCK_MONOTONIC)
    csr.execute(q)
    r = csr.fetchall()
    print(r)
    t1 = time.clock_gettime(time.CLOCK_MONOTONIC)
    print(i, t1 - t0)
db.commit()




Re: How to do faster DML

2024-02-11 Thread Peter J. Holzer
On 2024-02-11 12:08:47 -0500, Ron Johnson wrote:
> On Sun, Feb 11, 2024 at 11:54 AM veem v  wrote:
> When you said "you would normally prefer those over numeric " I was
> thinking the opposite. As you mentioned integer is a fixed length data 
> type
> and will occupy 4 bytes whether you store 15 or .But in case of
> variable length type like Number or numeric , it will resize itself based
> on the actual data, So is there any downside of going with the variable
> length data type like Numeric,
> 
> 
> Consider a table with a bunch of NUMERIC fields.  One of those records has
> small values (aka three bytes).  It fits neatly in 2KiB.
> 
> And then you update all those NUMERIC fields to big numbers that take 15
> bytes.  Suddenly (or eventually, if you update them at different times), the
> record does not fit in 2KiB, and so must be moved to its own.page.  That 
> causes
> extra IO.

I think that's not much of a concern with PostgreSQL because you can't
update a row in-place anyway because of MVCC. So in any case you're
writing a new row. If you're lucky there is enough free space in the same
page and you can do a HOT update, but that's quite independent of
whether the row changes size.

hp



Re: How to do faster DML

2024-02-11 Thread Peter J. Holzer
On 2024-02-11 22:23:58 +0530, veem v wrote:
> On Sun, 11 Feb 2024 at 19:02, Peter J. Holzer  wrote:
> 
> > Similarly for Number/Numeric data type.
> 
> Number in Oracle and numeric in PostgreSQL are variable length types.
> But in PostgreSQL you also have a lot of fixed length numeric types
> (from boolean to bigint as well as float4 and float8) and you would
> normally prefer those over numeric (unless you really need a decimal or
> very long type). So padding is something you would encounter in a
> typical PostgreSQL database while it just wouldn't happen in a typical
> Oracle database.
> 
> 
> 
> When you said "you would normally prefer those over numeric " I was thinking
> the opposite. As you mentioned integer is a fixed length data type and will
> occupy 4 bytes whether you store 15 or .But in case of variable
> length type like Number or numeric , it will resize itself based on the actual
> data, So is there any downside of going with the variable length data type 
> like
> Numeric, Varchar type always for defining the data elements?

The fixed width types are those that the CPU can directly process:
Integers with 16, 32 and 64 bits, floating point numbers with 32 and 64
bits. The CPU can read and write them with a single memory access, it
can do arithmetic with a single instruction, etc.

Number/Numeric are not native types on any CPU. To read them the CPU
needs several memory accesses (probably one per byte unless you get
really clever) and then it can't do any calculations with them
directly; instead it has to run a subroutine which does operations on
little chunks and then puts those chunks together again - basically the
same as you do when you're doing long addition or multiplication on
paper. So that's not very efficient.

Also the space savings aren't that great and probably even negative: In
my example the length of a numeric type with at most 10 digits varied
between 3 and 7 bytes. Only for values between -99 and +99 is this (3
bytes) actually shorter, for other values it's the same length or
longer. So you would only save space if most of your values are in that
±99 range. But not if all of them are, because then you could simply use
a smallint (Range -32768..32767) in PostgreSQL and save another byte.

Finally - and I'm probably biased in this as I learned programming 40
years ago - to me the native types feel more natural than product
specific variable-length decimal types.

hp



Re: How to do faster DML

2024-02-11 Thread Peter J. Holzer
On 2024-02-11 13:25:10 +0530, veem v wrote:
> On Sun, 11 Feb 2024 at 05:55, Peter J. Holzer  wrote:
> Yes. Numbers in Oracle are variable length, so most Oracle tables
> wouldn't contain many fixed length columns. In PostgreSQL must numeric
> types are fixed length, so you'll have quite a lot of them.
> 
> 
> 
> So it means , say in other databases like (oracle database), we were careless
> choosing the data length , say for example Varchar2(4000), if the real data
> which is inserted into the table holds a varchar string of length 20 bytes 
> then
> Oracle trimmed it to occupy the 20 bytes length only in the storage. but in
> postgre here we need to be cautious and define the length as what the data
> attribute can max contains , because that amount of fixed space is allocated
> to every value which is inserted into the table for that attribute/data
> element.

No. Varchar is (as the name implies) a variable length data type and
both Oracle and PostgreSQL store only the actual value plus some length
indicator in the database. Indeed, in PostgreSQL you don't need to
specify the maximum length at all.

However, if you specify a column as "integer" in PostgreSQL it will
always occupy 4 bytes, whether you store the value 15 in it or
9. In Oracle, there is no "integer" type and the roughly
equivalent number(10,0) is actually a variable length floating point
number. So 15 will occupy only 3 bytes and 9 will occupy 7
bytes[1].

> Similarly for Number/Numeric data type.

Number in Oracle and numeric in PostgreSQL are variable length types.
But in PostgreSQL you also have a lot of fixed length numeric types
(from boolean to bigint as well as float4 and float8) and you would
normally prefer those over numeric (unless you really need a decimal or
very long type). So padding is something you would encounter in a
typical PostgreSQL database while it just wouldn't happen in a typical
Oracle database.

But as Laurenz wrote that's a micro optimization which you usually
shouldn't spend much time on. OTOH, if you have hundreds of columns in a
table, maybe it is worthwhile to spend some time on analyzing access
patterns and optimizing the data layout.

hp

[1] From memory. I may be misremembering the details.



Re: How to do faster DML

2024-02-10 Thread Peter J. Holzer
On 2024-02-06 11:25:05 +0530, veem v wrote:
> With respect to the storage part:- In Oracle database we were supposed to keep
> the frequently accessed column first and less frequently accessed columns
> towards last in the table while creating table DDL. This used to help the 
> query
> performance while accessing those columns as it's a row store database. Are
> there any such performance implications exists in postgres? And there the data
> types like varchar used to only occupy the space which the real data/value
> contains. 
> 
> But here in postgres, it seems a bit different as it has significant
> overhead in storage, as these seem like being fixed length data types and will
> occupy the space no matter what the read data holds.

Yes. Numbers in Oracle are variable length, so most Oracle tables
wouldn't contain many fixed length columns. In PostgreSQL most numeric
types are fixed length, so you'll have quite a lot of them.

> So curious to know, if in this case will there be any performance
> impact accessing those columns, if such a column which is frequently
> accessed but has been put towards the end of the table because of
> storage space saving?

Yes. PostgreSQL has to skip over each column until it gets to the column
it wants to read. So reading a column at the end of the row will be
slower than reading one at the start. A very simplistic benchmark (100
columns of type text each containing a single character) suggests that
accessing column 100 takes about 4 or 5 times as long as column 1, and
the access times for the columns in between are pretty linear.

So there's a bit of a tradeoff between minimizing alignment overhead and
arranging columns for fastest access.

hp



Re: How to do faster DML

2024-02-04 Thread Peter J. Holzer
On 2024-02-04 02:14:20 +0530, Lok P wrote:
> However , as we have ~5billion rows in the base table and out of that , we 
> were
> expecting almost half i.e. ~2billion would be duplicates.

That's way more than I expected from your original description. And it
of course raises the question whether it's a good idea to just throw
away all that data or if you need to keep that in a normalized way.

> And you said, doing the inserts using the "order by CTID Offset"
> approach must cause one full sequential scan of the whole table for
> loading each chunk/10M of rows and that would take a long time I
> believe.
> 
> I am still trying to understand the other approach which you suggested. Not
> able to understand "you can select where index_col > last order by index_col 
> limit 10M," .
> However, to get the max ID value of the last 10M loaded rows in target, do you
> say that having an PK index created on that target table column(ID) will
> help,

Yes. Getting the maximum value from an index is a very fast operation.
You just have to traverse down the right edge of the tree (or you may
even be able to access the right-most leaf page directly).

>  and we can save the max (ID) value subsequently in another table to fetch
> and keep loading from the source table (as ID>Max_ID stored in temp table)?

Another table, or a variable in a script. (Personally, if I need to do
something repeatedly, I usually write a script in the scripting language
I feel most comfortable in - which has been Python for the last 7 or 8
years, Perl before that - because that gives you variables, loops,
conditionals and, above all, repeatability.)
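
A sketch of what such a chunked-copy script could look like (keyset
pagination over the indexed id column; the table names and the 10M chunk
size are assumptions taken from this discussion):

import psycopg2

conn = psycopg2.connect("")
cur = conn.cursor()

CHUNK = 10_000_000
last_id = 0
while True:
    # copy the next chunk, ordered by the indexed id column
    cur.execute(
        "insert into target select * from source"
        " where id > %s order by id limit %s",
        (last_id, CHUNK),
    )
    if cur.rowcount == 0:
        break
    # fast: just walks down the right edge of the index on target(id)
    cur.execute("select max(id) from target")
    (last_id,) = cur.fetchone()
    conn.commit()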

> OR
> Would it be better to do it in one shot only , but by setting a higher value 
> of
> some parameters like "maintenance_work_mem" or "max_parallel_workers"?

Hard to say. Normally, processing in fewer, bigger chunks is faster. But
RAM is much faster than disk (even with SSDs), so it might be faster to
make work_mem as large as you can and then use a chunk size which just
fits inside work_mem. Of course finding that sweet spot takes
experimentation, hence time, and it may make little sense to experiment
for 20 hours just to save 40 minutes.

hp



Re: How to do faster DML

2024-02-03 Thread Peter J. Holzer
On 2024-02-03 19:25:12 +0530, Lok P wrote:
> Apology. One correction, the query is like below. I. E filter will be on on
> ctid which I believe is equivalent of rowid in oracle and we will not need the
> index on Id column then. 
> 
>  But, it still runs long, so thinking any other way to make the duplicate
> removal faster? 
> 
> Also wondering , the index creation which took ~2.5hrs+ , would that have been
> made faster any possible way by allowing more db resource through some session
> level db parameter setting?
> 
> create table TAB1_New
> as
> SELECT  * from TAB1 A
> where CTID in
>       (select min(CTID) from TAB1
>       group by ID having count(ID)>=1 );

That »having count(ID)>=1« seems redundant to me. Surely every id which
occurs in the table occurs at least once?

Since you want ID to be unique I assume that it is already almost
unique - so only a small fraction of the ids will be duplicates. So I
would start with creating a list of duplicates:

create table tab1_dups as
select id, count(*) from tab1 group by id having count(*) > 1;

This will still take some time because it needs to build a temporary
structure large enough to hold a count for each individual id. But at
least then you'll have a much smaller table to use for further cleanup.

hp



Re: Changing a schema's name with function1 calling function2

2023-12-25 Thread Peter J. Holzer
On 2023-12-24 14:27:19 -0800, Adrian Klaver wrote:
> On 12/24/23 13:43, Peter J. Holzer wrote:
> > I think you misunderstood Wilma. What she is asking for is a "keyword"
> > or "magic variable" (or whatever you want to call it) which you can
> > specify in CREATE|ALTER FUNCTION ... SET SEARCH_PATH = ..., which refers
> > to the schema the function is (being) created in.
> 
> Maybe, but I don't think so. From one of Wilma's previous post:
> 
> "Therefore it would be great if there was a variable which I could set in a
> search_path (like the variable "$user") which denotes the function's schema
> and which is only evaluated when the function is executed, i.e. the variable
> would be the value of the function's search_path in the function's meta
> data. This variable, e.g. "$function_schema" would still denote the correct
> schema after renaming the schema."

I interpreted that as meaning what I wrote above.

So we'll have to wait for Wilma to clarify what she really meant.


> It seems to be pretty tied to search_path. Now there is, from:
> 
> https://www.postgresql.org/docs/current/catalog-pg-proc.html
> 
> proconfig text[]
> 
> Function's local settings for run-time configuration variables
> 
> Though that is part of the function definition not the code inside the
> function as you show below. Also you would still need to determine what was
> the appropriate schema before creating/altering the setting for the
> function.

"Determining the appropriate schema" is what - AIUI - the requested
magic variable is for.


> > So if you create the function with
> > 
> >  create function foo (...)
> >  set search_path to __function_schema__
> >  $$ ... $$
> > 
> > (using the "dunder" convention (from some other programming languages)
> > to denote the magic variable/keyword)
> > 
> > the search path would be set to whatever schema was first in the
> > search_path when the function was created.
> 
> That assumes the first schema is the correct one. Not something that could
> be counted on given search_path can be set any number of ways at any time.

Yes, I was sloppy there. What I actually meant was "the first schema
where the user actually has permission to create a function" (and no
doubt that isn't 100% correct either). I would expect that in a
deployment situation that would be the first schema in the search_path,
but you are of course correct that this is not necessarily the case.
Anyway, that was only illustrative. The point of my examples was that no
matter how the function is created, __function_schema__ always refers to
the schema the function actually is in.

hp




Re: Changing a schema's name with function1 calling function2

2023-12-24 Thread Peter J. Holzer
On 2023-12-23 08:31:39 -0800, Adrian Klaver wrote:
> On 12/23/23 08:12, Wilma Wantren wrote:
> > I had already feared that such a variable does not exist (because I
> > had not found it). I think that's a pity, because I suspect that in
> > at least 90% of the cases where a function needs a search_path, this
> > variable would be the value of the search_path, so that in 90% of
> > the cases no change to the search_path would be necessary after
> > renaming the schema.
> 
> I would say the issue is with this from your previous post:
> 
> 
> "Therefore it would be great if there was a variable which I could set in a
> search_path (like the variable "$user") which denotes the function's schema
> and which is only evaluated when the function is executed, i.e. the variable
> would be the value of the function's search_path in the function's meta
> data."
> 
> search_path can be set(just off the top of head):
> 
> postgresql.conf and it's include files
> ALTER DATABASE
> ALTER FUNCTION
> ALTER ROLE
> ALTER SYSTEM
> At any point by a user/script/function/etc.
> 
> Even if such a variable existed you be chasing down the 'correct' version of
> search_path that had it. Basically where you are now.

I think you misunderstood Wilma. What she is asking for is a "keyword"
or "magic variable" (or whatever you want to call it) which you can
specify in CREATE|ALTER FUNCTION ... SET SEARCH_PATH = ..., which refers
to the schema the function is (being) created in.

So if you create the function with 

create function foo (...)
set search_path to __function_schema__
$$ ... $$

(using the "dunder" convention (from some other programming languages)
to denote the magic variable/keyword)

the search path would be set to whatever schema was first in the
search_path when the function was created.

If you create it with 

create function my_schema.foo (...)
set search_path to __function_schema__
$$ ... $$

it would be set to "my_schema".

And if you changed the schema with 

alter function foo set schema my_new_schema

it would be changed to "my_new_schema".

Personally I think that effect should be easy enough to create in your
deployment or migration scripts but of course that assumes that you have
such scripts. If you are doing your deployments manually (especially by
cloning a template as described by Wilma) I can see how that feature
would make things easier and/or reduce the risk of errors.

hp



Re: Building Extension on Linux fails with relocation error

2023-12-22 Thread Peter J. Holzer
On 2023-12-22 10:31:14 -0500, Tom Lane wrote:
> Binaek Sarkar  writes:
> > The issue arises with a Go module that is compiled into a .a archive file
> > using -buildmode=c-archive, leading to a *file size of approximately 393 
> > MB*.
[...]
> > Interestingly, the *build completes successfully when the .a file is
> > smaller* (around 100 MB).
> 
> Pure luck I suspect.

I seem to remember a 256MB limit for position independent code on x86.
The current man-page for GCC doesn't mention such a limit, though, so I
may be mistaken.

hp



Re: Help understanding server-side logging (and more...)

2023-12-21 Thread Peter J. Holzer
On 2023-12-21 12:05:41 +0100, Dominique Devienne wrote:
> FWIW, the symptoms they are experiencing is that initially all queries
> are fast, then they start to get longer and longer, to the point all
> services accessing PostgreSQL grind to a halt and basically hang.
> While that happens, one can connect to the DB from a shell (that
> cluster has a single DB) w/o issues, and run queries just fine

If you do that, do you see the "hanging" queries in pg_stat_activity? If
so, what are they waiting for?

    hp



Re: How to generate random bigint

2023-12-21 Thread Peter J. Holzer
On 2023-12-21 00:06:39 -0600, Phillip Diffley wrote:
> Postgres's random() function generates a random double. That can be converted
> to a random int for smaller integers, but a double can't represent all of the
> values in a bigint. Is there a recommended way to generate a random bigint in
> Postgres?

Call random() twice and add the results?

Like this:

select (random() * 2147483648)::int8 * 4294967296
   + (random() * 4294967296)::int8;

(This assumes that random() actually returns at least 32 random bits.
If that's not the case you'll need more calls to random())
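
The underlying limitation is easy to demonstrate: a float8 has a 53-bit
mantissa, so adjacent bigints above 2**53 collapse to the same double.
In Python:

# 2**53 and 2**53 + 1 are distinct bigints but the same double
print(float(2**53) == float(2**53 + 1))   # True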

hp



Re: Read write performance check

2023-12-19 Thread Peter J. Holzer
On 2023-12-20 00:44:48 +0530, veem v wrote:
>  So at first, we need to populate the base tables with the necessary data (say
> 100million rows) with required skewness using random functions to generate the
> variation in the values of different data types. Then in case of row by row
> write/read test , we can traverse in a cursor loop. and in case of batch 
> write/
> insert , we need to traverse in a bulk collect loop. Something like below and
> then this code can be wrapped into a procedure and passed to the pgbench and
> executed from there. Please correct me if I'm wrong.

One important point to consider for benchmarks is that your benchmark
has to be similar to the real application to be useful. If your real
application runs on a different node and connects to the database over
the network, a benchmark running within a stored procedure may not be
very indicative of real performance.
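
For example, a client-side sketch like the following (psycopg2, an assumed
table bench_t(a int, b text) and an assumed remote host) exercises exactly
the network round trips that a stored-procedure loop would hide:

import time
import psycopg2
from psycopg2.extras import execute_values

conn = psycopg2.connect("host=dbserver dbname=bench")   # assumed remote server
cur = conn.cursor()
rows = [(i, f"payload {i}") for i in range(10_000)]

t0 = time.monotonic()
for r in rows:                                          # one round trip per row
    cur.execute("insert into bench_t values (%s, %s)", r)
conn.commit()
t1 = time.monotonic()

execute_values(cur, "insert into bench_t values %s", rows)   # one batched insert
conn.commit()
t2 = time.monotonic()

print(f"row by row: {t1 - t0:.2f}s, batched: {t2 - t1:.2f}s")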

    hp



Re: Store PDF files in PostgreDB

2023-12-08 Thread Peter J. Holzer
On 2023-12-06 09:43:27 -0500, Ron Johnson wrote:
> On Wed, Dec 6, 2023 at 9:39 AM Priyadharshini Vellaisamy 
>  > wrote:
> Please let m know can we store PDF files in PostgreDB ?
> 
> 
> You can store anything up to 1GB in Postgresql using data type bytea.
>  
> 
> If so, can we retrieve it effectively?
> 
> 
> Effectively?
> 
> (We've been storing images in PG for 11 years.)

They probably meant "efficiently".

The way large data is stored in PostgreSQL adds quite a bit of overhead
compared to a plain file in a file system. Plus you also have to deal
with the socket connection (unix or TCP) to the database.

That said, on my laptop (which is several years old and not very fast,
and I'm using Python, which isn't ideal for that job either) I can get
PDFs at somewhere between 47 MB/s and 77 MB/s. Which may or may not be
fast enough.

Another measure of "efficiency" might be how easy it is to use. Here,
bytea fields are very nice: They act just like varchar fields, no
special functions necessary.
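
For example, storing and reading back a PDF with psycopg2 is just ordinary
parameter passing (the table name here is made up):

import psycopg2

conn = psycopg2.connect("")
cur = conn.cursor()
cur.execute("create table if not exists documents"
            " (id serial primary key, name text, data bytea)")

with open("report.pdf", "rb") as f:
    cur.execute("insert into documents (name, data) values (%s, %s) returning id",
                ("report.pdf", psycopg2.Binary(f.read())))
doc_id = cur.fetchone()[0]
conn.commit()

cur.execute("select data from documents where id = %s", (doc_id,))
pdf_bytes = bytes(cur.fetchone()[0])   # bytea comes back as a memoryview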

hp



Re: pg_getnameinfo_all() failed: Temporary failure in name resolution

2023-11-26 Thread Peter J. Holzer
On 2023-11-24 16:39:57 -0500, Ron Johnson wrote:
> On Fri, Nov 24, 2023 at 4:26 PM David G. Johnston 
> wrote:
> 
> On Friday, November 24, 2023, Ron Johnson  wrote:
> 
> 
> The second "way" sounds interesting, but what is it filled with?

With "???"

(which isn't very helpful. I would have hoped it's filled in with a
representation of the IP address)



> What does it matter?  It’s an internal detail that apparently gets exposed
> as [unknown] appearing in your log file where the client ip address would
> normally be.
> 
> 
> Because I'd hope that the log file would tell me the errant host name.  

The error message says that the host name could not be determined
because of a DNS error. So obviously it can't tell you the host name.
It could tell you the IP address, though.


> I added "log_hostname = on" to postgresql.conf earlier in the day. 
> When I commented that out, the errors stopped happening.
> 
> 
> This makes sense.
> 
> 
> But is not explanatory.  I want to see host names in the log file when
> possible, not just IP addresses.

So now that you have IP addresses again, are there any for which a
reverse lookup doesn't work?

hp



Re: replication primary writting infinite number of WAL files

2023-11-26 Thread Peter J. Holzer
On 2023-11-24 22:05:12 +0100, Les wrote:
> And there is nothing in the logs in that time period besides "checkpoints
> are happening too frequently"?
> 
> 
> 
> Here are all the logs we have from that minute.
> 
> ╰─# journalctl CONTAINER_ID=33d8b96b9062 --since '2023-11-24 10:18:00' --until
> '2023-11-24 10:21:00.000'                                  

[citing only the "checkpoints are occurring too frequently" entries:]
> nov 24 10:18:04 docker02 33d8b96b9062[1200]: 2023-11-24 10:18:04.517 CET [35] LOG:  checkpoints are occurring too frequently (22 seconds apart)
> nov 24 10:18:26 docker02 33d8b96b9062[1200]: 2023-11-24 10:18:26.776 CET [35] LOG:  checkpoints are occurring too frequently (22 seconds apart)
> nov 24 10:18:49 docker02 33d8b96b9062[1200]: 2023-11-24 10:18:49.002 CET [35] LOG:  checkpoints are occurring too frequently (22 seconds apart)
> nov 24 10:19:11 docker02 33d8b96b9062[1200]: 2023-11-24 10:19:11.320 CET [35] LOG:  checkpoints are occurring too frequently (23 seconds apart)
> nov 24 10:19:34 docker02 33d8b96b9062[1200]: 2023-11-24 10:19:34.058 CET [35] LOG:  checkpoints are occurring too frequently (23 seconds apart)
> nov 24 10:19:56 docker02 33d8b96b9062[1200]: 2023-11-24 10:19:56.160 CET [35] LOG:  checkpoints are occurring too frequently (22 seconds apart)
> nov 24 10:20:19 docker02 33d8b96b9062[1200]: 2023-11-24 10:20:19.691 CET [35] LOG:  checkpoints are occurring too frequently (23 seconds apart)
> nov 24 10:20:42 docker02 33d8b96b9062[1200]: 2023-11-24 10:20:42.938 CET [35] LOG:  checkpoints are occurring too frequently (23 seconds apart)

Interesting. If the database writes 1.5 GB/s of WALs and max_wal_size is
the default of 1GB, shouldn't there be a checkpoint about every 0.7
seconds instead of just every 22 seconds?

hp



Re: Can user specification of a column value be required when querying a view ?

2023-11-25 Thread Peter J. Holzer
On 2023-11-25 10:49:56 -0500, Ron Johnson wrote:
> On Sat, Nov 25, 2023 at 4:49 AM Peter J. Holzer  wrote:
> On 2023-11-24 13:06:45 -0500, Ron Johnson wrote:
> > On Fri, Nov 24, 2023 at 1:01 PM Peter J. Holzer  
> wrote:
> >     On 2023-11-20 22:03:06 -0500, Ron Johnson wrote:
> >     > Or row level security.
> >
> >     Does that help here?
[...]
> It's an alternative to functions for restricting the client to only his data.

Which isn't the problem here. So RLS doesn't help.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Can user specification of a column value be required when querying a view ?

2023-11-25 Thread Peter J. Holzer
On 2023-11-24 13:06:45 -0500, Ron Johnson wrote:
> On Fri, Nov 24, 2023 at 1:01 PM Peter J. Holzer  wrote:
> On 2023-11-20 22:03:06 -0500, Ron Johnson wrote:
> > Or row level security.
> 
> Does that help here? AIUI row level security can be used to limit access
> to specific rows (e.g. user alex can access info about ssn '106-91-9930'
> but not '234-56-7890') but not how many rows can be accessed in a single
> query.
> 
> 
> I don't think OP indicated that ssn is a unique key. 

No he didn't, but that's IMHO not relevant to the possibility of using
row level security. If a row level security allows a user to select a
row, that row can be selected by any query, including «select * from t».
I don't see a way to use RLS to ensure that a query can only return a
sufficiently small subset of the total rows a user has access to.
How would you do that?

    hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Can user specification of a column value be required when querying a view ?

2023-11-24 Thread Peter J. Holzer
On 2023-11-20 22:03:06 -0500, Ron Johnson wrote:
> Or row level security.

Does that help here? AIUI row level security can be used to limit access
to specific rows (e.g. user alex can access info about ssn '106-91-9930'
but not '234-56-7890') but not how many rows can be accessed in a single
query.
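
For illustration, a minimal sketch of such a policy (table, role and
column names are made up):

    ALTER TABLE persons ENABLE ROW LEVEL SECURITY;
    CREATE POLICY alex_own_rows ON persons
        FOR SELECT TO alex
        USING (ssn = '106-91-9930');

Alex can now only see that row, but nothing stops a single
«select * from persons» from returning every row the policy allows.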

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Postgres limitation in user management

2023-11-04 Thread Peter J. Holzer
On 2023-11-04 21:42:34 +, Brent Wood wrote:
> >> We have 2 sets of database user groups –
> >>
> >>  1. App – who owns the application schemas (and tables)
> >>  2. Support – who provides db support
> >>
> >> We want Support users to have no SELECT or DML privilege but only ALTER
> TABLE
> >> to perform any troubleshooting in the database.
> 
> >This seems strange to me. What kind of troubleshooting requires the
> >ability to ALTER TABLE but not to do DML?
> 
> Where your db admin & data admin are separated. Data security issues can
> require minimal access to data, which a dba does not necessarily require.
> Especially when the DBA role is contracted out. 
> 
> Sort of along this line, we have offloaded user management to AD, so our DB
> user management is now carried out via in-house IT, who are not DBA's and have
> no access to data.

This doesn't answer the question why ALTER TABLE privilege would be
required.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Postgres limitation in user management

2023-11-04 Thread Peter J. Holzer
On 2023-11-03 06:26:21 +, Kar, Swapnil (TR Technology) wrote:
> We have 2 sets of database user groups –
> 
>  1. App – who owns the application schemas (and tables)
>  2. Support – who provides db support
> 
> We want Support users to have no SELECT or DML privilege but only ALTER TABLE
> to perform any troubleshooting in the database.

This seems strange to me. What kind of troubleshooting requires the
ability to ALTER TABLE but not to do DML?

    hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: pg_checksums?

2023-10-30 Thread Peter J. Holzer
On 2023-10-29 13:26:27 -0500, Ron wrote:
> On 10/29/23 12:57, Paul Förster wrote:
> > Safe in the sense that, if I enable checksums on a replica, switch
> > over and the enable checksums on the other side, if this is ok, or
> > whether future mutations on the primary will corrupt the replica.
> 
> Trying it would tell you something.
> 
> > That's why I asked if I need to perform a patronictl reinit.
> 
> Best to ask Percona.

Why Percona?

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: pg_checksums?

2023-10-30 Thread Peter J. Holzer
On 2023-10-30 09:56:31 +0900, Michael Paquier wrote:
> Hm?  Page checksums are written when a page is flushed to disk, we
> don't set them for dirty buffers or full-page writes included in WAL,
> so it should be OK to do something like the following:
> - Stop cleanly a standby.
> - Run pg_checksums on the standby to enable them.
> - Restart the standby.
> - Catchup with the latest changes
> - Stop cleanly the primary, letting the shutdown checkpoint be
> replicated to the standby.
> - Promote the standby.
> - Enable checksums on the previous primary.
> - Start the previous primary to be a standby of the node you failed
> over to.

I stand corrected.
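
(For completeness, you can verify the end state on each node with

    SHOW data_checksums;

which should report "on" on both the new primary and the new standby.)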

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: [SOLVED?] Re: Disk wait problem... not hardware...

2023-10-30 Thread Peter J. Holzer
On 2023-10-29 12:45:08 -0400, p...@pfortin.com wrote:
> On Sun, 29 Oct 2023 16:16:05 +0100 Peter J. Holzer wrote:
> >However, the table statistics contain an estimate for the number of
> >rows:
> >
> >hjp=> select schemaname, relname, n_live_tup from pg_stat_user_tables order 
> >by 3 desc;  
> >╔════════════╤═══════════════════╤════════════╗
> >║ schemaname │      relname      │ n_live_tup ║
> >╟────────────┼───────────────────┼────────────╢
> >║ public     │ ncvhis_2016_12_03 │        977 ║
> >║ public     │ random_test       │        100 ║
> >║ public     │ beislindex        │        351 ║
> >║ public     │ delivery          │          6 ║
> >...
> 
> Are there prerequisites for this query?
> Every (264) n_live_tup returned was zero...  though, VERY fast.. :)

You need to ANALYZE the tables. Autovacuum should do this automatically
when it detects that a table has changed "enough" since the last
analyze, but for slowly growing tables that can sometimes take a long
time (indeed, I did an "analyze beislindex" just before the query above
because it showed only 25 tuples and I knew that wasn't right).

A count of 0 for a large table is weird, though.
Unless ... did you do a major version upgrade recently? That nukes the
statistics and you have to analyze everything again. Also, I'm not sure
if pg_restore triggers an analyze.
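
A quick way to see whether (and when) your tables have been analyzed at
all:

    SELECT schemaname, relname, last_analyze, last_autoanalyze
    FROM pg_stat_user_tables
    ORDER BY GREATEST(last_analyze, last_autoanalyze) NULLS FIRST;

Tables which have never been analyzed sort first.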

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: pg_checksums?

2023-10-29 Thread Peter J. Holzer
On 2023-10-29 16:15:37 +0100, Paul Förster wrote:
> On Oct 29, 2023, at 11:49, Peter J. Holzer  wrote:
> > It *might* work if there are zero writes on the primary during the
> > downtime of the replica (because those writes couldn't be replicated),
> > but that seems hard to ensure. Even if you could get away with making
> > the primary read-only (is this even possible?) I wouldn't have much
> > confidence in the result and reinit the (new) replica anyway.
> 
> As soon as I stop the replica to enable checksums, even writes can't
> get replicated anymore. So during enabling checksums, a replica is
> definitely protected against modifications by its primary, simply
> because it's down. The modifications of the primary are applied to the
> replica when it comes back online.

And this is where it would break down. The modifications can't be
applied to the replica any more because the replica now contains
checksums and the modifications don't. In the best case the replica
would catch the discrepancy and refuse to apply the modifications which
would lead to the loss of these modifications. In the worst case it
would apply them anyway causing severe data corruption.

> So, I don't see a problem at this particular stage.
> 
> My fear is merely that enabling checksums does something to the
> physical state of the data files which are not compatible with the
> other side.

Exactly. Those checksums have to be stored somewhere.

> Like for example manipulate the file headers in some way.

Not just the file headers. Every single data block.

(Ok, it looks like the space for the checksum is reserved even if
checksums aren't enabled[1]. So at least pg_checksums doesn't have to
move data around to enable them. But overwriting a page with a checksum
with one without one would still be bad.)
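
If you are curious, the per-page checksum field can be looked at with
the pageinspect extension ("some_table" is just a placeholder, and the
functions need superuser rights):

    CREATE EXTENSION IF NOT EXISTS pageinspect;
    SELECT checksum FROM page_header(get_raw_page('some_table', 0));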

hp

[1] 
https://www.postgresql.org/docs/current/storage-page-layout.html#PAGEHEADERDATA-TABLE

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: [SOLVED?] Re: Disk wait problem... not hardware...

2023-10-29 Thread Peter J. Holzer
On 2023-10-29 09:21:46 -0400, p...@pfortin.com wrote:
> These are all static tables. Does PG maintain a table row count so as to
> avoid having to count each time?

No. To count the rows in a table, Postgres has to actually read the
whole table (or an index, if a suitable index (e.g. a primary key)
exists).

However, the table statistics contain an estimate for the number of
rows:

hjp=> select schemaname, relname, n_live_tup from pg_stat_user_tables order by 
3 desc;
╔════════════╤═══════════════════╤════════════╗
║ schemaname │      relname      │ n_live_tup ║
╟────────────┼───────────────────┼────────────╢
║ public     │ ncvhis_2016_12_03 │        977 ║
║ public     │ random_test       │        100 ║
║ public     │ beislindex        │        351 ║
║ public     │ delivery          │          6 ║
...


This is an estimate, not the actual count, so it might be wrong
(sometimes very wrong), but accessing it is fast and it might be good
enough for your purposes.

> WB is setup to:
> * autoload table row count
> * autoload table data (restricted with LIMIT)

Maybe WB can be set up to get n_live_tup instead of the real row count?
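
For example, a single-table estimate which a client tool could issue
instead of a real count(*) (the table name is just taken from your
earlier output):

    SELECT reltuples::bigint AS estimated_rows
    FROM pg_class
    WHERE relname = 'ncvhis_2016_12_03';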

    hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Disk wait problem... may not be hardware...

2023-10-29 Thread Peter J. Holzer
On 2023-10-27 19:46:09 -0400, p...@pfortin.com wrote:
> On Fri, 27 Oct 2023 19:07:11 +0200 Peter J. Holzer wrote:
> >Have you looked at the query plans as I recommended? (You might also
> >want to enable track_io_timing to get extra information, but comparing
> >just the query plans of fast and slow queries would be a first step)
> 
> I didn't see how that would help since other than the table name the
> queries are identical.  Curious: are you implying PG stores tables
> differently?

No, but Postgres decides on the query depending on the statistics stored
about that table. If those statistics are off, the query plan can be
wildly different and very inefficient. So checking whether the plans are
plausible should be one of the first things you do when performance is
not what you expect. Indeed, on
https://wiki.postgresql.org/wiki/Slow_Query_Questions it is the very
first bullet point in the section "Things to Try Before You Post".

When you have established that the plan looks fine even when the
performance is poor then you have to look elsewhere. But even then it
helps to know what the database is actually trying to do.

hp



-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: pg_checksums?

2023-10-29 Thread Peter J. Holzer
On 2023-10-29 10:11:07 +0100, Paul Förster wrote:
> On Oct 29, 2023, at 02:43, Peter J. Holzer  wrote:
> > I don't think so. AFAIK Replication keeps the data files in sync on a
> > bit-for-bit level and turning on checksums changes the data layout.
> > Running a cluster where one node has checksums and the other doesn't
> > would result in a complete mess.
> 
> I agree with the last sentence. This is why I asked if it is safe to
> enable checksums on a replica, switch over and then do it again on the
> ex primary, i.e. now new replica without doing a reinit.

It *might* work if there are zero writes on the primary during the
downtime of the replica (because those writes couldn't be replicated),
but that seems hard to ensure. Even if you could get away with making
the primary read-only (is this even possible?) I wouldn't have much
confidence in the result and reinit the (new) replica anyway.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: pg_checksums?

2023-10-28 Thread Peter J. Holzer
On 2023-10-27 23:37:24 +0200, Paul Förster wrote:
> But unfortunately still, my questions c) and d) are unanswered. I'd
> especially be interested in an answer to c), i.e. is it *safe* to
> "pg_checksum -e" the replica instance in a patroni cluster, switch
> over, and then do the other one?

I don't think so. AFAIK Replication keeps the data files in sync on a
bit-for-bit level and turning on checksums changes the data layout.
Running a cluster where one node has checksums and the other doesn't
would result in a complete mess.

    hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Disk wait problem... may not be hardware...

2023-10-27 Thread Peter J. Holzer
On 2023-10-26 22:03:25 -0400, p...@pfortin.com wrote:
> Are there any extra PG low level logs that can be turned on?

Have you looked at the query plans as I recommended? (You might also
want to enable track_io_timing to get extra information, but comparing
just the query plans of fast and slow queries would be a first step)
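
track_io_timing can be switched on just for a test session (it needs
superuser rights), e.g.:

    SET track_io_timing = on;
    EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM ncvhis_2016_12_03;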

You haven't shown any postgresql logs or your settings, so it's hard to
know what you have already turned on.

There are a ton of logging related parameters.

> I've only found logging in the journal;

By "the journal" you mean the one maintained by journald? (Did you
mention what OS you are using? From several outputs I'm guessing it is
Linux). On Linux systems postgresql is usually set up to log into files
somewhere under /var/log/postgresql (see the log_directory entry in
postgresql.conf) but your setup might be different, of course.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Very newbie question

2023-10-26 Thread Peter J. Holzer
On 2023-10-26 11:56:56 +0200, Olivier Gautherot wrote:
> El jue, 26 oct 2023 11:15, Peter J. Holzer  escribió:
> On 2023-10-25 17:48:46 +0200, Olivier Gautherot wrote:
> > El mié, 25 oct 2023 16:58, Олег Самойлов  escribió:
> >     Okey, I see no one was be able to solve this problem. But I could.
> > May be
> >     for someone this will be useful too. There is solution.
> [...]
> >     Now query is:
> >
> >     SELECT generate_series(min(id)/1000, max(id)/1000) AS n FROM
> >     delivery) as part_numbers
> >              WHERE (SELECT max(created_at) from delivery where 
> n*1000
> <=id
> >     and id < (n+1)*1000)
> >                 < CURRENT_DATE-'3 month'::interval;

I just realized that this query is mangled. I'm going to assume that it
should have been something like 

with part_numbers as (
SELECT generate_series(min(id)/100, max(id)/100) as n
from delivery
)
select * from part_numbers
WHERE (SELECT max(created_at) from delivery where n*100 <= id and id < 
(n+1)*100) < CURRENT_DATE-'3 month'::interval;

> [...]
> > Your fast solution will work as long as you don't have missing sequences
> (like
> > deleted rows).
> 
> Why do you think this would break with missing sequence numbers?
> 
> 
> In the suggested query, the return value contains a list of sequential numbers
> from a min to a max - they seem to be markers of the partitions. Let's assume
> that a complete partition is deleted in the middle: its index will still be
> returned by the query, although it doesn't exist any more in the table.

I don't think it will. While the generate_series() will produce the
partition number, the where clause will not find any matching rows, so
the query will not return it.

E.g. (this table isn't partitioned, but that shouldn't affect the
result, also I'll reduce the "partition size" to 100 to make it more
readable):

create table delivery (id int, created_at date);
insert into delivery values(200, '2000-01-01');
insert into delivery values(299, '2000-12-01');
insert into delivery values(412, '2002-02-01');
insert into delivery values(439, '2002-03-01');
insert into delivery values(501, '2023-01-01');
insert into delivery values(555, now());

Note that there are no records in "partition" 3, and "partition" 5
contains current data, so we should get only "partition numbers" 2 and
4:

with part_numbers as (
SELECT generate_series(min(id)/100, max(id)/100) as n
from delivery
)
select * from part_numbers
WHERE (SELECT max(created_at) from delivery where n*100 <= id and id < 
(n+1)*100) < CURRENT_DATE-'3 month'::interval;

╔═══╗
║ n ║
╟───╢
║ 2 ║
║ 4 ║
╚═══╝
(2 rows)

Looks ok to me.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Disk wait problem...

2023-10-26 Thread Peter J. Holzer
On 2023-10-23 16:31:30 -0700, Adrian Klaver wrote:
> On 10/23/23 14:55, p...@pfortin.com wrote:
> > Next, I made a query list for 30 tables, turned on timing and pasted the
> > queries.  Pasting results inline:
> > ncsbe=# select count(*) from ncvhis_2012_12_06;
> > select count(*) from ncvhis_2016_12_03;
> > select count(*) from ncvhis_2020_08_01;
> > select count(*) from ncvhis_2020_10_31;
[...]
> > Time: 72404.786 ms (01:12.405)
> > Time: 301.246 ms
> > Time: 409.974 ms
> > Time: 363.836 ms
> > Time: 351.167 ms
> > Time: 348.378 ms
> > Time: 348.712 ms
> > Time: 351.901 ms
> > Time: 487.837 ms
> > Time: 40987.826 ms (00:40.988)
> > Time: 76964.281 ms (01:16.964)
> > Time: 483.329 ms
> > Time: 18919.267 ms (00:18.919)
> > Time: 50896.978 ms (00:50.897)
> > Time: 25784.000 ms (00:25.784)
> > Time: 45407.599 ms (00:45.408)
> > Time: 75171.609 ms (01:15.172)
> > Time: 88871.004 ms (01:28.871)
> > Time: 128565.127 ms (02:08.565)
> > Time: 210248.222 ms (03:30.248)
> > Time: 246609.561 ms (04:06.610)   4 minutes!  ;p
> > Time: 166406.283 ms (02:46.406)
> > Time: 291483.538 ms (04:51.484)  nearly 5 minutes ;p ;p
> > Time: 2269.961 ms (00:02.270)
> > Time: 413.219 ms
> > Time: 433.817 ms
> > Time: 395.844 ms
> > Time: 7251.637 ms (00:07.252)
> > Time: 384.328 ms
> > Time: 384.887 ms
> > 
> > I don't see a pattern in the above; matches the randomness I saw using
> > WB...

It would be interesting to see whether there's a difference in plan
between the slow and the fast queries.

If your experiment above is repeatable, you can just prefix every
query with 
explain (analyze, buffers)


explain (analyze, buffers) select count(*) from ncvhis_2016_12_03;
explain (analyze, buffers) select count(*) from ncvhis_2020_08_01;
explain (analyze, buffers) select count(*) from ncvhis_2020_10_31;
...

If it is hard to reproduce (happens only sometimes), the best way to
catch is probably to enable auto_explain
<https://www.postgresql.org/docs/current/auto-explain.html>, possibly
with the auto_explain.log_analyze option and check the logs after it
happened. (auto_explain is generally a good idea, IMHO).
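
A minimal setup for a single test session (the threshold is just an
example; for permanent use put the equivalent settings into
postgresql.conf and add auto_explain to session_preload_libraries):

    LOAD 'auto_explain';
    SET auto_explain.log_min_duration = '5s';
    SET auto_explain.log_analyze = on;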


> > Before the above finished, I issued this command on another konsole...
> > 
> > $ while true; do ls -l > /tmp/ll; date; done

This is unlikely to generate noticeable disk waits. The current
directory will be in the cache after the first ls and the writes happen
asynchronously.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Very newbie question

2023-10-26 Thread Peter J. Holzer
On 2023-10-25 17:48:46 +0200, Olivier Gautherot wrote:
> El mié, 25 oct 2023 16:58, Олег Самойлов  escribió:
> Okey, I see no one was be able to solve this problem. But I could. May be
> for someone this will be useful too. There is solution.
[...]
> Now query is:
> 
> SELECT generate_series(min(id)/1000, max(id)/1000) AS n FROM
> delivery) as part_numbers
>          WHERE (SELECT max(created_at) from delivery where n*1000 <=id
> and id < (n+1)*1000)
>             < CURRENT_DATE-'3 month'::interval;
> 
> Return the same (number of partition need to archive), accelerated by two
> btree index: on id and created_at. Works very quick, less then second.
[...]
> Your fast solution will work as long as you don't have missing sequences (like
> deleted rows).

Why do you think this would break with missing sequence numbers?

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: How can I accurately infer the output of a SELECT statement FROM a table with floating-point data type?

2023-10-21 Thread Peter J. Holzer
On 2023-10-19 00:12:51 -0400, Tom Lane wrote:
> =?GBK?B?w6vDqw==?=  writes:
> > Do we have a rule by follow which one can accurately info the output of a 
> > SELECT statment FROM a table with floating-point data type?
> 
> The goal of our floating-point output functions is to produce
> the shortest representation from which the stored value could
> be recovered exactly.  It's not unusual for that to require
> two or three decimal digits more than the nominal precision.

To illustrate this:

A real has 24 bits of mantissa. 12345.123456789 is between 2**13 and
2**14, so there are 10 bits left for the fraction. Or in other words,
the number must be approximated as a multiple of 1/1024.

The closest we can get is 12345+126/1024 = 12345.123046875

12345.123 would obviously be rounded to the same number, so it's close
enough and additional digits aren't necessary.

But 12345.12 would be rounded to 12345+123/1024 = 12345.1201171875.
That's different, so 7 digits are not enough in this case.
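
You can see this directly in psql (assuming PostgreSQL 12 or later,
where the shortest round-trip representation is the default output
format):

    SELECT 12345.123456789::real;                    -- prints 12345.123
    SELECT 12345.123::real = 12345.123456789::real;  -- true, same stored value
    SELECT 12345.12::real  = 12345.123456789::real;  -- false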

    hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Multiple inserts with two levels of foreign keys

2023-10-08 Thread Peter J. Holzer
On 2023-10-05 09:59:24 -0500, Ron wrote:
> But honestly, the amount of text duplication hurts my "inner programmer". 
> And it would have to be generated dynamically, since you don't know how many
> crops were delivered.  #shudder

Yes, this seems like the kind of problem that I would definitely solve
in a script running outside of the database. Especially since it has to
talk to two databases. If the number of data records isn't too large
(maybe a few tens of thousands), I'd just write three loops to select
from the prod database and insert into the dev database.

If the number of records is too large for that, I'd create some staging
table with an extra column "new_id" filled from the same sequence as the
original table, like this:

create table new_farms(
id bigint,
name character varying(30),
new_id bigint default nextval('farms_id_seq')
)

Then you can just COPY the data into these tables and it will give a
nice mapping from old to new ids which you can use in subsequent
inserts.
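
The subsequent inserts can then translate the old ids with a join,
roughly like this (the child table "fields" and its staging table
"staged_fields" are of course made-up names):

    -- staged_fields is a hypothetical staging table for the child rows
    INSERT INTO fields (farm_id, name)
    SELECT nf.new_id, s.name
    FROM staged_fields s
    JOIN new_farms nf ON nf.id = s.farm_id;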

    hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Peer authentication failed ???

2023-10-03 Thread Peter J. Holzer
On 2023-10-03 05:55:51 -0400, Amn Ojee Uw wrote:
> psql -U my_group_worker -d my_group_db
> 
> psql: error: connection to server on socket 
> "/var/run/postgresql/.s.PGSQL.5432"
> failed: FATAL:  Peer authentication failed for user "my_group_worker"
> 
> *** What am I doing wrong?

PostgreSQL uses the first matching rule from pg_hba.conf. Presumably
(because that's the default on Debian/Ubuntu) you have it set up to use
peer authentication on the unix socket and password authentication
(scram or md5) on everything else.

You are connecting via the Unix socket (/var/run/postgresql/.s.PGSQL.5432),
so the server will attempt peer authentication and nothing else. To get
it to attempt password authentication connect via a tcp socket:

psql -U my_group_worker -h localhost -d my_group_db

(More conveniently set up ident authentication for your users, then you
don't need a password.)
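
If you prefer the latter, a sketch of what that could look like (the OS
user name "alice" is made up, adjust to your system):

    # pg_hba.conf:
    local   my_group_db   my_group_worker   peer map=workers

    # pg_ident.conf:
    # MAPNAME   SYSTEM-USERNAME   PG-USERNAME
    workers     alice             my_group_worker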

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Ensuring Rifferential Integrity

2023-09-17 Thread Peter J. Holzer
On 2023-09-17 19:24:52 +0200, Anthony Apollis wrote:
>  I brought in the Primary/Secondary/Foreign keys because it does not exist in
> the Fact/Dimension tables.
> 
> The Fact tables contain 6 million records and the dimension tables are tiny.
> Because some columns don't exist in the Fact and Dimension table I can not
> update the Foreign Keys in the Fact table to ensure relationship integrity.
> 
> e.g Say I have a Fact table containing Apple's Sales; one of the Dimension
> tables is Apple Type. Since the two tables don't contain an Apple Type column
> in both I won't be able to enforce referention integrity. If my Apple Sales
> table contains 6 million + sales, I won't be able to break it down Apple sales
> by Type.

Can you illustrate this with a simple example? I don't think I
understood what you're trying to say.

> That is the problem I am sitting with. My fact Table is not able to give me
> unique Foreign Key columns. I read about a Mapping table.

Foreign key columns aren't normally supposed to be unique. You want to
reference the same thing (e.g. your apple type) from many columns (the
same type of apple will be sold in many stores every day).


> ” UPDATE system."IMETA_ZTRB_MP$F_ZTBR_TA_BW" AS A
> 
> SET "Master_BRACS_Secondary_Key" = B."Primary_ZTBR_TransactionCode"
> 
> FROM dim."IMETA_Master_BRACS_to_SAP_Data_TA_BR_" AS B
> 
> WHERE A."ZTBR_TransactionCode" = B."Primary_ZTBR_TransactionCode";”

Isn't that basically the same as 

UPDATE system."IMETA_ZTRB_MP$F_ZTBR_TA_BW"
SET "Master_BRACS_Secondary_Key" = "ZTBR_TransactionCode";

?

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Is data import from Oracle table to Postgres table in Ora2pg consecutive or random?

2023-09-12 Thread Peter J. Holzer
On 2023-09-08 17:19:01 +0700, duc hiep ha wrote:
> I want to migrate a large Oracle table to PostgreSQL, which is approximately
> 200GB in size and includes a LOB segment. This table has a primary key column
> named "ID," which increments by one unit (similar to a sequence). During the
> migration of this large table, sometimes the process terminates prematurely
> (due to network issues, etc.). As a result, I have to restart the import
> process either from the beginning or by using a WHERE clause in the ora2pg
> configuration file to import the remaining data from where it was previously
> terminated.
> 
> However, I've noticed that the import process in ora2pg doesn't seem to follow
> a consecutive order, starting from a small ID number and progressing to a
> larger ID number. Instead, it appears to import data randomly from various 
> rows
> of the table. This makes it impossible to continue importing data from the
> point where it was terminated using a WHERE clause in ora2pg.
> 
> My questions are:
> 
> - Is it correct that ora2pg imports data from the Oracle table to the
> PostgreSQL table randomly and not sequentially from a smaller ID to a larger
> ID?

Oracle doesn't in general order tables by primary key (it does offer
"index organized tables", but the default is "heap organized"), so a
simple "select * from tablename" will return rows in semi-random order.

I think the rowid is in ascending order (but I can't test that at the
moment) so you may be able to use the rowid in your where clause.

> - Or can we add additional parameters to the ora2pg.conf file to control this
> process and ensure that the data is imported sequentially, following the
> primary key from smallest to largest?

I can't answer that.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Upgrade problem

2023-09-12 Thread Peter J. Holzer
On 2023-09-11 17:04:54 +0100, Graeme wrote:
> Preparing to upgrade my small cluster from Mageia 8/Pg 9 to Mageia 9/Pg 15. 
> I'm
> at the point of running pg_upgrade but have received anerror message:
> 
> /mga8/usr/bin/postgres: error while loading shared libraries: libssl.so.1.1:
> cannot open shared object file: No such file or directory
> no data was returned by command ""/mga8/usr/bin/postgres" -V"
> 
> However:
> 
> [root@bach lib64]# cd /mga8/usr/lib64

I'm not familiar with the Mageia update process, but the paths look
strange, Is it normal that old binaries and libraries are moved into a
directory named after the old version (I assume "mga8" is short for
Mageia version 8") or is this something you have done?

In any case, /mga8/usr/lib64 would not normally be on the library
search path, Have you somehow told /mga8/usr/bin/postgres to look there?

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: postgreSQL UPPER Method is converting the character "µ" into "M"

2023-09-09 Thread Peter J. Holzer
On 2023-09-06 22:39:42 +0530, Sai Teja wrote:
> This column has many sentences for each row in which some of the
> characters are in Greek language.
> Like µ, ë, ä, Ä etc..

Except for the µ[1], those aren't Greek letters.

hp

[1] and even that's debatable, since it's a micro sign, not a mu.

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: postgreSQL UPPER Method is converting the character "µ" into "M"

2023-09-09 Thread Peter J. Holzer
On 2023-09-06 17:29:21 +0200, Francisco Olarte wrote:
> On Wed, 6 Sept 2023 at 16:40, Sai Teja  
> wrote:
> > Actually we are migrating the data from DB2 to postgreSQL. So in DB2
> > the upper method is converting µ as µ only but not as M.
> > So, while validating the data we have encountered this problem.
[...]
> Maybe because unicode has GREEK CAPITAL LETTER MU ( 924, which looks
> like capital M ), GREEK SMALL LETTER MU ( 956 ) and MICRO SIGN ( 181,
> which looks like small mu ) while windows-1252 only has 0xB6 as micro.
> OTOH Windows-1253 ( greek ansi ) has all three.
> 
> If your small mu are really micro-sign ( which is suspected if youused
> 1252 ) maybe changing them to that helps ( but I do not have the
> resources to test that on hand ).

Nope, it doesn't:

hjp=> select chr(181) as micro, chr(956) as mu;
╔═══════╤════╗
║ micro │ mu ║
╟───────┼────╢
║ µ     │ μ  ║
╚═══════╧════╝
(1 row)

hjp=> select upper(chr(181)) as micro, upper(chr(956)) as mu;
╔═══════╤════╗
║ micro │ mu ║
╟───────┼────╢
║ Μ     │ Μ  ║
╚═══════╧════╝
(1 row)


(At least not on an Ubuntu 22.04 system using the en_US.UTF-8 locale).

IMHO uppercasing MICRO SIGN doesn't make much sense, but that was the
decision that either the libc maintainers or the Unicode committee
made.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: PSQL = Yes ... JDBC = no ??

2023-09-04 Thread Peter J. Holzer
On 2023-09-03 20:19:44 -0400, Amn Ojee Uw wrote:
> I see now, any suggestions as work around in JDBC?

Implement the functionality in Java.

You could do it directly:

execute
SELECT 'CREATE DATABASE jme_test_database' WHERE NOT EXISTS (SELECT FROM 
pg_database WHERE datname = 'jme_test_database')
read the result
for each row returned
execute it as SQL

But that would be silly.

Instead you would do it like this:

execute
SELECT datname FROM pg_database WHERE datname = 'jme_test_database'
read the result
if the result is empty:
execute
CREATE DATABASE jme_test_database

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Restoring default privileges on objects

2023-08-30 Thread Peter J. Holzer
On 2023-08-29 14:44:48 -0600, Stuart McGraw wrote:
> On 8/29/23 13:27, Tom Lane wrote:
> > Fixing \dp to honor "\pset null" for this might be a reasonable
> > thing to do too.  I'm actually a bit surprised that that doesn't
> > work already.
> 
> That change would still require someone using \dp to realize that
> the "Access privileges" value could be either '' or NULL (I guess
> that could be pointed out more obviously in the psql doc), and then
> do a '\pset null' before doing \dp?  That seems a little inconvenient.

Or just always do a \pset null. For me printing NULL the same as an
empty string is just as confusing in normal tables, so that's the first
line in my ~/.psqlrc. YMMV, of course.

But I guess the point is that people who do \pset null expect to be able
to distinguish '' and NULL visually and might be surprised if that
doesn't work everywhere, while people who don't \pset null know that ''
and NULL are visually indistinguishable and that they may need some
other way to distinguish them if the difference matters.

So +1 for me fixing \dp to honor "\pset null".

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: The same prepared query yield "-1" the first six times and then "-1.0"

2023-08-21 Thread Peter J. Holzer
On 2023-08-21 17:31:06 -0400, Dave Cramer wrote:
> On Mon, 21 Aug 2023 at 17:17, Ron  wrote:
> 
> On 8/21/23 14:32, Dave Cramer wrote:
> 
> [snip]
> 
> It has to do with the way the data is being transferred. When the
> driver switches to a named statement it also switches to binary mode
> which means data will be transferred in binary. 
> 
> In text we get -1, in binary we get -1.0
> 
> 
> That seems odd.  Why does it do that?
[...]
> whereas float8send sends the actual binary data on disk, hence -1.0

It should be noted that the binary representation is not "-1.0", but
something like 
1 01111111 00000000000000000000000  (in binary: sign, exponent, mantissa) or
BF 80 00 00                         (in hex)
(big endian for better readability)

The decimal representation "-1.0" is created by the binary to string
conversion (Java's Float.toString()?). That could also produce "-1" or
"-1E0" or any other equivalent representation. The author of that
routine decided in include ".0" in the output, possibly to signify that
it's a floating point value, not an integer.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: The same prepared query yield "-1" the first six times and then "-1.0"

2023-08-21 Thread Peter J. Holzer
On 2023-08-21 17:27:20 +0200, Edoardo Panfili wrote:
> The attended result was a sequence of ten equal values but this is the actual 
> result:
> p: -1
> p: -1
> p: -1
> p: -1
> p: -1
> p: -1.0
> p: -1.0
> p: -1.0
> p: -1.0
> p: -1.0
[...]
> 2023-08-21 11:51:50.633 CEST [1511] user@testdb LOG: execute <unnamed>: SET 
> extra_float_digits = 3
> 2023-08-21 11:51:50.634 CEST [1511] user@testdb LOG: execute <unnamed>: SET 
> application_name = 'PostgreSQL JDBC Driver'
> 2023-08-21 11:51:50.644 CEST [1511] user@testdb LOG: execute <unnamed>: 
> SELECT dim1 FROM number WHERE name='first'
> 2023-08-21 11:51:50.648 CEST [1511] user@testdb LOG: execute <unnamed>: 
> SELECT dim1 FROM number WHERE name='first'
> 2023-08-21 11:51:50.649 CEST [1511] user@testdb LOG: execute <unnamed>: 
> SELECT dim1 FROM number WHERE name='first'
> 2023-08-21 11:51:50.650 CEST [1511] user@testdb LOG: execute <unnamed>: 
> SELECT dim1 FROM number WHERE name='first'
> 2023-08-21 11:51:50.651 CEST [1511] user@testdb LOG: execute S_1: SELECT dim1 
> FROM number WHERE name='first'
> 2023-08-21 11:51:50.651 CEST [1511] user@testdb LOG: execute S_1: SELECT dim1 
> FROM number WHERE name='first'
> 2023-08-21 11:51:50.653 CEST [1511] user@testdb LOG: execute S_1: SELECT dim1 
> FROM number WHERE name='first'
> 2023-08-21 11:51:50.653 CEST [1511] user@testdb LOG: execute S_1: SELECT dim1 
> FROM number WHERE name='first'
> 2023-08-21 11:51:50.654 CEST [1511] user@testdb LOG: execute S_1: SELECT dim1 
> FROM number WHERE name='first'
> 2023-08-21 11:51:50.656 CEST [1511] user@testdb LOG: execute S_1: SELECT dim1 
> FROM number WHERE name=‘first'

Are these outputs from the same run?

I notice that the output from the program switches after 5 queries from
"-1" to "-1.0", but the logged query name switches after 4 queries from
"<unnamed>" to "S_1".

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Loops and Case Statements Involving Dates

2023-08-21 Thread Peter J. Holzer
On 2023-08-21 09:07:00 +0200, Anthony Apollis wrote:
> Please review my code and make recommendations where needed. I have this code:
> ```
[complicated code snipped]
> 
> 
> I have re-written this code to make it less complex, still doing what
> it is supposed to do. I want you to review my re-written code(code
> must give me report up until yesterday):
> ```
> WITH MaxDateCTE AS (
> SELECT
> COALESCE(MAX(DISTINCT "CALDAY"), '2021-07-01') AS MaxDate
> FROM
> "system"."IMETA_ZTRNSPCST$F_Shipment_Cost_TA_BW"
> )
> 
> SELECT
> CASE
> WHEN EXTRACT(DAY FROM (MaxDate + INTERVAL '1 day')::timestamp - 
> (NOW() - INTERVAL '1 day')::timestamp) <= 30 THEN 1
> ELSE CEIL(EXTRACT(DAY FROM (MaxDate + INTERVAL '1 day')::timestamp - 
> (NOW() - INTERVAL '1 day')::timestamp) / 30)
> END AS "Number of days"
> FROM
> MaxDateCTE;

So the intent is to compute how many "months" the maximum CALDAY is in
the future, with some minor twists:
* A "month" is always 30 days, not a calendar month.
* The difference is between the day after the given date and yesterday -
  so it's shifted by one day (today + 30 days already counts as 2
  months)
* the minimum is 1.
?

Then this can be simplified further:

* MAX(DISTINCT "CALDAY") can be simplified to just MAX("CALDAY").

* The CASE can be eliminated and replaced by
    GREATEST(CEIL(EXTRACT(DAY FROM (MaxDate + INTERVAL '1 day')::timestamp - 
(NOW() - INTERVAL '1 day')::timestamp) / 30), 1)
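
Putting both together, the whole statement shrinks to something like
this (untested, and assuming "CALDAY" is a date or timestamp as in the
original):

    SELECT GREATEST(
             CEIL(EXTRACT(DAY FROM
                    (COALESCE(MAX("CALDAY"), '2021-07-01') + INTERVAL '1 day')::timestamp
                  - (NOW() - INTERVAL '1 day')::timestamp) / 30),
             1) AS "Number of days"
    FROM "system"."IMETA_ZTRNSPCST$F_Shipment_Cost_TA_BW";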

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: PG minor version in data directory?

2023-08-19 Thread Peter J. Holzer
On 2023-08-19 14:22:25 +1000, Justin Clift wrote:
> On 2023-08-19 14:10, David G. Johnston wrote:
> > On Fri, Aug 18, 2023 at 6:36 PM Justin Clift 
> > wrote:
> > 
> > > If the minor version is already tracked somewhere as well, that
> > > would be extremely useful for my use case.
> > > 
> > > 
> > The data directory doesn't have a concept of "minor version".  Only the
> > installed libraries and binaries do.
> 
> Thanks, that's what I figured.  I'll have to keep state in a
> PG_VERSION_MINOR
> there or something. :)

Wouldn't it be better to ask whatever system you use to install the
software for the version?

E.g. on Debian derived Linuux distributions:

% dpkg -l postgresql-14
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name   Version   Architecture Description
+++-==-=--=
ii  postgresql-14  14.9-0ubuntu0.22.04.1 amd64The World's Most Advanced 
Open Source Relational Database

or

% apt policy postgresql-14
postgresql-14:
  Installed: 14.9-0ubuntu0.22.04.1
  Candidate: 14.9-0ubuntu0.22.04.1
  Version table:
 *** 14.9-0ubuntu0.22.04.1 500
500 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages
500 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages
100 /var/lib/dpkg/status
 14.2-1ubuntu1 500
500 http://archive.ubuntu.com/ubuntu jammy/main amd64 Packages

If you compile the software yourself, you could always ask postgres:

% /usr/lib/postgresql/14/bin/postgres --version
postgres (PostgreSQL) 14.9 (Ubuntu 14.9-0ubuntu0.22.04.1)
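
And if the server is running you can of course also ask it via SQL
(this tells you the version of the running binaries, not anything
recorded in the data directory):

    SELECT version();
    SHOW server_version;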

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Converting sql anywhere to postgres

2023-08-15 Thread Peter J. Holzer
On 2023-08-15 18:43:11 +0200, Georg H. wrote:
> update mytest.autovalues set product='pear', updated_by=DEFAULT where key=2;
 
Oh, I didn't know that worked. Obvious in hindsight, of course (I've
been using DEFAULT in VALUES(...) for ages) but I never thought of it.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: How to set default privilege for new users to have no access to other databases?

2023-08-12 Thread Peter J. Holzer
On 2023-08-09 14:35:40 -0400, Erik Nelson wrote:
> I have a lab with a database that I would like to use as a "multi-tenant"
> database, in that I would like to create a database for each of the
> applications that I'm running and segregate access so that user foo and user
> bar cannot see anything about their neighbors. I'm somewhat surprised to
> discover that any new user, by default, has the ability to list databases,
> connect to them, and list their tables.
> 
> My understanding is that this ability is inherited from the public role (could
> use confirmation of this)? I can think of two potential options, one being 
> more
> desirable:
> 
>   • I know I can revoke CONNECT from an explicit database, but this requires
> that I specify the database. I want to revoke this for all current, and
> future databases as the default privilege.

New databases are created as copies of a template database (template1 by
default). You can either alter template1 to your liking or create a new
template database and use that for creating your new databases (the
latter is especially useful if you need several different templates).

You could also use pg_hba.conf to restrict or grant access to specific
databases. This would probably mean that you would have to add a line to
pg_hba.conf each time you create a database.

And of course if you use the same database schema for several
applications you probably already have a script to set up a database.
Adding one or more REVOKE and/or GRANT statements to such a script would
seem to be a rather obvious way to do it.
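
A sketch of what such a statement pair could look like (database and
role names are placeholders):

    REVOKE CONNECT ON DATABASE appdb FROM PUBLIC;
    GRANT CONNECT ON DATABASE appdb TO app_owner;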

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Upgrading

2023-07-30 Thread Peter J. Holzer
On 2023-07-30 07:53:54 -0400, Amn Ojee Uw wrote:
> On my Debian 12, I installed PostgreSQL-12,

Where did you install that from?

AFAICS, Debian 12 comes with PostgreSQL 15.

> I'd like to upgrade to the latest release of PostgreSQL. So, my
> question is, what is the best way to upgrade to the next release of
> PostgreSQL?

If you stay with the same source, just installing the new version and
then invoking pg_upgrade (or a variant - PGDG, Debian, Ubuntu have
pg_upgradecluster) should do the trick.

If you switch sources, the setup may be sufficiently different that
pg_dump/pg_restore may be the easiest way.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: How to improve the performance of my SQL query?

2023-07-29 Thread Peter J. Holzer
On 2023-07-26 15:46:16 +0800, gzh wrote:
> SET enable_seqscan TO off;
[...]
> ->  Parallel Bitmap Heap Scan on tbl_sha  
> (cost=92112.45..2663789.14 rows=800650 width=18) (actual 
> time=260.540..21442.169 rows=804500 loops=3)
>   Recheck Cond: (ms_cd = 'MLD009'::bpchar)
>   Rows Removed by Index Recheck: 49
>   Filter: (etrys = '0001'::bpchar)
>   Rows Removed by Filter: 295500
>   Heap Blocks: exact=13788 lossy=10565
>   ->  Bitmap Index Scan on index_search_04_mscd_cdate 
>  (cost=0.00..91632.06 rows=3402599 width=0) (actual time=249.718..249.718 
> rows=330 loops=1)
> Index Cond: (ms_cd = 'MLD009'::bpchar)

So now it's using index_search_04_mscd_cdate which contains only ms_cd
(and - judging from the name, other fields not relevant to this query),
but it still doesn't use index_search_01 which would fit the query
exactly. I can understand that Postgres prefers a sequential scan over
an index scan (the number of matching rows is about 10% of the total
table size which is a lot), but why would it prefer a less specific
index to a more specific one?

Can you get Postgres to use that index at all?

Find a combination of ms_cd and etrys which doesn't cover millions of
rows and try that.

Also try lowering random_page_cost.
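
Both are easy to try in a single session, e.g. (the cost value is just
an example to see whether the plan changes):

    SET random_page_cost = 1.1;
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT count(ET_CD) FROM TBL_SHA
    WHERE MS_CD = 'MLD009' AND ETRYS = '0001';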

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Effects of dropping a large table

2023-07-23 Thread Peter J. Holzer
On 2023-07-23 06:09:03 -0400, Gus Spier wrote:
> Ah! Truncating a table does not entail all of WAL processes. From the
> documentation, "TRUNCATE quickly removes all rows from a set of tables. It has
> the same effect as an unqualified DELETE on each table, but since it does not
> actually scan the tables it is faster. Furthermore, it reclaims disk space
> immediately, rather than requiring a subsequent VACUUM operation. This is most
> useful on large tables." https://www.postgresql.org/docs/14/sql-truncate.html

I assumed that by "deleting the now empty table" you meant DROPing it.
(Performing a «DELETE FROM t» just after a «TRUNCATE t» would obviously
be pointless).

So let me rephrase the question:

What's the advantage of 

TRUNCATE t
DROP t

over just

    DROP t

    hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Effects of dropping a large table

2023-07-23 Thread Peter J. Holzer
On 2023-07-22 16:37:39 -0400, Gus Spier wrote:
> Isn’t this a perfect opportunity to use the TRUNCATE command to
> quickly remove the data? And follow up by deleting the now empty
> tables?

What's the advantage of first truncating and then deleting a table over
just deleting it?

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: How to improve the performance of my SQL query?

2023-07-20 Thread Peter J. Holzer
On 2023-07-20 15:09:22 +0800, gzh wrote:
> Hi everyone,
> 
> 
> I'm running into some performance issues with my SQL query.
> 
> The following SQL query is taking a long time to execute.
> 
> 
> explain analyze
> select COUNT(ET_CD)
> from TBL_SHA
> WHERE TBL_SHA.MS_CD = '009'
> and TBL_SHA.ETRYS in
>(select TBL_INF.RY_CD
> from TBL_INF
> WHERE TBL_INF.MS_CD = '009'
>AND TBL_INF.RY_CD = '01'
>)
> 
> - Execution Plan -
[...]
>   ->  Seq Scan on TBL_SHA  (cost=0.00..2698666.58 rows=3202774 
> width=18)
>(actual time=97264.138..123554.792 
> rows=320 loops=1)
> Filter: ((MS_CD = '009'::bpchar) AND (ETRYS = 
> '01'::bpchar))
> Rows Removed by Filter: 32000325
[...]
> 
> 
> 
> The index is defined as follows.
> 
> 
> CREATE INDEX index_search_01 ON mdb.TBL_SHA USING btree (MS_CD, ETRYS);
> 
> 
> 
> When I take the following sql statement, the index works fine and the query is
> fast.
> 
> 
> select COUNT(ET_CD)
> from TBL_SHA
> WHERE MS_CD = '009'
> AND ETRYS = '01'

What's the plan for that query?

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Fwd: error in the example given for numeric data types

2023-07-15 Thread Peter J. Holzer
On 2023-07-15 12:08:26 +0530, Priyank Rajvansh wrote:
> Page: https://www.postgresql.org/docs/15/datatype-numeric.html
> Description:
> 
> article 8.1.2 states the following:
> 
> ''We use the following terms below: The precision of a numeric is the total
> count of significant digits in the whole number, that is, the number of
> digits to both sides of the decimal point. The scale of a numeric is the
> count of decimal digits in the fractional part, to the right of the decimal
> point. So the number 23.5141 has a precision of 6 and a scale of 4. Integers
> can be considered to have a scale of zero.''
> 
> however it also states the following towards the end:
> 
> '' For example, a column declared as
> 
> NUMERIC(3, 5)
> will round values to 5 decimal places and can store values between -0.00999
> and 0.00999, inclusive.''
> 
> Now from whatever i could decipher the syntax of the numeric data type is
> NUMERIC(precision,scale) and if we write NUMERIC (3,5) it would mean that we
> are trying to store a number which has 3 digits in total and 5 of them are
> to the right of the decimal point, which doesn't make sense !

It may sound weird but it does make sense. There are three digits in the
number and the rightmost of them is five positions to the right of the
decimal point. So you can store

0.00999
--12345

but not

0.01000
--12345

as that would need a fourth digit
and also not
0.000123
--123456
as the rightmost digit is now six places right of the decimal
point.

Mathematically you store an integer with 3 digits and multiply it with
10^-5 to get the value.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: How to add function schema in search_path in option definitio

2023-07-09 Thread Peter J. Holzer
On 2023-07-08 19:00:02 +0200, Lorusso Domenico wrote:
> I've created a schema to handle some specific features.
> In the schema there are a couple of tables used by many functions (more than
> 20).
> In other words, I've created a schema as a package (as suggested in many
> points).

So this is one specific schema with a specific name and you don't need
this to work in any schema the function happens to be created in?

Then I think setting the search path as part of the function definition
(as already hinted at by Adrian) may be the easiest solution:

create function my_schema.foo (...)
returns ...
set search_path to my_schema, public
as $$ 
...
$$;

You could also do something like:

set search_path to my_schema, public;
create function foo (...)
returns ...
set search_path from current
as $$ 
...
$$;

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Question: Multiple pg clusters on one server can be reached with the standard port.

2023-06-20 Thread Peter J. Holzer
On 2023-06-20 10:10:47 -0500, Ron wrote:
> On 6/20/23 09:54, Peter J. Holzer wrote:
> > On 2023-06-19 16:09:34 -0500, Ron wrote:
> > > On 6/19/23 12:15, Peter J. Holzer wrote:
> > >  On 2023-06-19 07:49:49 -0500, Ron wrote:
> > >      On 6/19/23 05:33, Peter J. Holzer wrote:
> > >  So (again, as Francisco already wrote) the best way is 
> > > probably to write
> > >  a simple proxy which uses the database (not DNS) name for 
> > > routing. I
> > >  seem to remember that nginx has a plugin architecture for 
> > > protocols so
> > >  it might make sense to write that as an nginx plugin instead 
> > > of a
> > >  standalone server, but that's really a judgement call the 
> > > programmer has
> > >  to make. Another possibility would of course be to extend 
> > > pgbouncer to
> > >  do what the OP needs.
> > > 
> > >  How would this work with JDBC clients?
> > > 
> > >  Same as with any other client, I guess. Any reason why it should be
> > >  different?
> > > 
> > > 
> > > That goes to my ultimate point: why would this work, when the point of a
> > > database client is to connect to a database instance on a specific port 
> > > like
> > > 5432, not connect to a web server.
> > Consider this scenario:
> > 
> > You have several databases scattered across several hosts and ports:
> > 
> > db1  host1.example.com:5432
> > db2  host1.example.com:5433
> > db3  host2.example.com:5432
> > db4  host3.example.com:5432
> > 
> > Then you have your proxy/gateway/bouncer (whatever you want to call it)
> > listening on proxy.example.com:5432.
> 
> Proxies/gateways are great. My question is about why you mentioned nginx.

Somebody else mentioned nginx as a possible solution. I wrote that I
don't think that nginx can do that out of the box but it might be
possible to write a plugin/module. Personally, I wouldn't (learning how
to write nginx modules almost certainly takes longer than writing a
simple proxy from scratch), but if somebody is already familiar with
nginx modules and/or has other reasons to use nginx ...

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Question: Multiple pg clusters on one server can be reached with the standard port.

2023-06-20 Thread Peter J. Holzer
On 2023-06-19 16:09:34 -0500, Ron wrote:
> On 6/19/23 12:15, Peter J. Holzer wrote:
> On 2023-06-19 07:49:49 -0500, Ron wrote:
> On 6/19/23 05:33, Peter J. Holzer wrote:
> So (again, as Francisco already wrote) the best way is probably 
> to write
> a simple proxy which uses the database (not DNS) name for 
> routing. I
> seem to remember that nginx has a plugin architecture for 
> protocols so
> it might make sense to write that as an nginx plugin instead of a
> standalone server, but that's really a judgement call the 
> programmer has
> to make. Another possibility would of course be to extend 
> pgbouncer to
> do what the OP needs.
> 
> How would this work with JDBC clients?
> 
> Same as with any other client, I guess. Any reason why it should be
> different?
> 
> 
> That goes to my ultimate point: why would this work, when the point of a
> database client is to connect to a database instance on a specific port like
> 5432, not connect to a web server.

Consider this scenario:

You have several databases scattered across several hosts and ports:

db1  host1.example.com:5432
db2  host1.example.com:5433
db3  host2.example.com:5432
db4  host3.example.com:5432

Then you have your proxy/gateway/bouncer (whatever you want to call it)
listening on proxy.example.com:5432.

The clients all connect to proxy.example.com:5432.

The proxy does the TLS handshake (if necessary) and reads the first
packet. This contains the database name. The proxy then uses the
database name to look up where that database resides (e.g. for db3 it
gets host2.example.com:5432) opens a connection to that port (plus TLS
handshake, if necessary) and forwards the packet. After that it just has
to forward packets in both directions until the connection is closed.

The client never knows that the databases are actually on different
hosts and/or ports. As far as it is concerned, all the databases are on
proxy.example.com:5432.

There is one caveat: All the database names need to be unique.

Such a proxy should be straightforward to write. It only needs to
understand two requests of the PostgreSQL protocol (initiate TLS and
connect). It would be much simpler than e.g. pg_bouncer which has to
know about authentication, transactions, etc.[1]. Depending on the
expected number of parallel connections and throughput you might want to
consider what programming language and concurrency model
(multi-threaded, async, state machine, ...) to use.
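
To make that a bit more concrete, here is a minimal sketch of such a
proxy in Python (purely illustrative: the routing table and host names
are made up, and it ignores TLS, GSSAPI, cancel requests and almost all
error handling):

    import socket
    import struct
    import threading

    # Made-up routing table: database name -> (host, port)
    ROUTES = {
        "db1": ("host1.example.com", 5432),
        "db2": ("host1.example.com", 5433),
        "db3": ("host2.example.com", 5432),
        "db4": ("host3.example.com", 5432),
    }

    SSL_REQUEST_CODE = 80877103   # magic number from the protocol docs

    def recv_exact(sock, n):
        buf = b""
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("peer closed connection")
            buf += chunk
        return buf

    def read_startup_message(sock):
        # Startup-phase messages are length-prefixed (length includes itself)
        (length,) = struct.unpack("!I", recv_exact(sock, 4))
        return recv_exact(sock, length - 4)

    def pump(src, dst):
        # Copy bytes until EOF, then signal EOF to the other side
        while data := src.recv(8192):
            dst.sendall(data)
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass

    def handle(client):
        body = read_startup_message(client)
        if struct.unpack("!I", body[:4])[0] == SSL_REQUEST_CODE:
            client.sendall(b"N")          # refuse TLS, client retries in clear
            body = read_startup_message(client)
        # StartupMessage: int32 protocol version, then "key\0value\0"... "\0"
        params = body[4:].split(b"\0")
        kv = dict(zip(params[0::2], params[1::2]))
        dbname = kv.get(b"database", kv.get(b"user", b"")).decode()
        server = socket.create_connection(ROUTES[dbname])
        server.sendall(struct.pack("!I", len(body) + 4) + body)  # replay startup
        threading.Thread(target=pump, args=(server, client), daemon=True).start()
        pump(client, server)

    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("0.0.0.0", 5432))
    listener.listen()
    while True:
        conn, _ = listener.accept()
        threading.Thread(target=handle, args=(conn,), daemon=True).start()

In production you would at least want to handle TLS, CancelRequest
packets and unknown database names gracefully, but the routing logic
itself really is just those few lines.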

hp

[1] But of course, if pgbouncer already does what you want, don't
reinvent the wheel.

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Question: Multiple pg clusters on one server can be reached with the standard port.

2023-06-19 Thread Peter J. Holzer
On 2023-06-19 07:49:49 -0500, Ron wrote:
> On 6/19/23 05:33, Peter J. Holzer wrote:
> > As Francisco already pointed out, this can't work with nginx either. The
> > client resolves the alias and the TCP packets only contain the IP
> > address, not the alias which was used to get that address. So nginx
> > simply doesn't have that information and therefore can't act on it.
[...]
> > So (again, as Francisco already wrote) the best way is probably to write
> > a simple proxy which uses the database (not DNS) name for routing. I
> > seem to remember that nginx has a plugin architecture for protocols so
> > it might make sense to write that as an nginx plugin instead of a
> > standalone server, but that's really a judgement call the programmer has
> > to make. Another possibility would of course be to extend pgbouncer to
> > do what the OP needs.
> 
> How would this work with JDBC clients?

Same as with any other client, I guess. Any reason why it should be
different?

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Question: Multiple pg clusters on one server can be reached with the standard port.

2023-06-19 Thread Peter J. Holzer
On 2023-06-19 06:06:33 +0200, Thomas Markus wrote:
> Am 16.06.23 um 13:53 schrieb Brainmue:
> > With firewall you mean an additional software, right?
> > Because with iptables or netfilter I can't forward TCP packets based on the 
> > DNS alias name. Or is
> > that possible?
> > 
> > I have the same problem with nginx. I just looked in the documentation 
> > again but I can't find a way
> > to distinguish which cluster to forward to based on the DNS alias.
> > Do you have an example for me?
> > 
> > We have already looked at pgbouncer and it works with that but 
> > unfortunately you have to do the
> > authentication in pgbouncer. Which we don't like so much.
> > 
> > Regards,
> > Michael
> 
> You cant setup firewall rules basedon dns names. firewall rules are based on
> ip adresses and dns resolution happens on rule creation.
> I dont have an example for nginx. As I remember nginx resolves dns names
> only for variables. So setup a variable with your hostname and use this
> variable in your server definition.

As Francisco already pointed out, this can't work with nginx either. The
client resolves the alias and the TCP packets only contain the IP
address, not the alias which was used to get that address. So nginx
simply doesn't have that information and therefore can't act on it.

For HTTP this works because the HTTP protocol contains a Host field
which the client fills with the name it used. But the Postgres protocol
has no such information (and in any case nginx probably doesn't
understand that protocol anyway).

So (again, as Francisco already wrote) the best way is probably to write
a simple proxy which uses the database (not DNS) name for routing. I
seem to remember that nginx has a plugin architecture for protocols so
it might make sense to write that as an nginx plugin instead of a
standalone server, but that's really a judgement call the programmer has
to make. Another possibility would of course be to extend pgbouncer to
do what the OP needs.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: DB migration : Sybase to Postgres

2023-05-26 Thread Peter J. Holzer
On 2023-05-25 08:10:42 -0500, Ron wrote:
> (You can create the FKs ahead of time, but use the NOT VALID clause; then,
> after data is loaded do ALTER TABLE ... VALIDATE CONSTRAINT.)

I don't think this will work:

| Normally, [ADD CONSTRAINT] will cause a scan of the table to verify
| that all existing rows in the table satisfy the new constraint. But if
| the NOT VALID option is used, this potentially-lengthy scan is
| skipped. The constraint will still be enforced against subsequent
| inserts or updates
(https://www.postgresql.org/docs/current/sql-altertable.html)

However, you can define a constraint as DEFERRABLE, and then defer
checking until commit:

hjp=> create table master(id int primary key, t text);
CREATE TABLE
hjp=> create table detail(id int primary key, master int references master 
DEFERRABLE INITIALLY IMMEDIATE, t text);
CREATE TABLE

hjp=> begin;
BEGIN
hjp=*> set constraints detail_master_fkey deferred;
SET CONSTRAINTS
hjp=*> insert into detail values(1, 1, '1/1');
INSERT 0 1
hjp=*> insert into detail values(2, 1, '1/2');
INSERT 0 1
hjp=*> insert into detail values(3, 2, '2/1');
INSERT 0 1
hjp=*> insert into detail values(4, 3, '3/1');
INSERT 0 1
hjp=*> insert into master values(1, '1');
INSERT 0 1
hjp=*> insert into master values(2, '2');
INSERT 0 1

-- We haven't inserted a master record with id 3 yet, so the commit will
-- fail:

hjp=*> commit;
ERROR:  insert or update on table "detail" violates foreign key constraint 
"detail_master_fkey"
DETAIL:  Key (master)=(3) is not present in table "master".

(You can also re-enable the constraint explicitly before the end of a
transaction with SET CONSTRAINTS ... IMMEDIATE)

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Having issue with SSL.

2023-05-26 Thread Peter J. Holzer
On 2023-05-24 21:02:04 +, Randy Needham wrote:
> The problem I am running into is my remote client can't connect via SSL to
> postgrsql.  I am able to from the server itself.  This is using pgAdmin 4 and
> making ssl mode as required.  Also ran psql.exe on the server to show that SSL
> was in fact working on the server. "SSL connection (protocol: TLSv1.3, cipher:
> TLS_AES_256_GCM_SHA384, bits: 256, compression: off)".  In the logs it is
> showing this when I try to connect via my remote client.
> 
> [2672] LOG:  could not accept SSL connection: An existing connection was
> forcibly closed by the remote host.
> 
> The error from pgAdmin 4 on the remote client is this.
> 
> connection failed: server closed the connection unexpectedly This probably
> means the server terminated abnormally before or while processing the 
> request. 
> SSL SYSCALL error: Connection reset by peer (0x2746/100054)

So both the server and the client claim that the connection was
terminated by the other side?

I suspect that the server and client cannot agree on a common cipher.
But if both are reasonably up to date that shouldn't happen (it can
happen if the SSL library on your server is much older than that on your
client or vice versa).

Can you use wireshark (or something similar) to record the session and
see where in the protocol they give up?

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Maintaining accents with "COPY" ?

2023-05-25 Thread Peter J. Holzer
On 2023-05-25 07:14:40 +, Laura Smith wrote:
> I'm currently doing a CSV export using COPY:
> 
> COPY (select * from foo where bar='foo') TO '/tmp/bar.csv' DELIMITER ',' CSV 
> HEADER;
> 
> 
> This works great apart from accents are not preserved in the output,
> for example é gets converted to random characters, e.g. √© or similar.

How do you check the output?

If a single character is turned into 2 or 3 characters the issue is
usually that the program which produces the output (in the case of COPY
I think that would be the PostgreSQL server, not the client) produces
UTF-8, but the program consuming it expects an 8-bit character set
(typically windows-1252, or Mac Roman - the "√©" you describe is exactly
what the UTF-8 bytes for "é" look like in Mac Roman). See if you can
tell that program that the file is in UTF-8.
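
You can reproduce that kind of mangling in a few lines of Python (just
to illustrate the mechanism, not tied to your actual tools):

    # The UTF-8 encoding of "é" decoded with the wrong 8-bit character set:
    b = "é".encode("utf-8")           # b'\xc3\xa9'
    print(b.decode("mac_roman"))      # √©  (what you are seeing)
    print(b.decode("windows-1252"))   # Ã©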

> How can I preserve accents ?

They probably already are preserved.

        hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: 15 pg_upgrade with -j

2023-05-23 Thread Peter J. Holzer
On 2023-05-23 13:17:24 -0500, Ron wrote:
> On 5/23/23 12:19, Peter J. Holzer wrote:
> > On 2023-05-22 21:10:48 -0500, Ron wrote:
> > > On 5/22/23 18:42, Tom Lane wrote:
> > > > It looks like the assumption was that issuing link()
  ^^
> > > > requests in parallel wouldn't help much but just swamp your disk
> > > > if they're all on the same filesystem.
> > > > Maybe that could use rethinking, not sure.
> > > It does need rethinking in the era of VMs and SANs. /var/lib/pgsql/15 is
> > > going to be on a different LUN from /var/lib/pgsql/9.6
> > You can't hardlink between different file systems.
> 
> We'd never hardlink.

But that was what Jeff and Tom were talking about. If you are changing
the subject you should at least make it explicit.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: 15 pg_upgrade with -j

2023-05-23 Thread Peter J. Holzer
On 2023-05-22 21:10:48 -0500, Ron wrote:
> On 5/22/23 18:42, Tom Lane wrote:
> > It looks like the assumption was that issuing link()
> > requests in parallel wouldn't help much but just swamp your disk
> > if they're all on the same filesystem.
> > Maybe that could use rethinking, not sure.
> 
> It does need rethinking in the era of VMs and SANs. /var/lib/pgsql/15 is
> going to be on a different LUN from /var/lib/pgsql/9.6

You can't hardlink between different file systems.

Even if you could assign single directories to specific LUNs (does any
file system allow this?) this would at best spread the updates across
two LUNs (the inodes would presumably stay on the source LUN and the
target directory would be on the target LUN).

    hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Modeling combinations (options and dependencies)

2023-05-19 Thread Peter J. Holzer
On 2023-05-18 19:21:23 +, eacil wrote:
> -
> DATA
> -
> 
> -You have objects that are linked to many tables describing their
> properties, such as an object__tag table.
> -These objects are sold in releases.
> -Thing is, objects can be made of objects. I call them superobjects
> when they assume that container role (as objects, they can be part of
> other superobjects). Multiple releases can be linked to one
> superobject. Yes, it's a parts scenario.
> -Objects can be shuffled in all sorts of unpredictable ways so it's
> useless to think that if one object is contained inside some
> superobject A, if that superobject A will be contained inside another
> superobject B, superobject B will inherit all objects from superobject
> A. In fact, my attempt at a solution will instantiate objects to link
> them to superobjects.
> -In a superobject, multiple objects can be grouped together into a set
> of options. One option can be made of multiple objects (majority of
> the time, just one). You can choose how many options minimum and
> maximum (aka a range) you can choose from a set (majority of the time,
> just one). It's possible to have a NONE choice aka you don't choose
> any option.
> -Any object can have 0 (common), 1 (common), or more (very rare)
> dependencies.
> 
> That's pretty much all.

Not sure if I followed that completely, but it certainly looks like a
quite complicated system of constraints.

> CREATE TABLE pairing (
> superobject_id integer NOT NULL,
> instance_a integer NOT NULL,
> instance_b integer NOT NULL,
> CONSTRAINT pairing__pk PRIMARY KEY (superobject_id,instance_a,instance_b)
> );

So if I understand this correctly, you are computing all the
combinations somewhere outside the database and then filling in the
pairings table with all valid pairs? Doesn't that have the same problem
of possibly generating an exponential number of combinations?

I'm also not sure if this is sufficient. If (A,B),(A,C),(A,D),(B,C),
(B,D),(C,D) are all valid combinations within superobject S, does that really
mean that (A,B,C,D) is a valid combination? I suspect you still have to
validate the results.

If you have to validate the results anyway, maybe you can radically
simplify the filter: Just add one row for each object which can possibly
appear in each super-object (or even: 1 row for each super-object with
an array of objects). Then find all super-objects which can contain all
the objects you are looking for and finally filter those in your
application. (Although I wonder how fast that validation is: That also
looks like it could potentially have exponential runtime)

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Is there a good way to handle sum types (or tagged unions) in PostgreSQL?

2023-05-19 Thread Peter J. Holzer
On 2023-05-19 16:55:00 +0200, Victor Nordam Suadicani wrote:
> Thanks for the perspective :)
> 
> > If you need SQL, you need to design for SQL for the get go. Not shoehorn 
> > your
> >  Rust data model into SQL.
> 
> Sometimes the data in the domain really does fit a sum type and then a sum 
> type
> is the right tool to use (whether you use Rust or Haskell or whatever
> language). Trying to shoehorn your domain data model into a data format that
> doesn't fit isn't the way to go either. I feel like it's a deficiency in SQL
> that there is no support for sum types. I would guess this is influenced by 
> the
> fact that SQL was developed in a time when there were no major programming
> languages with sum type support either.

That may have been a reason. It is worth noting, however, that (at least
some versions of) entity-relationship diagrams do have graphical
symbols for "exactly/at most one of these attributes must be set". This
is very similar to the example you gave, where each of the three
variants had a name (and very different to e.g. TypeScript, where you
can just define a variable to be of type «string | number | Date»)

The way this is implemented is quite straightforward, as Dominique
wrote: You have several nullable columns and a constraint which ensures
that only one is not null.

For example

create table alternatetest (
id serial primary key,
name text,
i int, f float, t text, -- these three form a union/sum type
check (
(i is not null)::int + (f is not null)::int + (t is not null)::int 
= 1
)
);

The space overhead is very small (in PostgreSQL it's a single bit per
nullable column).

An ORM for Rust should be able to read the three columns and stuff them
into a single variable of sum type. I don't know Rust well enough to say
whether the other direction is possible automatically (does the compiler
and/or the run-time system keep track of which variant is currently
valid?), but even if it isn't, surely the application must have that
information and be able to pass it into the ORM.

If you aren't using an ORM but using SQL directly (as I prefer to do as
I find that ORMs are just another leaky abstraction layer which makes
stuff harder instead of easier) then you have to code that yourself, but
you can probably centralize that somewhere and the rest of your code
will be blissfully unaware.
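
To illustrate the idea (in Python rather than Rust, with made-up type
names - not how any particular ORM does it), the mapping between the sum
type and the three nullable columns can be centralized in two small
functions:

    from dataclasses import dataclass
    from typing import Union

    @dataclass
    class I: value: int
    @dataclass
    class F: value: float
    @dataclass
    class T: value: str

    Variant = Union[I, F, T]

    def to_columns(v: Variant):
        """Spread a variant over the three nullable columns (i, f, t)."""
        return (v.value if isinstance(v, I) else None,
                v.value if isinstance(v, F) else None,
                v.value if isinstance(v, T) else None)

    def from_columns(i, f, t) -> Variant:
        """Rebuild the variant; the check constraint guarantees exactly one is set."""
        if i is not None:
            return I(i)
        if f is not None:
            return F(f)
        return T(t)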

(Of course you can stuff those values in a single column of JSONB type.
But I don't think this is better.)

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Death postgres

2023-05-12 Thread Peter J. Holzer
On 2023-05-12 17:41:37 +0200, Marc Millas wrote:
> On Fri, May 12, 2023 at 8:31 AM Peter J. Holzer  wrote:
> My guess is that the amount of parallelism is the problem.
> 
> work_mem is a per-node limit. Even a single process can use a multiple of
> work_mem if the query contains nested nodes (which almost every query
> does, but most nodes don't need much memory). With 5 parallel workers,
> the total consumption will be 5 times that. So to prevent the OOM
> condition you would need to reduce work_mem or max_parallel_workers (at
> least for this query).
> 
> 
> we have more than 100GB RAM and only 1 user, with one request running.
> work_mem is set to 10MB.

I don't remember you mentioning that before, so I had to guess.

>   for oom to kill due to work_mem it means that for one
> request with 2 left join, postgres needs more than 10.000 work_mem buffers.
> to me, it seems difficult to believe.

Yes. work_mem is unlikely to be the culprit here. It must be something
else which is eating the RAM. But I think it's hard to reproduce that
with the information you've given us this far.

> but that postgres may need that RAM space for hashing or whatever
> other similar purpose seems more probable. no ? 

That should adhere to the work_mem limit, too.

I'm not a core developer (actually not a Postgres developer at all, just
a developer of other stuff which happens to use Postgres), but I
remember that there was an issue some years ago where a hash table could
grow much larger than estimated by the planner. That issue was fixed,
but maybe you've run into a similar case?


> The description temp_file_limit says "...the maximum amount of disk
> space that *a process* can use...". So with 5 workers that's 210*5 =
> 1050 GB total. Again, you may want to reduce either temp_file_limit or
> max_parallel_workers.
> 
> Yes, but if so, we may have had a request canceled due to temp_file limit,
> which was not the case. 

Well, you've said that the disk space grew by "1.1 TB". I understand
that to mean "something between 1.05 TB and 1.15 TB", not exactly
1100000000000 Bytes (or 1209462790554 Bytes or 1100213452800 Bytes,
depending on how you define "TB"). 1050 GB is within that range, so you
might just have missed that limit.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Death postgres

2023-05-12 Thread Peter J. Holzer
On 2023-05-11 21:27:57 +0200, Marc Millas wrote:
> the 75 lines in each tables are not NULLs but '' empty varchar, which,
> obviously is not the same thing.
> and which perfectly generates 500 billions lines for the left join.
> So, no planner or statistics pbs. apologies for the time wasted.

No problem. Glad to have solved that puzzle.

> Back to the initial pb: if, with temp_file_limit positioned to 210 GB,
> I try to run the select * from table_a left join table_b on the col_a
> (which contains the 75 '' on both tables)
> then postgres do crash, killed by oom, after having taken 1.1 TB of additional
> disk space. 

My guess is that the amount of parallelism is the problem.

work_mem is a per-node limit. Even a single process can use a multiple of
work_mem if the query contains nested nodes (which almost every query
does, but most nodes don't need much memory). With 5 parallel workers,
the total consumption will be 5 times that. So to prevent the OOM
condition you would need to reduce work_mem or max_parallel_workers (at
least for this query).

The description temp_file_limit says "...the maximum amount of disk
space that *a process* can use...". So with 5 workers that's 210*5 =
1050 GB total. Again, you may want to reduce either temp_file_limit or
max_parallel_workers.

> to my understanding, before postgres 13, hash aggregate did eat RAM limitless
> in such circumstances.
> but in 14.2 ??
> (I know, 14.8 is up...)

Maybe the older version of postgres didn't use as many workers for that
query (or maybe didn't parallelize it at all)?

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Death postgres

2023-05-10 Thread Peter J. Holzer
On 2023-05-10 22:52:47 +0200, Marc Millas wrote:
> On Wed, May 10, 2023 at 7:24 PM Peter J. Holzer  wrote:
> 
> On 2023-05-10 16:35:04 +0200, Marc Millas wrote:
> >  Unique  (cost=72377463163.02..201012533981.80 rows=1021522829864 width=
> 97)
> >    ->  Gather Merge  (cost=72377463163.02..195904919832.48 rows=
> 1021522829864 width=97)
> ...
> >                ->  Parallel Hash Left Join  (cost=
> 604502.76..1276224253.51 rows=204304565973 width=97)
> >                      Hash Cond: ((t1.col_ano)::text = 
> (t2.col_ano)::text)
> ...
> >
> > //so.. the planner guess that those 2 join will generate 1000 billions
> rows...
> 
> Are some of the col_ano values very frequent? If say the value 42 occurs
> 1 million times in both table_a and table_b, the join will create 1
> trillion rows for that value alone. That doesn't explain the crash or the
> disk usage, but it would explain the crazy cost (and would probably be a
> hint that this query is unlikely to finish in any reasonable time).
> 
> 
> good guess, even if a bit surprising: there is one (and only one) "value" 
> which
> fit your supposition: NULL

But NULL doesn't equal NULL, so each of those rows would match nothing
and produce only a single row in the left join (with NULLs for the right
side) instead of a huge cross product. So that's not it.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Death postgres

2023-05-10 Thread Peter J. Holzer
On 2023-05-10 16:35:04 +0200, Marc Millas wrote:
>  Unique  (cost=72377463163.02..201012533981.80 rows=1021522829864 width=97)
>    ->  Gather Merge  (cost=72377463163.02..195904919832.48 rows=1021522829864 
> width=97)
...
>                ->  Parallel Hash Left Join  (cost=604502.76..1276224253.51 
> rows=204304565973 width=97)
>                      Hash Cond: ((t1.col_ano)::text = (t2.col_ano)::text)
...
> 
> //so.. the planner guess that those 2 join will generate 1000 billions rows...

Are some of the col_ano values very frequent? If say the value 42 occurs
1 million times in both table_a and table_b, the join will create 1
trillion rows for that value alone. That doesn't explain the crash or the
disk usage, but it would explain the crazy cost (and would probably be a
hint that this query is unlikely to finish in any reasonable time).
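
The arithmetic behind that estimate is easy to check outside the
database; here is a tiny Python sketch (the values are stand-ins, not
your data):

    from collections import Counter

    a = Counter(["42", "42", "x", None])   # frequencies of col_ano in table_a
    b = Counter(["42", "y", None])         # frequencies of col_ano in table_b

    # Each value contributes freq_a(v) * freq_b(v) matched rows to the join.
    matches = sum(a[v] * b[v] for v in a if v is not None)
    print(matches)   # 2*1 = 2 here; a value occurring 1,000,000 times on both
                     # sides would contribute 1,000,000,000,000 rows on its own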

    hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Death postgres

2023-05-06 Thread Peter J. Holzer
On 2023-05-06 03:14:20 +0200, Marc Millas wrote:
> postgres 14.2 on Linux redhat
> 
> temp_file_limit set around 210 GB.
> 
> a select request with 2 left join have crashed the server (oom killer) after
> the postgres disk occupation did grow from 15TB to 16 TB.

temp_file_limit limits the space a process may use on disk, whereas the
OOM killer gets activated when the system runs out of RAM. So these seem to
be unrelated.

    hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: libpq and multi-threading

2023-05-03 Thread Peter J. Holzer
On 2023-05-03 06:35:26 -0600, Michael Loftis wrote:
> That is not a thread. Linux man clone right at the start …
> 
> “clone, __clone2, clone3 - create a child process”
> 
> What you want is pthread_create (or similar)

clone is the system call which is used to create both processes and
threads (in the early days of Linux that generalization was thought to
be beneficial, but POSIX has all kinds of special rules for processes
and threads so it may actually have made stuff more complicated.)

I do agree that pthread_create (or the C11 thrd_create) is the way to
go. It will just call clone behind the scenes, but it will do so with
the right flags and possibly set up some other stuff expected by the
rest of the C library, too.

There may be good reasons to use the low-level function in some cases.
But I'd say that in that case you had better know exactly what that
means.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: libpq and multi-threading

2023-05-02 Thread Peter J. Holzer
On 2023-05-02 17:43:06 +0200, Michael J. Baars wrote:
> I don't think it is, but let me shed some more light on it.

One possibly quite important piece of information you haven't told us
yet is which OS you use.

Or how you create the threads, how you pass the results around, what
else you are possibly doing between getting the result and trying to use
it ...

A short self-contained test case might shed some light on this.


> After playing around a little with threads and memory, I now know that the
> PGresult is not read-only, it is read-once. The child can only read that
> portion of parent memory, that was written before the thread started. 
> Read-only
> is not strong enough.
> 
> Let me correct my first mail. Making libpq use mmap is not good enough either.
> Shared memory allocated by the child can not be accessed by the parent.

Are you sure you are talking about threads and not processes? In the OSs
I am familiar with, threads (of the same process) share a common address
space. You don't need explicit shared memory and there is no such thing
as "parent memory" (there is thread-local storage, but that's more a
compiler/library construct). 

    hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: DBD::Pg (version 3.16.3) returns EMPTY char columns as 'undef'

2023-04-25 Thread Peter J. Holzer
On 2023-04-25 14:41:45 +0200, Matthias Apitz wrote:
> We're using the above DBD::Pg version on Linux together with PostgreSQL 15.1
> On fetch empty char columns are returned as (Perl) 'undef'
> 
>   while ( my @row_ary = $dba->FetchArray()) {

What is FetchArray? Neither perldoc DBI nor perldoc DBD::Pg mentions
this method. Did you use a wrapper around DBI? (I would have expected
fetchrow_array here)

>  foreach my $i (0..$#row_ary) {
> if ($row_ary[$i] eq undef)  {

> print $row_ary[1] . "\n";
> next;

So when any column is null you want to print the first one and skip to
the next one?

> }
> ...
> which later leads in our code to NULL values '\N' in the writing of a 
> CSV-like export
> files. Ofc NULL values in the database are something else as '' char
> strings. 

Works for me (PostgreSQL 14, Perl 5.34, DBI 1.643, DBD::Pg 3.15):

% cat empty_char
#!/usr/bin/perl

use v5.34;
use warnings;
use Data::Dumper;

use DBIx::SimpleConnect;

my $dbh = DBIx::SimpleConnect->connect("default");

$dbh->do("drop table if exists empty_char");
$dbh->do("create table empty_char (id serial primary key, t char(5))");
$dbh->do("insert into empty_char(t) values(null)");
$dbh->do("insert into empty_char(t) values('')");
$dbh->do("insert into empty_char(t) values(' ')");
$dbh->do("insert into empty_char(t) values('a')");
$dbh->do("insert into empty_char(t) values('a')");

my $data = $dbh->selectall_arrayref(
    "select * from empty_char",
    {Slice => {}}
);

print Dumper($data);

(DBIx::SimpleConnect is just a simple wrapper which looks up connection
strings. It returns a normal DBI database handle object)

% ./empty_char
$VAR1 = [
  {
    't' => undef,
    'id' => 1
  },
  {
    'id' => 2,
    't' => '     '
  },
  {
    't' => '     ',
    'id' => 3
  },
  {
    't' => 'a    ',
    'id' => 4
  },
  {
    'id' => 5,
    't' => 'a    '
  }
];

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: JSON / ASP.NET AJAX Dates support in PostgreSQL

2023-04-15 Thread Peter J. Holzer
On 2023-04-15 09:12:41 -0700, Adrian Klaver wrote:
> On 4/15/23 03:46, Peter J. Holzer wrote:
> > On 2023-04-14 10:44:08 -0700, Adrian Klaver wrote:
> > > On 4/14/23 9:31 AM, Peter J. Holzer wrote:
> > > > On 2023-04-13 10:07:09 -0500, Ron wrote:
> > > > > On 4/13/23 09:44, Sebastien Flaesch wrote:
> > > > >       "LastChangeDateTime" : "\/Date(1672692813062+0100)\/"
> > > > > 
> > > > >   Warning: Note the backslash before the slashes!
> > > > 
> > > > That's a Noop. According to RFC 8259, "\/" is the same as "/" (no idea
> > > > why they even specified that - it seems quite pointless).
> > > 
> > > It is a cheat explained here:
> > > 
> > > https://weblogs.asp.net/bleroy/dates-and-json
> > 
> > Yes, but it needs a specialized JSON parser to note that. As they write:
> > 
> > > Of course, a parser that doesn't know about this convention will just
> > > see a string,
> > 
> > And not only will it just see a string, it will output a string that's
> > indistinguishable from a string with the input
> > "/Date(1672692813062+0100)/". So any code after the parser can't detect
> > those extra backslashes. (This would include for example the object_hook
> > in the Python json Decoder which gets the decoded strings, not the raw
> > strings).
> 
> I would encourage you to read the whole post, it is short. Bottom line, this
> is a cheat MS created for the Microsoft Ajax Library. Their hope was(from
> the post):

Yes, I got that.

[...]
> I'm going to go out on a limb and say whatever JSON parsing Postgres is
> doing does not recognize this format.

Exactly. This was my point. Nor is any other JSON parser (except the one
which invented that cheat of course) likely to recognize it. And it's
sort of hard to add to existing parsers without breaking app
compatibility. In short: I don't think adding that to Postgres is a good
idea.

hp

PS: I do appreciate it as a hack, though.

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: JSON / ASP.NET AJAX Dates support in PostgreSQL

2023-04-15 Thread Peter J. Holzer
On 2023-04-14 10:44:08 -0700, Adrian Klaver wrote:
> On 4/14/23 9:31 AM, Peter J. Holzer wrote:
> > On 2023-04-13 10:07:09 -0500, Ron wrote:
> > > On 4/13/23 09:44, Sebastien Flaesch wrote:
> > >  Is there an easy way to convert JSON data containing ASP.NET AJAX 
> > > Dates
> > >  into PostgreSQL timestamp?
> > > 
> > >  I have this kind of JSON data:
> > > 
> > >  {
> > >      "PurchaseOrder" : "45",
> > >      "CreationDate" : "\/Date(167235840)\/",
> > >      "LastChangeDateTime" : "\/Date(1672692813062+0100)\/"
> > >  }
> > > 
> > >  Warning: Note the backslash before the slashes!
> > 
> > That's a Noop. According to RFC 8259, "\/" is the same as "/" (no idea
> > why they even specified that - it seems quite pointless).
> 
> It is a cheat explained here:
> 
> https://weblogs.asp.net/bleroy/dates-and-json

Yes, but it needs a specialized JSON parser to note that. As they write:

> Of course, a parser that doesn't know about this convention will just
> see a string,

And not only will it just see a string, it will output a string that's
indistinguishable from a string with the input
"/Date(1672692813062+0100)/". So any code after the parser can't detect
those extra backslashes. (This would include for example the object_hook
in the Python json Decoder which gets the decoded strings, not the raw
strings).
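
A quick check with Python's json module (any conforming parser behaves
the same way) shows that the escaped and unescaped forms are indeed
indistinguishable after parsing:

    import json
    print(json.loads(r'"\/Date(1672692813062+0100)\/"'))
    # /Date(1672692813062+0100)/
    print(json.loads(r'"\/x\/"') == json.loads(r'"/x/"'))   # True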

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Guidance on INSERT RETURNING order

2023-04-14 Thread Peter J. Holzer
On 2023-04-11 15:45:59 -0600, Rob Sargent wrote:
> > > Can your client retain a hashmap of md5,data pairings, allowing the
> > > lookup on the way back using the returned data and supplied id?
> > > 
> > When using unique columns or similar, that's something that is done,
> > but if there are no unique columns in the value no match can be done
> > reliably with the source data, since sqlalchemy is a library that
> > allows arbitrary schemas to be generated.

> So you're returned data is not what was sent to the server? Otherwise it
> should generate the same md5, as I understand it.

That's not necessarily the case. There are quite a few data types where
the input value is truncated, rounded or otherwise normalized (insert
3.14159 into a numeric(10,2) column and you read back 3.14; timestamps
are rounded to microsecond precision). So I don't think you can
generally expect to read back exactly the same value you inserted.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: JSON / ASP.NET AJAX Dates support in PostgreSQL

2023-04-14 Thread Peter J. Holzer
On 2023-04-13 10:07:09 -0500, Ron wrote:
> On 4/13/23 09:44, Sebastien Flaesch wrote:
> Is there an easy way to convert JSON data containing ASP.NET AJAX Dates
> into PostgreSQL timestamp?
> 
> I have this kind of JSON data:
> 
> {
>     "PurchaseOrder" : "45",
>     "CreationDate" : "\/Date(167235840)\/",
>     "LastChangeDateTime" : "\/Date(1672692813062+0100)\/"
> }
> 
> Warning: Note the backslash before the slashes!

That's a Noop. According to RFC 8259, "\/" is the same as "/" (no idea
why they even specified that - it seems quite pointless).

> According to JSON spec this is valid JSON and used by AJAX Date format.

It's valid JSON, but for JSON it's just a string, not a date.

Any interpretation is strictly by convention between the sender and the
receiver.


> This looks like "milliseconds since the Unix epoch:
> 
> $ date -d @1672692813.062
> Mon 02 Jan 2023 02:53:33 PM CST
> 
> Thus:
> select to_timestamp(cast(1672692813062 as bigint))::timestamp;

ITYM:

select to_timestamp(1672692813062/1000.0);
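
If the conversion has to happen on the client side instead, a small
Python sketch like the following would do it (my own illustration; as
far as I can tell the milliseconds are already UTC and the trailing
offset is only informational):

    import re
    from datetime import datetime, timedelta, timezone

    def ajax_date_to_datetime(s):
        # "/Date(1672692813062+0100)/" -> timezone-aware datetime
        m = re.fullmatch(r'/Date\((-?\d+)([+-]\d{4})?\)/', s)
        if m is None:
            raise ValueError(f"not an AJAX date: {s!r}")
        millis = int(m.group(1))
        tz = timezone.utc
        if m.group(2):
            sign = -1 if m.group(2)[0] == "-" else 1
            tz = timezone(sign * timedelta(hours=int(m.group(2)[1:3]),
                                           minutes=int(m.group(2)[3:5])))
        return datetime.fromtimestamp(millis / 1000.0, tz)

    print(ajax_date_to_datetime("/Date(1672692813062+0100)/"))
    # 2023-01-02 21:53:33.062000+01:00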

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: [EXTERNAL]: Re: [EXTERNAL]: Re: UPSERT in Postgres

2023-04-10 Thread Peter J. Holzer
On 2023-04-10 14:35:38 +0200, Karsten Hilbert wrote:
> All I really wanted to hint at is that "incoming call
> timestamp" may work pretty well in given settings but does
> not _always_ make for a "unique enough" key.

This is true for all natural primary keys: Any attribute of an entity
which is unique for a given application may not be unique for other
applications.

    hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: "PANIC: could not open critical system index 2662" - twice

2023-04-07 Thread Peter J. Holzer
On 2023-04-07 13:04:34 +0200, Laurenz Albe wrote:
> On Thu, 2023-04-06 at 16:41 +, Evgeny Morozov wrote:
> > What can I do to figure out why this is happening and prevent it from 
> > happening again?
> 
> No idea about the former, but bad hardware is a good enough explanation.
> 
> As to keeping it from happening: use good hardware.

Also: Use checksums. PostgreSQL offers data checksums[1] (enable them
with initdb --data-checksums, or on an existing, cleanly shut down
cluster with pg_checksums --enable, available since v12). Some
filesystems also offer checksums.

This doesn't prevent corruption but at least it will be detected early
and can't spread.

hp

[1] For some reason I thought the Debian/Ubuntu packages enabled this by
default. But that doesn't seem to be the case.

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Using CTID system column as a "temporary" primary key

2023-03-30 Thread Peter J. Holzer
On 2023-03-29 12:15:09 -0700, Adrian Klaver wrote:
> On 3/29/23 09:43, Peter J. Holzer wrote:
> > On 2023-03-29 07:59:54 -0700, Adrian Klaver wrote:
> > > On 3/29/23 07:19, Sebastien Flaesch wrote:
> > > > INSERT statements must not use the serial column, so you have to
> > > > list all columns of the table and provide only the values of the
> > > > non-serial columns. With Informix you could just specific a zero
> > > > to get a new generated serial, but seems this has never been
> > > > considered with PostgreSQL.
> > > 
> > > Yes it has:
> > [...]
> > > insert into seq_test values(default, 'test');
> > 
> > Default is not the same as zero.
> 
> It accomplishes the same thing,

No. As has been pointed out by others, default is a keyword. Unlike 0 (or
NULL) you can't bind it. You can argue that that would be a bad idea
anyway (and in most - maybe all - cases I'd agree with you), but I
consider that a pretty fundamental difference.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Using CTID system column as a "temporary" primary key

2023-03-29 Thread Peter J. Holzer
On 2023-03-29 07:59:54 -0700, Adrian Klaver wrote:
> On 3/29/23 07:19, Sebastien Flaesch wrote:
> > INSERT statements must not use the serial column, so you have to list
> > all columns of the table and provide only the values of the non-serial
> > columns. With Informix you could just specific a zero to get a new
> > generated serial, but seems this has never been considered with
> > PostgreSQL.
> 
> Yes it has:
[...]
> insert into seq_test values(default, 'test');

Default is not the same as zero.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


  1   2   3   4   5   6   >