Re: How batch processing works

2024-09-21 Thread Peter J. Holzer
On 2024-09-21 16:44:08 +0530, Lok P wrote:
> But wondering why we don't see any difference in performance between method-2
> and method-3 above.

The code runs completely inside the database. So there isn't much
difference between a single statement which inserts 50 rows and 50
statements which insert 1 row each. The work to be done is (almost) the
same.

This changes once you consider an application which runs outside of the
database (maybe even on a different host). Such an application has to
wait for the result of each statement before it can send the next one.
Now it makes a difference whether you are waiting 50 times for a
statement which does very little or just once for a statement which does
more work.

> So does it mean that I am testing this in a wrong way or

That depends on what you want to test. If you are interested in the
behaviour of stored procedures, the test is correct. If you want to know
about the performance of a database client (whether its written in Java,
Python, Go or whatever), this is the wrong test. You have to write the
test in your target language and run it on the client system to get
realistic results (for example, the round-trip times will be a lot
shorter if the client and database are on the same computer than when
one is in Europe and the other in America).

For example, here are the three methods as Python scripts:

---
#!/usr/bin/python3

import time
import psycopg2

num_inserts = 10_000

db = psycopg2.connect()
csr = db.cursor()

csr.execute("drop table if exists parent_table")
csr.execute("create table parent_table (id int primary key, t text)")

start_time = time.monotonic()
for i in range(1, num_inserts+1):
csr.execute("insert into parent_table values(%s, %s)", (i, 'a'))
db.commit()
end_time = time.monotonic()
elapsed_time = end_time - start_time
print(f"Method 1: Individual Inserts with Commit after every Row: 
{elapsed_time:.3} seconds")

# vim: tw=99
---
#!/usr/bin/python3

import time
import psycopg2

num_inserts = 10_000
batch_size = 50

db = psycopg2.connect()
csr = db.cursor()

csr.execute("drop table if exists parent_table")
csr.execute("create table parent_table (id int primary key, t text)")
db.commit()

start_time = time.monotonic()
for i in range(1, num_inserts+1):
csr.execute("insert into parent_table values(%s, %s)", (i, 'a'))
if i % batch_size == 0:
db.commit()
db.commit()
end_time = time.monotonic()
elapsed_time = end_time - start_time
print(f"Method 2: Individual Inserts with Commit after {batch_size}  Rows: 
{elapsed_time:.3} seconds")

# vim: tw=99
---
#!/usr/bin/python3

import itertools
import time
import psycopg2

num_inserts = 10_000
batch_size = 50

db = psycopg2.connect()
csr = db.cursor()

csr.execute("drop table if exists parent_table")
csr.execute("create table parent_table (id int primary key, t text)")
db.commit()

start_time = time.monotonic()
batch = []
for i in range(1, num_inserts+1):
    batch.append((i, 'a'))
    if i % batch_size == 0:
        q = "insert into parent_table values" + ",".join(["(%s, %s)"] * len(batch))
        params = list(itertools.chain.from_iterable(batch))
        csr.execute(q, params)
        db.commit()
        batch = []
if batch:
    q = "insert into parent_table values" + ",".join(["(%s, %s)"] * len(batch))
    csr.execute(q, list(itertools.chain.from_iterable(batch)))
    db.commit()
    batch = []

end_time = time.monotonic()
elapsed_time = end_time - start_time
print(f"Method 3: Batch Inserts ({batch_size}) with Commit after each batch: {elapsed_time:.3} seconds")

# vim: tw=99
---

On my laptop, method3 is about twice as fast as method2. But if I
connect to a database on the other side of the city, method3 is now more
than 16 times faster than method2, simply because the delay in
communication is now large compared to the time it takes to insert those
rows.
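
(A note in passing: psycopg2 can also build the multi-row VALUES list for
you via psycopg2.extras.execute_values. The following untested sketch should
behave roughly like method 3, except that it commits once at the end rather
than after every batch.)

---
#!/usr/bin/python3

# Rough execute_values variant of method 3 (a sketch, not part of the
# original test run): execute_values expands the single "values %s"
# template into one multi-row INSERT per page_size rows.

import time
import psycopg2
from psycopg2.extras import execute_values

num_inserts = 10_000
batch_size = 50

db = psycopg2.connect()
csr = db.cursor()

csr.execute("drop table if exists parent_table")
csr.execute("create table parent_table (id int primary key, t text)")
db.commit()

start_time = time.monotonic()
execute_values(csr, "insert into parent_table values %s",
               [(i, 'a') for i in range(1, num_inserts + 1)],
               page_size=batch_size)
db.commit()
end_time = time.monotonic()
print(f"execute_values variant: {end_time - start_time:.3} seconds")
---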

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: IO related waits

2024-09-21 Thread Peter J. Holzer
On 2024-09-21 15:06:45 +0530, veem v wrote:
> On Sat, 21 Sept 2024 at 03:47, Peter J. Holzer  wrote:
> 
> On 2024-09-20 14:11:38 -0700, Adrian Klaver wrote:
> > On 9/20/24 1:01 PM, veem v wrote:
> > > Able to reproduce this deadlock graph as below.  Now my
> > > question is , this is a legitimate scenario in which the same
> > > ID can get inserted from multiple sessions and in such cases
> > > it's expected to skip that (thus "On conflict Do nothing" is
> > > used) row. But as we see it's breaking the code
> >
> > Yeah, as I see it that would not work with concurrent
> > uncommitted sessions as it would be unresolved whether a
> > conflict actually exists until at least one of the sessions
> > completes.
> >
> > > with deadlock error during race conditions where a lot of
> > > parallel threads are operating. So how should we handle this
> > > scenario? Will setting the "lock_timeout" parameter at session
> > > level will help us anyway here?
[...]
> So that leads us to another solution:
> 
> Retry each batch (possibly after reducing the batch size) until it
> succeeds.
> 
> 
> 
> Actually here the application is using kafka and  flink stream and is
> using one of existing code in which it was doing row by row commit
> which is now changed to Batch commit i.e. the commit point is shifted
> from row by row to batch now. There are multiple sessions spawning at
> the same time to process the incoming messages 24/7. And also as
> described in another ongoing thread and also we saw in the past we did
> not see much difference between "batch commit" and "Batch insert"
> performance. We only realized the row by row commit is having
> worse performance.
> 
> Now, in such a scenario when the messages are coming from a streaming
> framework using kafka +flink and also the insert is happening using
> row by row only (but just the commit is shifted to batch), i don't see
> any way to sort the ID columns in this streaming process, so that they
> won't overlap across session.

I have never used Kafka or Flink, but I imagine that they can stream to
various targets, not just to PostgreSQL.

So I would write a program which can receive such a stream. This program
would then buffer rows until either a number of rows or some timeout was
exceeded. It could then preprocess those rows (e.g. by sorting them) and
then open a transaction and try to insert them. If the transaction fails
because of a deadlock, serialization error or similar, simply retry the
transaction. If everything went well, go back to receiving the next
batch of rows.
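
Something like this minimal, untested sketch (the receive() stub, the batch
limits and the target table are placeholders; deadlock_detected and
serialization_failure are the conditions worth retrying):

---
#!/usr/bin/python3

# Sketch only: buffer rows from the stream, sort them, insert the whole
# batch in one transaction and retry the batch on deadlock/serialization
# failures.  receive() is a stand-in for the Kafka/Flink consumer.

import time
import psycopg2
from psycopg2 import errorcodes

BATCH_ROWS = 500
BATCH_SECONDS = 1.0
RETRYABLE = {errorcodes.DEADLOCK_DETECTED, errorcodes.SERIALIZATION_FAILURE}

db = psycopg2.connect()

def receive():
    # stand-in: yield (id, t) tuples from the stream
    yield from ()

def flush(rows):
    rows.sort()                          # consistent insert order across sessions
    while True:
        try:
            with db, db.cursor() as csr:     # one transaction per batch
                csr.executemany(
                    "insert into parent_table values (%s, %s)"
                    " on conflict (id) do nothing", rows)
            return
        except psycopg2.OperationalError as e:
            if e.pgcode not in RETRYABLE:
                raise                    # not a retryable error
            # the with-block has already rolled back; just try again

buf, deadline = [], time.monotonic() + BATCH_SECONDS
for row in receive():
    buf.append(row)
    if len(buf) >= BATCH_ROWS or time.monotonic() >= deadline:
        flush(buf)
        buf, deadline = [], time.monotonic() + BATCH_SECONDS
if buf:
    flush(buf)
---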

> In such a situation , the only way seems to have the messages replayed
> for which the deadlock error happens , as I think during a deadlock
> error, one session gets terminated by postgres

No, the *session* doesn't get terminated. The transaction fails. It can
be rolled back and tried again.

> and that messages perhap we can save in some exception table and then
> replay? 

Not sure what you mean by "exception table", but if you mean that you
keep all the rows from that transaction around until the commit succeeds
(so that you can try to insert them again if the transaction fails),
then yes, that's what I meant.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Clarify this MERGE warning? "Only columns from the target table that attempt to match data_source rows should appear in join_condition."

2024-09-21 Thread Peter J. Holzer
On 2024-09-09 14:02:50 +0100, Philip Hazelden wrote:
> The MERGE docs[1] give this warning:
> 
> > Only columns from the target table that attempt to match
> > `data_source` rows should appear in `join_condition`.
> > `join_condition` subexpressions that only reference the target
> > table's columns can affect which action is taken, often in
> > surprising ways.
> 
> (The docs for upcoming v17 have the same line.)
> 
> But when I tested this, it seems to work fine. For example, consider a
> two-level primary key, where the source table implicitly has a fixed
> value for one level:
[...]

The warning is a bit misleading, IMHO. I think what this is trying to
say is that this is effectively data_source LEFT JOIN target ON
condition, and any row from data_source not matched by condition will
end up in the "NOT MATCHED" set. So you might insert rows from
data_source which you thought you had excluded in the condition.

So it's important to get the match right, and then decide what to do in
the WHEN clauses.


> merge into t1 using t2
>   on t1.k2 = t2.k2 and t1.k1 = 1
>   when matched then update set v = t2.v
>   when not matched then insert values (1, t2.k2, t2.v);

I think that's ok. The t1.k1 = 1 is used to match rows from the target
to the data source (for each row in the data source, find the rows in
the target which have the same k2 and k1 = 1).

But "columns from the target table that attempt to match data_source`
rows" for me sort of sounds like those columns have to have a counterpart
in the data_source, which k1 hasn't. Also maybe the order is the wrong
way around? "Match rows in the target to rows in the data_source" would
fit my mental model better.

hp
-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: How batch processing works

2024-09-21 Thread Lok P
On Sat, Sep 21, 2024 at 9:51 AM Michał Kłeczek  wrote:

> Hi,
>
> > On 19 Sep 2024, at 07:30, Lok P  wrote:
> >
> [snip]
> >
> > Method-4
> >
> > INSERT INTO parent_table VALUES  (1, 'a'), (2, 'a');
> > INSERT INTO child_table VALUES   (1,1, 'a'), (1,2, 'a');
> > commit;
>
> I’ve done some batch processing of JSON messages from Kafka in Java.
> By far the most performant way was to:
>
> 1. Use prepared statements
> 2. Parse JSON messages in Postgres
> 3. Process messages in batches
>
> All three can be achieved by using arrays to pass batches:
>
> WITH parsed AS (
>   SELECT msg::json FROM unnest(?)
> ),
> parents AS (
>   INSERT INTO parent SELECT … FROM parsed RETURNING ...
> )
> INSERT INTO child SELECT … FROM parsed…
>
> Note the single parameter that you can bind to String[]
>
> Hope that helps.
>
>
Got your point.
But wondering why we don't see any difference in performance between
method-2 and method-3 above. So does it mean that I am testing this in a
wrong way, or that it's the expected behaviour and thus there is no point in
converting the row-by-row inserts into a bulk insert, because just changing the
commit frequency will do the same job in a row-by-row insert approach?


Re: IO related waits

2024-09-21 Thread veem v
On Sat, 21 Sept 2024 at 03:47, Peter J. Holzer  wrote:

> On 2024-09-20 14:11:38 -0700, Adrian Klaver wrote:
> > On 9/20/24 1:01 PM, veem v wrote:
> > > Able to reproduce this deadlock graph as below.  Now my question is ,
> > > this is a legitimate scenario in which the same ID can get inserted
> from
> > > multiple sessions and in such cases it's expected to skip that (thus
> "On
> > > conflict Do nothing" is used) row. But as we see it's breaking the code
> >
> > Yeah, as I see it that would not work with concurrent uncommitted
> sessions
> > as it would be unresolved whether a conflict actually exists until at
> least
> > one of the sessions completes.
> >
> > > with deadlock error during race conditions where a lot of parallel
> > > threads are operating. So how should we handle this scenario? Will
> > > setting the "lock_timeout" parameter at session level will help us
> > > anyway here?
> >
> > Serializable transaction?:
> >
> >
> https://www.postgresql.org/docs/current/transaction-iso.html#XACT-SERIALIZABLE
>
> Doesn't help here, at least not directly. It would help indirectly
> because isolation level serializable makes it very probable that
> serialization errors occur. So an application designed for serializable
> would have some kind of retry logic already in place.
>
> So that leads us to another solution:
>
> Retry each batch (possibly after reducing the batch size) until it
> succeeds.
>
>
Actually here the application is using kafka and  flink stream and is using
one of existing code in which it was doing row by row commit which is now
changed to Batch commit i.e. the commit point is shifted from row by row to
batch now. There are multiple sessions spawning at the same time to process
the incoming messages 24/7. And also as described in another ongoing thread
and also we saw in the past we did not see much difference between "batch
commit" and "Batch insert" performance. We only realized the row by row
commit is having worse performance.

Now, in such a scenario when the messages are coming from a streaming
framework using kafka +flink and also the insert is happening using row by
row only (but just the commit is shifted to batch), i don't see any way to
sort the ID columns in this streaming process, so that they won't overlap
across session.

In such a situation , the only way seems to have the messages replayed for
which the deadlock error happens , as I think during a deadlock error, one
session gets terminated by postgres and that messages perhap we can save in
some exception table and then replay?


Re: re-novice coming back to pgsql: porting an SQLite update statement to postgres

2024-09-20 Thread Dan Kortschak
This is the toy example, with the shape of data that will be seen in the
application. The final trick was to use to_jsonb to allow the
timestamptz to be put back into the jsonb.

WITH replace AS (
    SELECT jsonb($$[
        {"start": "2023-06-12T19:54:39.248859996+10:00", "end": "2023-06-12T19:54:59.248859996+10:00", "data": {"item":1}},
        {"start": "2023-06-12T19:54:50.248859996+10:00", "end": "2023-06-12T19:59:39.248859996+10:00", "data": {"item":2}},
        {"start": "2023-06-12T19:56:39.248859996+10:00", "end": "2023-06-12T19:57:39.248859996+10:00", "data": {"item":3}},
        {"start": "2023-06-12T19:54:39.248859996+10:00", "end": "2023-06-12T20:54:49.248859996+10:00", "data": {"item":4}},
        {"start": "2024-06-12T19:54:39.248859996+10:00", "end": "2024-06-12T19:59:39.248859996+10:00", "data": {"item":5}}
    ]$$) replacements
)
SELECT
    jsonb_agg(new ORDER BY idx) trimmed_replacements
FROM
    replace, LATERAL (
        SELECT idx, jsonb_object_agg(key,
            CASE
                WHEN key = 'start'
                    THEN to_jsonb(greatest(old::text::TIMESTAMPTZ, '2023-06-12T19:54:50+10:00'::TIMESTAMPTZ))
                WHEN key = 'end'
                    THEN to_jsonb(least(old::text::TIMESTAMPTZ, '2023-06-12T19:58:00+10:00'::TIMESTAMPTZ))
                ELSE old
            END
        )
        FROM
            jsonb_array_elements(replacements) WITH ORDINALITY rs(r, idx),
            jsonb_each(r) each(key, old)
        WHERE
            (r->>'start')::TIMESTAMPTZ < '2023-06-12T19:58:00+10:00'::TIMESTAMPTZ and
            (r->>'end')::TIMESTAMPTZ > '2023-06-12T19:54:50+10:00'::TIMESTAMPTZ
        GROUP BY idx
    ) news(idx, new);





Re: How batch processing works

2024-09-20 Thread Michał Kłeczek
Hi,

> On 19 Sep 2024, at 07:30, Lok P  wrote:
> 
[snip]
> 
> Method-4
> 
> INSERT INTO parent_table VALUES  (1, 'a'), (2, 'a');
> INSERT INTO child_table VALUES   (1,1, 'a'), (1,2, 'a');
> commit;

I’ve done some batch processing of JSON messages from Kafka in Java.
By far the most performant way was to:

1. Use prepared statements
2. Parse JSON messages in Postgres
3. Process messages in batches

All three can be achieved by using arrays to pass batches:

WITH parsed AS (
  SELECT msg::json FROM unnest(?)
),
parents AS (
  INSERT INTO parent SELECT … FROM parsed RETURNING ...
)
INSERT INTO child SELECT … FROM parsed…

Note the single parameter that you can bind to String[]
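
A rough psycopg2 sketch of the same idea (simplified to a single table;
parent_table and the JSON field names are just placeholders borrowed from
elsewhere in this thread):

#!/usr/bin/python3

# Sketch: the whole batch travels as ONE text[] parameter and is parsed
# and unpacked inside the statement (field names here are invented).

import json
import psycopg2

db = psycopg2.connect()
csr = db.cursor()

msgs = [json.dumps({"id": i, "t": "a"}) for i in range(1, 51)]   # one batch

csr.execute("""
    with parsed as (
        select msg::jsonb as msg from unnest(%s::text[]) as u(msg)
    )
    insert into parent_table
    select (msg->>'id')::int, msg->>'t' from parsed
""", (msgs,))
db.commit()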

Hope that helps.

--
Michal



Re: re-novice coming back to pgsql: porting an SQLite update statement to postgres

2024-09-20 Thread Dan Kortschak
Thank you both. This has been extremely helpful. I still have more work
to do but this has made it possible to start playing with something,
and reading about it when it doesn't work.

On Sun, 2024-09-15 at 10:13 -0700, Willow Chargin wrote:
> On Sun, Sep 15, 2024 at 4:23 AM Alban Hertroys 
> wrote:
> > 
> > The drawback is that you have to specify all fields and types, but
> > you don’t need to cast the values all the time either.
> 
> Here is a variant of Alban's first method that does not require
> specifying all fields and types, and so works with heterogeneous
> values:
> 
>     WITH t AS (
>     SELECT jsonb($$[
>     {"a": 1, "b": -3, "c": 1},
>     {"a": 2, "b": -2, "c": 2},
>     {"a": 3, "b": -1, "c": 3},
>     {"a": 3, "b": -3, "c": 4}
>     ]$$) arr
>     )
>     SELECT
>     jsonb_agg(new_element ORDER BY idx) new_arr
>     FROM t, LATERAL (
>     SELECT idx, jsonb_object_agg(key, CASE
>     WHEN key = 'a'
>     THEN least(old_value::numeric, 2)::text::jsonb
>     WHEN key = 'b'
>     THEN greatest(old_value::numeric, -
> 2)::text::jsonb
>     ELSE old_value
>     END)
>     FROM
>     jsonb_array_elements(arr)
>     WITH ORDINALITY old_elements(old_element, idx),
>     jsonb_each(old_element) each(key, old_value)
>     GROUP BY idx
>     ) new_elements(idx, new_element)
> 
> I also took the liberties of using `least` / `greatest` to simplify
> the
> clamping operations, and using `WITH ORDINALITY` / `ORDER BY` on the
> array scan and re-aggregation to make the element ordering explicit
> rather than relying on the query engine to not re-order the rows.
> 
> https://www.postgresql.org/docs/16/functions-conditional.html#FUNCTIONS-GREATEST-LEAST
> https://www.postgresql.org/docs/16/queries-table-expressions.html#QUERIES-TABLEFUNCTIONS

I found that I needed to old_value::text::numeric to get Willow's code
to work, but I imagine this is due to the ancientness of the postgresql
I am using.

thanks
Dan






Re: glibc updarte 2.31 to 2.38

2024-09-20 Thread Peter J. Holzer
On 2024-09-19 20:12:13 +0200, Paul Foerster wrote:
> Hi Peter,
> > On 19 Sep 2024, at 19:43, Peter J. Holzer  wrote:
> > 
> > I wrote a small script[1] which prints all unicode code points and a few
> > selected[2] longer strings in order. If you run that before and after
> > the upgrade and the output doesn't change, you are probably fine.
[caveats]
> > 
> > [1] https://git.hjp.at:3000/hjp/pgcollate
> > [2] The selection is highly subjective and totally unscientific.
> >Additions are welcome.
> 
> I'm not a Python specialist but I take it that the script needs
> psycopg2, which we probably don't have. So I'd have to build some sort
> of venv around that like I had to do to get Patroni working on our
> systems.

I don't use SLES but I would expect it to have an RPM for it.

If you have any test machine which you can upgrade before the production
servers (and given the amount of data and availability requirements you
have, I really hope you do) you should be set.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: IO related waits

2024-09-20 Thread Peter J. Holzer
On 2024-09-20 14:11:38 -0700, Adrian Klaver wrote:
> On 9/20/24 1:01 PM, veem v wrote:
> > Able to reproduce this deadlock graph as below.  Now my question is ,
> > this is a legitimate scenario in which the same ID can get inserted from
> > multiple sessions and in such cases it's expected to skip that (thus "On
> > conflict Do nothing" is used) row. But as we see it's breaking the code
> 
> Yeah, as I see it that would not work with concurrent uncommitted sessions
> as it would be unresolved whether a conflict actually exists until at least
> one of the sessions completes.
> 
> > with deadlock error during race conditions where a lot of parallel
> > threads are operating. So how should we handle this scenario? Will
> > setting the "lock_timeout" parameter at session level will help us
> > anyway here?
> 
> Serializable transaction?:
> 
> https://www.postgresql.org/docs/current/transaction-iso.html#XACT-SERIALIZABLE

Doesn't help here, at least not directly. It would help indirectly
because isolation level serializable makes it very probable that
serialization errors occur. So an application designed for serializable
would have some kind of retry logic already in place. 

So that leads us to another solution:

Retry each batch (possibly after reducing the batch size) until it
succeeds.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: IO related waits

2024-09-20 Thread Adrian Klaver




On 9/20/24 1:01 PM, veem v wrote:


On Thu, 19 Sept, 2024, 8:40 pm Adrian Klaver wrote:


On 9/19/24 05:24, Greg Sabino Mullane wrote:
 > On Thu, Sep 19, 2024 at 5:17 AM veem v 

 > This is really difficult to diagnose from afar with only snippets of
 > logs and half-complete descriptions of your business logic. Pull
 > everyone involved into a room with a whiteboard, and produce a
document
 > describing exactly what your application does, and how it is
doing it.
 > Switch from reactive to proactive.


Able to reproduce this deadlock graph as below.  Now my question is , 
this is a legitimate scenario in which the same ID can get inserted from 
multiple sessions and in such cases it's expected to skip that (thus "On 
conflict Do nothing" is used) row. But as we see it's breaking the code


Yeah, as I see it that would not work with concurrent uncommitted 
sessions as it would be unresolved whether a conflict actually exists 
until at least one of the sessions completes.


with deadlock error during race conditions where a lot of parallel 
threads are operating. So how should we handle this scenario? Will 
setting the "lock_timeout" parameter at session level will help us 
anyway here?


Serializable transaction?:

https://www.postgresql.org/docs/current/transaction-iso.html#XACT-SERIALIZABLE

Or change the application code to not have this:

"... legitimate scenario in which the same ID can get inserted from 
multiple sessions ..."

--
Adrian Klaver
adrian.kla...@aklaver.com




Re: IO related waits

2024-09-20 Thread Ron Johnson
On Fri, Sep 20, 2024 at 4:47 PM Tom Lane  wrote:

> veem v  writes:
> > Able to reproduce this deadlock graph as below.  Now my question is ,
> this
> > is a legitimate scenario in which the same ID can get inserted from
> > multiple sessions and in such cases it's expected to skip that (thus "On
> > conflict Do nothing" is used) row. But as we see it's breaking the code
> > with deadlock error during race conditions where a lot of parallel
> threads
> > are operating. So how should we handle this scenario?
>
> Do you have to batch multiple insertions into a single transaction?
> If so, can you arrange to order them consistently across transactions
> (eg, sort by primary key before inserting)?
>

That's exactly what I did back in the day.  Because of database buffering,
sorting the data file at the OS level made the job 3x as fast as when the
input data was random.

-- 
Death to , and butter sauce.
Don't boil me, I'm still alive.
 crustacean!


Re: IO related waits

2024-09-20 Thread Tom Lane
veem v  writes:
> Able to reproduce this deadlock graph as below.  Now my question is , this
> is a legitimate scenario in which the same ID can get inserted from
> multiple sessions and in such cases it's expected to skip that (thus "On
> conflict Do nothing" is used) row. But as we see it's breaking the code
> with deadlock error during race conditions where a lot of parallel threads
> are operating. So how should we handle this scenario?

Do you have to batch multiple insertions into a single transaction?
If so, can you arrange to order them consistently across transactions
(eg, sort by primary key before inserting)?
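
For illustration only, a minimal sketch of that idea using the t1(id) table
from veem's reproduction (the batch contents are made up):

#!/usr/bin/python3

# Sketch: sort each batch by primary key before inserting, so concurrent
# sessions acquire row locks in the same order and can block but not
# deadlock.

import psycopg2

db = psycopg2.connect()
csr = db.cursor()

batch = [(7,), (2,), (5,)]        # whatever IDs this session happens to get
for row in sorted(batch):         # consistent order across sessions
    csr.execute("insert into t1 values (%s) on conflict (id) do nothing", row)
db.commit()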

regards, tom lane




Re: IO related waits

2024-09-20 Thread veem v
On Thu, 19 Sept, 2024, 8:40 pm Adrian Klaver, 
wrote:

> On 9/19/24 05:24, Greg Sabino Mullane wrote:
> > On Thu, Sep 19, 2024 at 5:17 AM veem v 
> > This is really difficult to diagnose from afar with only snippets of
> > logs and half-complete descriptions of your business logic. Pull
> > everyone involved into a room with a whiteboard, and produce a document
> > describing exactly what your application does, and how it is doing it.
> > Switch from reactive to proactive.
>
>
Able to reproduce this deadlock graph as below.  Now my question is , this
is a legitimate scenario in which the same ID can get inserted from
multiple sessions and in such cases it's expected to skip that (thus "On
conflict Do nothing" is used) row. But as we see it's breaking the code
with deadlock error during race conditions where a lot of parallel threads
are operating. So how should we handle this scenario? Will setting the
"lock_timeout" parameter at session level will help us anyway here?

Create table t1(id numeric primary key);

Session 1:
Begin;
Insert into t1 values(1) on conflict(id) do nothing;

Session 2:
Begin;
Insert into t1 values(2) on conflict(id) do nothing;

Session 1:
Insert into t1 values(2) on conflict(id) do nothing;

It got hung as it waits for session 2 to commit/rollback the transaction.

Session 2:
Insert into t1 values(1) on conflict(id) do nothing;

Deadlock detected... and this session was terminated.

Regards
Veem


Re: Why no pg_has_role(..., 'ADMIN')?

2024-09-20 Thread Tom Lane
Robert Haas  writes:
> On Fri, Sep 20, 2024 at 2:34 PM Tom Lane  wrote:
>> I'm now inclined to add wording within the pg_has_role entry, along
>> the lines of
>> 
>> WITH ADMIN OPTION or WITH GRANT OPTION can be added to any of
>> these privilege types to test whether ADMIN privilege is held
>> (all six spellings test the same thing).

> I don't have an opinion about the details, but +1 for documenting it
> somehow. I also think it's weird that we have six spellings that test
> the same thing, none of which are $SUBJECT. pg_has_role seems a little
> half-baked to me...

Yeah.  I think the original idea was to make it as parallel to
has_table_privilege and friends as we could (but why did we then
stick a pg_ prefix on it?).  So that led to MEMBER WITH GRANT OPTION,
and then the other spellings seem to have come along later.

regards, tom lane




Re: Why no pg_has_role(..., 'ADMIN')?

2024-09-20 Thread Robert Haas
On Fri, Sep 20, 2024 at 2:34 PM Tom Lane  wrote:
> I'm now inclined to add wording within the pg_has_role entry, along
> the lines of
>
> WITH ADMIN OPTION or WITH GRANT OPTION can be added to any of
> these privilege types to test whether ADMIN privilege is held
> (all six spellings test the same thing).

I don't have an opinion about the details, but +1 for documenting it
somehow. I also think it's weird that we have six spellings that test
the same thing, none of which are $SUBJECT. pg_has_role seems a little
half-baked to me...

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Why no pg_has_role(..., 'ADMIN')?

2024-09-20 Thread Tom Lane
I wrote:
> Robert Haas  writes:
>> I think this already exists. The full list of modes supported by
>> pg_has_role() is listed in convert_role_priv_string(). You can do
>> something like pg_has_role('alice', 'USAGE WITH ADMIN OPTION'). This
>> is not new: it worked in older releases too, but AFAIK it's never been
>> mentioned in the documentation.

> Surely that's a bad documentation omission.

Actually, it's not true that it's entirely undocumented, because the
text above the table that describes pg_has_role mentions

Optionally, WITH GRANT OPTION can be added to a privilege type to
test whether the privilege is held with grant option.

But I concur that it's not immediately obvious that that applies
to role membership, since we don't use the "grant option" terminology
for roles.

I'm now inclined to add wording within the pg_has_role entry, along
the lines of

WITH ADMIN OPTION or WITH GRANT OPTION can be added to any of
these privilege types to test whether ADMIN privilege is held
(all six spellings test the same thing).

regards, tom lane




Re: Why no pg_has_role(..., 'ADMIN')?

2024-09-20 Thread Tom Lane
Robert Haas  writes:
> I think this already exists. The full list of modes supported by
> pg_has_role() is listed in convert_role_priv_string(). You can do
> something like pg_has_role('alice', 'USAGE WITH ADMIN OPTION'). This
> is not new: it worked in older releases too, but AFAIK it's never been
> mentioned in the documentation.

Surely that's a bad documentation omission.  Do we want to document
all the variants convert_role_priv_string allows?  They appear
functionally equivalent, so I'd be inclined to document just one.
'USAGE WITH ADMIN OPTION' seems a reasonable choice.

regards, tom lane




Re: Why no pg_has_role(..., 'ADMIN')?

2024-09-20 Thread Robert Haas
On Fri, Sep 20, 2024 at 12:37 PM Laurenz Albe  wrote:
> > But knowing whether DROP ROLE will work,
> > w/o invalidating the current transaction,
> > seems like something quite useful to know now, no?
> >
> > I can query pg_auth_members for admin_option,
> > but only easily for direct membership. Taking into
> > account indirect membership, which I assume applies,
> > is exactly why pg_has_role() exists, no?
>
> That would be a useful addition, yes.

I think this already exists. The full list of modes supported by
pg_has_role() is listed in convert_role_priv_string(). You can do
something like pg_has_role('alice', 'USAGE WITH ADMIN OPTION'). This
is not new: it worked in older releases too, but AFAIK it's never been
mentioned in the documentation.

However, the precise rule for DROP ROLE in v16+ is not just that you
need to have ADMIN OPTION on the role. The rule is:

1. You must have ADMIN OPTION on the target role.
2. You must also have CREATEROLE.
3. If the target role is SUPERUSER, you must be SUPERUSER.

If I'm not wrong, pg_has_role(..., 'USAGE WITH ADMIN OPTION') will
test #1 for you, but not #2 or #3.
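
For illustration, a rough client-side check that combines all three
conditions (the query below is only a sketch of mine, not something Postgres
provides):

#!/usr/bin/python3

# Sketch: combine the three conditions above into one yes/no answer to
# "can current_user drop the target role?".

import sys
import psycopg2

target = sys.argv[1]
db = psycopg2.connect()
csr = db.cursor()
csr.execute("""
    select pg_has_role(current_user, %(t)s, 'USAGE WITH ADMIN OPTION')        -- #1
       and (select rolcreaterole from pg_roles where rolname = current_user)  -- #2
       and ((select not rolsuper from pg_roles where rolname = %(t)s)
            or (select rolsuper from pg_roles where rolname = current_user))  -- #3
""", {'t': target})
print(csr.fetchone()[0])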

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Why no pg_has_role(..., 'ADMIN')?

2024-09-20 Thread Laurenz Albe
On Fri, 2024-09-20 at 17:26 +0200, Dominique Devienne wrote:
> To find out whether a ROLE can DROP another in v16+.
> Prior to v16, just having CREATEROLE was enough,
> so it didn't really seem necessary.
> 
> But knowing whether DROP ROLE will work,
> w/o invalidating the current transaction,
> seems like something quite useful to know now, no?
> 
> I can query pg_auth_members for admin_option,
> but only easily for direct membership. Taking into
> account indirect membership, which I assume applies,
> is exactly why pg_has_role() exists, no?

That would be a useful addition, yes.

Yours,
Laurenz Albe




Re: Dependencies on the system view

2024-09-20 Thread Ron Johnson
On Fri, Sep 20, 2024 at 7:32 AM Olleg  wrote:

> Hi all.
>
> One of our programmers created a view based on the system view. I tried
> to explain to him that he created a dependency from the ordinary database
> to the system object and this is a bad idea. But he is not smart enough.
> So I need a guru opinion. Is this permissible, or will there be trouble
> with, for instance, pg_upgrade?
>
> CREATE OR REPLACE VIEW public.all_tables
> AS SELECT n.nspname AS schemaname,
>  c.relname AS tablename,
>  pg_get_userbyid(c.relowner) AS tableowner,
>  c.reltuples AS num_rows,
>  c.relkind,
>  CASE c.relkind
>  WHEN 'f'::"char" THEN 'Foreign table'::text
>  WHEN 'r'::"char" THEN 'Relation'::text
>  WHEN 'i'::"char" THEN 'Index'::text
>  WHEN 'S'::"char" THEN 'Sequence'::text
>  WHEN 't'::"char" THEN 'TOAST'::text
>  WHEN 'v'::"char" THEN 'View'::text
>  WHEN 'm'::"char" THEN 'Materialized view'::text
>  WHEN 'c'::"char" THEN 'Composite type'::text
>  WHEN 'p'::"char" THEN 'Partitioned table'::text
>  WHEN 'I'::"char" THEN 'partitioned Index'::text
>  ELSE NULL::text
>  END AS rel_type,
>  c.relpersistence,
>  CASE c.relpersistence
>  WHEN 'p'::"char" THEN 'permanent'::text
>  WHEN 'u'::"char" THEN 'unlogged'::text
>  WHEN 't'::"char" THEN 'temporary'::text
>  WHEN 'c'::"char" THEN 'constant'::text
>  ELSE NULL::text
>  END AS persistence,
>  t.spcname AS tablespace,
>  c.relhasindex AS hasindexes,
>  c.relhasrules AS hasrules,
>  c.relhastriggers AS hastriggers
> FROM pg_class c
>   LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
>   LEFT JOIN pg_tablespace t ON t.oid = c.reltablespace;
>

1. pg_upgrade will throw an error if a column in one of those catalog
tables is *removed*.  Version release notes *should* mention whether
columns are dropped; you'll just have to drop that view beforehand, and
create a new version afterwards, possibly modifying any pl/pgsql.  No big
deal, honestly; just something to keep in mind.

2. The query will become incomplete/wrong when new relkind and
relpersistence values are added, necessitating an updated version.  Again,
not a big deal, and just something to keep in mind.

Thus, I see no problem with that sort of view.  Heck, I've made similar
views; they're all over my cron jobs.  (I think it should not be in public
-- my views all go in the "dba" schema -- but that's a different issue.)

-- 
Death to , and butter sauce.
Don't boil me, I'm still alive.
 crustacean!


Re: pg_locks in production

2024-09-20 Thread Laurenz Albe
On Thu, 2024-09-19 at 15:08 -0400, Wizard Brony wrote:
> If performance is not an issue, is it valid to use the pg_locks
> system view in production code? Or are there other concerns
> besides performance that would make its use inappropriate for production?

There are no concerns if you use pg_locks on a production system.

Yours,
Laurenz Albe




Re: PgBackRest and WAL archive expiry

2024-09-19 Thread Christophe Pettus



> On Sep 19, 2024, at 22:46, KK CHN  wrote:
> 
> 1. In the EPAS server I have postgres.conf with
> archive_command = 'pgbackrest --stanza=EMI_Repo archive-push %p && cp %p /data/archive/%f'
> 
> The problem is that the /data/archive folder is growing within a few days
> to 850 GB of a 2 TB partition.

The /data/archive directory is entirely under your control.  pgbackrest and 
PostgreSQL don't manage them in any way.  It will just keep growing 
indefinitely unless you take action to delete the WAL segments out of it.

There's no real benefit in maintaining that separate /data/archive directory; 
pgbackrest archives and manages the lifecycle of the WAL segments in its 
repository.  I wouldn't bother with that separate archive, and just use 
pgbackrest.
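
I.e. something like this (simply adapting the archive_command quoted above),
and then clean out /data/archive yourself once you no longer need it:

archive_command = 'pgbackrest --stanza=EMI_Repo archive-push %p'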



Re: How batch processing works

2024-09-19 Thread Lok P
Below are the results for the posted methods. Tested it on local and it
gave no difference in timing between method-2 and method-3. Failed to
run in dbfiddle somehow.

Also, I was initially worried that adding the trigger to our target table
would worsen the performance, because it would turn all the execution into
"row by row" rather than a true batch insert (method-3 as posted), with a
larger number of context switches. But it seems it will still be doing the
batch commits (the way it is in method-2), so as per that we won't lose any
performance as such. Is this understanding correct?


*Method-1- 00:01:44.48*

*Method-2- 00:00:02.67*

*Method-3- 00:00:02.39*

https://gist.github.com/databasetech0073/8e9106757d751358c0c0c65a2374dbc6

On Thu, Sep 19, 2024 at 6:42 PM Lok P  wrote:

>
>
> On Thu, Sep 19, 2024 at 5:40 PM Ron Johnson 
> wrote:
>
>> On Thu, Sep 19, 2024 at 5:24 AM Lok P  wrote:
>>
>>>
>>>
>>> [snip]
>>
>>> DO $$
>>> DECLARE
>>> num_inserts INTEGER := 10;
>>> batch_size INTEGER := 50;
>>> start_time TIMESTAMP;
>>> end_time TIMESTAMP;
>>> elapsed_time INTERVAL;
>>> i INTEGER;
>>> BEGIN
>>> -- Method 1: Individual Inserts with Commit after every Row
>>> start_time := clock_timestamp();
>>>
>>> FOR i IN 1..num_inserts LOOP
>>> INSERT INTO parent_table VALUES (i, 'a');
>>> COMMIT;
>>> END LOOP;
>>>
>>> end_time := clock_timestamp();
>>> elapsed_time := end_time - start_time;
>>> INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
>>> VALUES ('Method 1: Individual Inserts with Commit after every Row',
>>> start_time, end_time, elapsed_time);
>>>
>>> -- Method 2: Individual Inserts with Commit after 100 Rows
>>> start_time := clock_timestamp();
>>>
>>> FOR i IN 1..num_inserts LOOP
>>> INSERT INTO parent_table2 VALUES (i, 'a');
>>> -- Commit after every 100 rows
>>> IF i % batch_size = 0 THEN
>>> COMMIT;
>>> END IF;
>>> END LOOP;
>>>
>>> -- Final commit if not already committed
>>>commit;
>>>
>>> end_time := clock_timestamp();
>>> elapsed_time := end_time - start_time;
>>> INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
>>> VALUES ('Method 2: Individual Inserts with Commit after 100 Rows',
>>> start_time, end_time, elapsed_time);
>>>
>>> -- Method 3: Batch Inserts with Commit after all
>>> start_time := clock_timestamp();
>>>
>>> FOR i IN 1..(num_inserts / batch_size) LOOP
>>> INSERT INTO parent_table3 VALUES
>>> (1 + (i - 1) * batch_size, 'a'),
>>>
>> [snip]
>>
>>> (49 + (i - 1) * batch_size, 'a'),
>>> (50 + (i - 1) * batch_size, 'a'));
>>> COMMIT;
>>> END LOOP;
>>>
>>> COMMIT;  -- Final commit for all
>>> end_time := clock_timestamp();
>>> elapsed_time := end_time - start_time;
>>> INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
>>> VALUES ('Method 3: Batch Inserts with Commit after All', start_time,
>>> end_time, elapsed_time);
>>>
>>> END $$;
>>>
>>
>> Reproduce what behavior?
>>
>> Anyway, plpgsql functions (including anonymous DO statements) are -- to
>> Postgresql -- single statements.  Thus, they'll be faster than
>> individual calls..
>>
>> An untrusted language like plpython3u might speed things up even more, if
>> you have to read a heterogeneous external file and insert all the records
>> into the db.
>>
>
> Here if you see my script , the method-1 is doing commit after each row
> insert. And method-2 is doing a batch commit i.e. commit after every "50"
> row. And method-3 is doing a true batch insert i.e. combining all the 50
> values in one insert statement and submitting to the database in oneshot
> and then COMMIT it, so the context switching will be a lot less. So I was
> expecting Method-3 to be the fastest way to insert the rows here, but the
> response time shows the same response time for Method-2 and method-3.
> Method-1 is the slowest though.
>


Re: Need assistance in converting subqueries to joins

2024-09-19 Thread Tom Lane
Siraj G  writes:
> Please find below the query in the format and its execution plan:

[ blink... ]  I'm not sure what you are using there, but it is
*not* Postgres.  There are assorted entries in the execution
plan that community Postgres has never heard of, such as

> -> Remove duplicate (P0, IS_SEC_FILT) rows using temporary table
> (weedout)  (cost=2085.53 rows=1988) (actual time=0.321..22600.652
> rows=10298 loops=1)

> -> Single-row index lookup on P0 using IS_PROJ_PK
> (IS_PROJ_GUID=T0.IS_PROJ_GUID, IS_REPOSITORY_ID=R0.REP_ID)  (cost=0.63
> rows=1) (actual time=0.000..0.000 rows=1 loops=50)

Maybe this is RDS, or Aurora, or Greenplum, or one of many other
commercial forks of Postgres?  In any case you'd get more on-point
advice from their support forums than from the PG community.
It looks like this is a fork that has installed its own underlying
table engine, meaning that what we know about performance may not
be terribly relevant.

regards, tom lane




Re: Need assistance in converting subqueries to joins

2024-09-19 Thread Siraj G
Hello Adrian!

Please find below the query in the format and its execution plan:

SELECT
    em_exists_id
FROM
    IS_SEC_FILT
WHERE (IS_SEC_FILT_GUID)
NOT IN (
    SELECT
        IS_OBJ_GUID
    FROM
        TMP_IS_SEC_FILT T0,
        IS_PROJ P0
    WHERE
        T0.IS_PROJ_GUID = P0.IS_PROJ_GUID
        AND P0.IS_PROJ_ID = IS_SEC_FILT.IS_PROJ_ID)
AND (IS_PROJ_ID) IN (
    SELECT
        IS_PROJ_ID
    FROM
        IS_PROJ P0,
        TMP_IS_SEC_FILT T0,
        EM_MD R0
    WHERE
        T0.IS_REPOSITORY_GUID = R0.REP_GUID
        AND T0.IS_PROJ_GUID = P0.IS_PROJ_GUID
        AND P0.IS_REPOSITORY_ID = R0.REP_ID);


Query plan:

'-> Aggregate: count(0)  (cost=2284.32 rows=1988) (actual
time=22602.583..22602.584 rows=1 loops=1)\n
-> Remove duplicate (P0, IS_SEC_FILT) rows using temporary table
(weedout)  (cost=2085.53 rows=1988) (actual time=0.321..22600.652
rows=10298 loops=1)\n-> Filter:
(IS_SEC_FILT.IS_SEC_FILT_GUID,(select #2) is
false)  (cost=2085.53 rows=1988) (actual time=0.315..22433.412
rows=514900 loops=1)\n
-> Inner hash join (IS_SEC_FILT.IS_PROJ_ID = P0.IS_PROJ_ID)
(cost=2085.53 rows=1988) (actual time=0.188..96.362 rows=517350
loops=1)\n
-> Index scan on IS_SEC_FILT using IS_SEC_FILT_PK  (cost=28.84
rows=19879) (actual time=0.019..7.386 rows=20086 loops=1)\n
-> Hash\n
-> Nested loop inner join  (cost=8.05 rows=1) (actual
time=0.064..0.132 rows=50 loops=1)\n
-> Inner hash join (T0.IS_REPOSITORY_GUID = R0.REP_GUID)  (cost=1.70
rows=1) (actual time=0.047..0.094 rows=50 loops=1)\n
-> Filter: (T0.IS_PROJ_GUID is not null)  (cost=0.38 rows=5) (actual
time=0.010..0.041 rows=50 loops=1)\n
-> Table scan on T0  (cost=0.38 rows=50) (actual time=0.010..0.037
rows=50 loops=1)\n
-> Hash\n
-> Filter: (R0.REP_ID is not null)  (cost=0.45 rows=2) (actual
time=0.022..0.025 rows=2 loops=1)\n
-> Table scan on R0  (cost=0.45 rows=2) (actual time=0.021..0.023
rows=2 loops=1)\n
-> Filter: (P0.IS_REPOSITORY_ID = R0.REP_ID)  (cost=0.63 rows=1)
(actual time=0.001..0.001 rows=1 loops=50)\n
-> Single-row index lookup on P0 using IS_PROJ_PK
(IS_PROJ_GUID=T0.IS_PROJ_GUID, IS_REPOSITORY_ID=R0.REP_ID)  (cost=0.63
rows=1) (actual time=0.000..0.000 rows=1 loops=50)\n
-> Select #2 (subquery in condition; dependent)\n
-> Limit: 1 row(s)  (cost=5.98 rows=1) (actual time=0.043..0.043
rows=0 loops=517350)\n
-> Filter: (T0.IS_OBJ_GUID)  (cost=5.98 rows=1)
(actual time=0.043..0.043 rows=0 loops=517350)\n
-> Filter: (((IS_SEC_FILT.IS_SEC_FILT_GUID) = T0.IS_OBJ_GUID)
or (T0.IS_OBJ_GUID is null))  (cost=5.98 rows=1) (actual
time=0.042..0.042 rows=0 loops=517350)\n
-> Inner hash join (T0.IS_PROJ_GUID = P0.IS_PROJ_GUID)  (cost=5.98
rows=1) (actual time=0.004..0.038 rows=50 loops=517350)\n
  -> Table scan on T0  (cost=0.35 rows=50) (actual
time=0.001..0.022 rows=50 loops=517350)\n
  -> Hash\n-> Single-row index
lookup on P0 using PRIMARY (IS_PROJ_ID=IS_SEC_FILT.IS_PROJ_ID)
(cost=0.72 rows=1) (actual time=0.001..0.001 rows=1 loops=517350)\n'


On Fri, Sep 20, 2024 at 9:49 AM Adrian Klaver 
wrote:

> On 9/19/24 21:07, Siraj G wrote:
> > Hello Tech gents!
> >
> > I am sorry if I am asking the wrong question to this group, but wanted
> > assistance in converting a query replacing subqueries with joins.
> >
> > Please find the query below (whose cost is very high):
>
> Add the output of the EXPLAIN ANALYZE for the query.
>
> >
> > select em_exists_id from IS_SEC_FILT WHERE (IS_SEC_FILT_GUID) NOT IN
> > (SELECT IS_OBJ_GUID FROM TMP_IS_SEC_FILT T0, IS_PROJ P0 WHERE
> > T0.IS_PROJ_GUID = P0.IS_PROJ_GUID AND P0.IS_PROJ_ID =
> > IS_SEC_FILT.IS_PROJ_ID) AND (IS_PROJ_ID) IN (SELECT IS_PROJ_ID FROM
> > IS_PROJ P0, TMP_IS_SEC_FILT T0, EM_MD R0 WHERE T0.IS_REPOSITORY_GUID =
> > R0.REP_GUID AND T0.IS_PROJ_GUID = P0.IS_PROJ_GUID AND
> > P0.IS_REPOSITORY_ID = R0.REP_ID);
>
> For future reference formatting the query here:
>
> https://sqlformat.darold.net/
>
> helps get it into a form that is easier to follow:
>
> SELECT
>  em_exists_id
> FROM
>  IS_SEC_FILT
> WHERE (IS_SEC_FILT_GUID)
> NOT IN (
>  SELECT
>  IS_OBJ_GUID
>  FROM
>  TMP_IS_SEC_FILT T0,
>  IS_PROJ P0
>  WHERE
>  T0.IS_PROJ_GUID = P0.IS_PROJ_GUID
>  AND P0.IS_PROJ_ID = IS_SEC_FILT.IS_PROJ_ID)
> AND (IS_PROJ_ID) IN (
>  SELECT
>  IS_PROJ_ID
>  FROM
>  IS_PROJ P0,
>  TMP_IS_SEC_FILT T0,
>  EM_MD R0
>  WHERE
>  T0.IS_REPOSITORY_GUID = R0.REP_GUID
>  AND T0.IS_PROJ_GUID = P0.IS_PROJ_GUID
>  AND P0.IS_REPOSITORY_ID = R0.REP_ID);
>
>
> >
> > Regards
> > Siraj
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>


Re: Need assistance in converting subqueries to joins

2024-09-19 Thread Adrian Klaver

On 9/19/24 21:07, Siraj G wrote:

Hello Tech gents!

I am sorry if I am asking the wrong question to this group, but wanted 
assistance in converting a query replacing subqueries with joins.


Please find the query below (whose cost is very high):


Add the output of the EXPLAIN ANALYZE for the query.



select em_exists_id from IS_SEC_FILT WHERE (IS_SEC_FILT_GUID) NOT IN 
(SELECT IS_OBJ_GUID FROM TMP_IS_SEC_FILT T0, IS_PROJ P0 WHERE 
T0.IS_PROJ_GUID = P0.IS_PROJ_GUID AND P0.IS_PROJ_ID = 
IS_SEC_FILT.IS_PROJ_ID) AND (IS_PROJ_ID) IN (SELECT IS_PROJ_ID FROM 
IS_PROJ P0, TMP_IS_SEC_FILT T0, EM_MD R0 WHERE T0.IS_REPOSITORY_GUID = 
R0.REP_GUID AND T0.IS_PROJ_GUID = P0.IS_PROJ_GUID AND 
P0.IS_REPOSITORY_ID = R0.REP_ID);


For future reference formatting the query here:

https://sqlformat.darold.net/

helps get it into a form that is easier to follow:

SELECT
em_exists_id
FROM
IS_SEC_FILT
WHERE (IS_SEC_FILT_GUID)
NOT IN (
SELECT
IS_OBJ_GUID
FROM
TMP_IS_SEC_FILT T0,
IS_PROJ P0
WHERE
T0.IS_PROJ_GUID = P0.IS_PROJ_GUID
AND P0.IS_PROJ_ID = IS_SEC_FILT.IS_PROJ_ID)
AND (IS_PROJ_ID) IN (
SELECT
IS_PROJ_ID
FROM
IS_PROJ P0,
TMP_IS_SEC_FILT T0,
EM_MD R0
WHERE
T0.IS_REPOSITORY_GUID = R0.REP_GUID
AND T0.IS_PROJ_GUID = P0.IS_PROJ_GUID
AND P0.IS_REPOSITORY_ID = R0.REP_ID);




Regards
Siraj


--
Adrian Klaver
adrian.kla...@aklaver.com





Re: glibc updarte 2.31 to 2.38

2024-09-19 Thread Paul Foerster
Hi Joe,

> On 19 Sep 2024, at 20:09, Joe Conway  wrote:
> 
> See my thread-adjacent email, but suffice to say that if there are collation 
> differences that do affect your tables/data, and you allow any inserts or 
> updates, you may wind up with corrupted data (e.g. duplicate data in your 
> otherwise unique indexes/primary keys).

Yes, I know that.

> For more examples about that see 
> https://joeconway.com/presentations/glibc-SCaLE21x-2024.pdf

A very interesting PDF. Thanks very much.

> An potential alternative for you (discussed at the end of that presentation) 
> would be to create a new branch based on your original SLES 15.5 glibc RPM 
> equivalent to this:
> 
> https://github.com/awslabs/compat-collation-for-glibc/tree/2.17-326.el7
> 
> The is likely a non trivial amount of work involved (the port from the AL2 
> rpm to the RHEL7 rpm took me the better part of a couple of days), but once 
> done your collation is frozen to the specific version you had on 15.5.

I'm not a developer. I have one machine which is equivalent to all other 
servers except that it has gcc, make and some other things for me to build 
PostgreSQL. I can't make the admins run an RPM on all servers. I can obviously 
put a library into the /path/2/postgres/software/lib64 directory but not into 
the system.

Also, my build server does not have internet access. So things like git clone 
would be an additional show stopper. Unfortunately, I'm pretty limited.

Cheers,
Paul





Re: glibc updarte 2.31 to 2.38

2024-09-19 Thread Paul Foerster
Hi Peter,

> On 19 Sep 2024, at 19:43, Peter J. Holzer  wrote:
> 
> I wrote a small script[1] which prints all unicode code points and a few
> selected[2] longer strings in order. If you run that before and after
> the upgrade and the output doesn't change, you are probably fine.
> (It checks only the default collation, though: If you have indexes using
> a different collation you would have to modify the script accordingly.)
> 
> If there are differences, closer inspection might show that the changes
> don't affect you. But I would reindex all indexes on text (etc.) columns
> just to be sure.
> 
>hp
> 
> [1] https://git.hjp.at:3000/hjp/pgcollate
> [2] The selection is highly subjective and totally unscientific.
>Additions are welcome.

I'm not a Python specialist but I take it that the script needs psycopg2, which 
we probably don't have. So I'd have to build some sort of venv around that like 
I had to do to get Patroni working on our systems.

Well, we'll see.

Thanks for this script.

Cheers,
Paul





Re: glibc updarte 2.31 to 2.38

2024-09-19 Thread Joe Conway

On 9/19/24 13:56, Paul Foerster wrote:

On 19 Sep 2024, at 17:14, Tom Lane  wrote:
Maybe.  We don't really track glibc changes, so I can't say for sure,
but it might be advisable to reindex indexes on string columns.



Advisable is a word I unfortunately can't do much with. We have
terabytes and terabytes of data in hundreds of databases each having
potentially hundreds of columns that are candidates. Just reindexing
and taking down applications during that time is not an option in a
24x7 high availability environment.


See my thread-adjacent email, but suffice to say that if there are 
collation differences that do affect your tables/data, and you allow any 
inserts or updates, you may wind up with corrupted data (e.g. duplicate 
data in your otherwise unique indexes/primary keys).


For more examples about that see 
https://joeconway.com/presentations/glibc-SCaLE21x-2024.pdf


An potential alternative for you (discussed at the end of that 
presentation) would be to create a new branch based on your original 
SLES 15.5 glibc RPM equivalent to this:


https://github.com/awslabs/compat-collation-for-glibc/tree/2.17-326.el7

The is likely a non trivial amount of work involved (the port from the 
AL2 rpm to the RHEL7 rpm took me the better part of a couple of days), 
but once done your collation is frozen to the specific version you had 
on 15.5.



--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: glibc updarte 2.31 to 2.38

2024-09-19 Thread Paul Foerster
Hi Joe,

> On 19 Sep 2024, at 19:07, Joe Conway  wrote:
> 
> Every glibc major version change potentially impacts the sorting of some 
> strings, which would require reindexing. Whether your actual data trips into 
> any of these changes is another matter.
> 
> You could check by doing something equivalent to this on every collatable 
> column with an index built on it, in every table:
> 
> 8<---
> WITH t(s) AS (SELECT <column> FROM <table> ORDER BY 1)
> SELECT md5(string_agg(t.s, NULL)) FROM t;
> 8<---
> 
> Check the before and after glibc upgrade result -- if it is the same, you are 
> good to go. If not, rebuild the index before *any* DML is done to the table.

I like the neatness of this one. I think about how to implement this on 
hundreds of databases with hundreds of columns. That'll be a challenge, but 
at least it's a start.

Thanks very much for this one.

Cheers,
Paul



Re: glibc updarte 2.31 to 2.38

2024-09-19 Thread Paul Foerster
Hi Tom,

> On 19 Sep 2024, at 17:14, Tom Lane  wrote:
> 
> No, I wouldn't expect that to be necessary.

I was hoping one of the pros would say that. 🤣

> Maybe.  We don't really track glibc changes, so I can't say for sure,
> but it might be advisable to reindex indexes on string columns.

Advisable is a word I unfortunately can't do much with. We have terabytes and 
terabytes of data in hundreds of databases each having potentially hundreds of 
columns that are candidates. Just reindexing and taking down applications 
during that time is not an option in a 24x7 high availability environment.

Cheers,
Paul





Re: glibc updarte 2.31 to 2.38

2024-09-19 Thread Paul Foerster
Hi Adrian,

> On 19 Sep 2024, at 17:00, Adrian Klaver  wrote:
> 
> I would take a look at:
> 
> https://wiki.postgresql.org/wiki/Locale_data_changes
> 
> It refers to the glibc 2.28 change in particular, but includes some generic 
> tips that could prove useful.
> 
> 
> The glibc change log below might also be useful:
> 
> https://sourceware.org/glibc/wiki/Release

I've seen those before but since the article only refers to 2.28 and SUSE 15.3, 
and I couldn't find anything in the glibc release notes, I thought I'd ask.

Cheers,
Paul





Re: glibc updarte 2.31 to 2.38

2024-09-19 Thread Peter J. Holzer
On 2024-09-19 16:37:41 +0200, Paul Foerster wrote:
> we have SLES 15.5 which has glibc 2.31. Our admin told us that he's
> about to install the SLES 15.6 update which contains glibc 2.38.
> 
> I have built our PostgreSQL software from source on SLES 15.5, because
> we have some special requirements which the packages cannot fulfill.
> So I have questions:
> 
> 1) Do I have to build it again on 15.6?
> 
> 2) Does the glibc update have any impact? I recall having to have
> everything reindexed when the 2.28 update came due to major locale
> changes, but I didn't have to do it since then.
> 
> 3) Where and how can I find out if it is necessary to reindex?

I wrote a small script[1] which prints all unicode code points and a few
selected[2] longer strings in order. If you run that before and after
the upgrade and the output doesn't change, you are probably fine.
(It checks only the default collation, though: If you have indexes using
a different collation you would have to modify the script accordingly.)

If there are differences, closer inspection might show that the changes
don't affect you. But I would reindex all indexes on text (etc.) columns
just to be sure.

hp

[1] https://git.hjp.at:3000/hjp/pgcollate
[2] The selection is highly subjective and totally unscientific.
Additions are welcome.

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: glibc updarte 2.31 to 2.38

2024-09-19 Thread Joe Conway

On 9/19/24 13:07, Joe Conway wrote:

On 9/19/24 11:14, Tom Lane wrote:

Paul Foerster  writes:

we have SLES 15.5 which has glibc 2.31. Our admin told us that
he's about to install the SLES 15.6 update which contains glibc
2.38.



2) Does the glibc update have any impact?

Maybe.  We don't really track glibc changes, so I can't say for sure,
but it might be advisable to reindex indexes on string columns.



Every glibc major version change potentially impacts the sorting of some
strings, which would require reindexing. Whether your actual data trips
into any of these changes is another matter.

You could check by doing something equivalent to this on every
collatable column with an index built on it, in every table:

8<---
WITH t(s) AS (SELECT <column> FROM <table> ORDER BY 1)
   SELECT md5(string_agg(t.s, NULL)) FROM t;
8<---

Check the before and after glibc upgrade result -- if it is the same,
you are good to go. If not, rebuild the index before *any* DML is done
to the table.



... and I should have mentioned that in a similar way, if you have any 
tables that are partitioned by range on collatable columns, the 
partition boundaries potentially are affected. Similarly, constraints 
involving expressions on collatable columns may be affected.


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: glibc updarte 2.31 to 2.38

2024-09-19 Thread Joe Conway

On 9/19/24 11:14, Tom Lane wrote:

Paul Foerster  writes:

we have SLES 15.5 which has glibc 2.31. Our admin told us that
he's about to install the SLES 15.6 update which contains glibc
2.38.



2) Does the glibc update have any impact?

Maybe.  We don't really track glibc changes, so I can't say for sure,
but it might be advisable to reindex indexes on string columns.



Every glibc major version change potentially impacts the sorting of some 
strings, which would require reindexing. Whether your actual data trips 
into any of these changes is another matter.


You could check by doing something equivalent to this on every 
collatable column with an index built on it, in every table:


8<---
WITH t(s) AS (SELECT <column> FROM <table> ORDER BY 1)
 SELECT md5(string_agg(t.s, NULL)) FROM t;
8<---

Check the before and after glibc upgrade result -- if it is the same, 
you are good to go. If not, rebuild the index before *any* DML is done 
to the table.
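
If you want to script that across many columns, here is a rough sketch (the
catalog query for finding indexed collatable columns is only my
approximation, and the psycopg2.sql quoting assumes psycopg2 >= 2.8):

8<---
#!/usr/bin/python3

# Sketch: find indexed collatable columns via the catalogs, then compute
# the md5 digest over the sorted values of each one.  Run it before and
# after the glibc upgrade and diff the output.

import psycopg2
from psycopg2 import sql

db = psycopg2.connect()
csr = db.cursor()

csr.execute("""
    select distinct n.nspname, c.relname, a.attname
    from pg_index i
    join pg_class c on c.oid = i.indrelid
    join pg_namespace n on n.oid = c.relnamespace
    join pg_attribute a on a.attrelid = c.oid and a.attnum = any(i.indkey)
    where a.atttypid in ('text'::regtype, 'varchar'::regtype, 'bpchar'::regtype)
      and n.nspname not in ('pg_catalog', 'information_schema')
    order by 1, 2, 3
""")
for schema, table, column in csr.fetchall():
    q = sql.SQL(
        "with t(s) as (select {col}::text from {tbl} order by 1) "
        "select md5(string_agg(t.s, null)) from t"
    ).format(col=sql.Identifier(column), tbl=sql.Identifier(schema, table))
    csr.execute(q)
    print(schema, table, column, csr.fetchone()[0])
8<---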


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: glibc updarte 2.31 to 2.38

2024-09-19 Thread Tom Lane
Paul Foerster  writes:
> we have SLES 15.5 which has glibc 2.31. Our admin told us that he's about to 
> install the SLES 15.6 update which contains glibc 2.38.
> I have built our PostgreSQL software from source on SLES 15.5, because we 
> have some special requirements which the packages cannot fulfill. So I have 
> questions:

> 1) Do I have to build it again on 15.6?

No, I wouldn't expect that to be necessary.

> 2) Does the glibc update have any impact?

Maybe.  We don't really track glibc changes, so I can't say for sure,
but it might be advisable to reindex indexes on string columns.

regards, tom lane




Re: IO related waits

2024-09-19 Thread Adrian Klaver

On 9/19/24 05:24, Greg Sabino Mullane wrote:
On Thu, Sep 19, 2024 at 5:17 AM veem v 

This is really difficult to diagnose from afar with only snippets of 
logs and half-complete descriptions of your business logic. Pull 
everyone involved into a room with a whiteboard, and produce a document 
describing exactly what your application does, and how it is doing it. 
Switch from reactive to proactive.


+1



Cheers,
Greg



--
Adrian Klaver
adrian.kla...@aklaver.com





Re: glibc updarte 2.31 to 2.38

2024-09-19 Thread Adrian Klaver

On 9/19/24 07:37, Paul Foerster wrote:

Hi,

we have SLES 15.5 which has glibc 2.31. Our admin told us that he's about to 
install the SLES 15.6 update which contains glibc 2.38.

I have built our PostgreSQL software from source on SLES 15.5, because we have 
some special requirements which the packages cannot fulfill. So I have 
questions:

1) Do I have to build it again on 15.6?

2) Does the glibc update have any impact? I recall having to have everything 
reindexed when the 2.28 update came due to major locale changes, but I didn't 
have to do it since then.

3) Where and how can I find out if it is necessary to reindex? And how can I 
find out what indexes would be affected.

I'd really appreciate your comments. Thanks very much in advance.


I would take a look at:

https://wiki.postgresql.org/wiki/Locale_data_changes

It refers to the glibc 2.28 change in particular, but includes some 
generic tips that could prove useful.



The glibc change log below might also be useful:

https://sourceware.org/glibc/wiki/Release
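As a rough starting point for question 3, a catalog query along these lines
lists indexes that include at least one text-like column (a sketch only; it
ignores expression indexes, domains and other collatable types, and says
nothing about partition bounds or check constraints):

-- sketch: find indexes covering plain text/varchar/char columns
SELECT n.nspname AS schema_name,
       t.relname AS table_name,
       i.relname AS index_name
FROM   pg_index x
JOIN   pg_class i ON i.oid = x.indexrelid
JOIN   pg_class t ON t.oid = x.indrelid
JOIN   pg_namespace n ON n.oid = t.relnamespace
JOIN   pg_attribute a ON a.attrelid = x.indrelid
                     AND a.attnum = ANY (x.indkey)
WHERE  a.atttypid IN ('text'::regtype, 'varchar'::regtype, 'bpchar'::regtype)
GROUP  BY 1, 2, 3
ORDER  BY 1, 2, 3;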



Paul



--
Adrian Klaver
adrian.kla...@aklaver.com





Re: How batch processing works

2024-09-19 Thread Lok P
On Thu, Sep 19, 2024 at 5:40 PM Ron Johnson  wrote:

> On Thu, Sep 19, 2024 at 5:24 AM Lok P  wrote:
>
>>
>>
>> [snip]
>
>> DO $$
>> DECLARE
>> num_inserts INTEGER := 10;
>> batch_size INTEGER := 50;
>> start_time TIMESTAMP;
>> end_time TIMESTAMP;
>> elapsed_time INTERVAL;
>> i INTEGER;
>> BEGIN
>> -- Method 1: Individual Inserts with Commit after every Row
>> start_time := clock_timestamp();
>>
>> FOR i IN 1..num_inserts LOOP
>> INSERT INTO parent_table VALUES (i, 'a');
>> COMMIT;
>> END LOOP;
>>
>> end_time := clock_timestamp();
>> elapsed_time := end_time - start_time;
>> INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
>> VALUES ('Method 1: Individual Inserts with Commit after every Row',
>> start_time, end_time, elapsed_time);
>>
>> -- Method 2: Individual Inserts with Commit after 100 Rows
>> start_time := clock_timestamp();
>>
>> FOR i IN 1..num_inserts LOOP
>> INSERT INTO parent_table2 VALUES (i, 'a');
>> -- Commit after every 100 rows
>> IF i % batch_size = 0 THEN
>> COMMIT;
>> END IF;
>> END LOOP;
>>
>> -- Final commit if not already committed
>>commit;
>>
>> end_time := clock_timestamp();
>> elapsed_time := end_time - start_time;
>> INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
>> VALUES ('Method 2: Individual Inserts with Commit after 100 Rows',
>> start_time, end_time, elapsed_time);
>>
>> -- Method 3: Batch Inserts with Commit after all
>> start_time := clock_timestamp();
>>
>> FOR i IN 1..(num_inserts / batch_size) LOOP
>> INSERT INTO parent_table3 VALUES
>> (1 + (i - 1) * batch_size, 'a'),
>>
> [snip]
>
>> (49 + (i - 1) * batch_size, 'a'),
>> (50 + (i - 1) * batch_size, 'a'));
>> COMMIT;
>> END LOOP;
>>
>> COMMIT;  -- Final commit for all
>> end_time := clock_timestamp();
>> elapsed_time := end_time - start_time;
>> INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
>> VALUES ('Method 3: Batch Inserts with Commit after All', start_time,
>> end_time, elapsed_time);
>>
>> END $$;
>>
>
> Reproduce what behavior?
>
> Anyway, plpgsql functions (including anonymous DO statements) are -- to
> Postgresql -- single statements.  Thus, they'll be faster than
> individual calls..
>
> An untrusted language like plpython3u might speed things up even more, if
> you have to read a heterogeneous external file and insert all the records
> into the db.
>

Here if you see my script , the method-1 is doing commit after each row
insert. And method-2 is doing a batch commit i.e. commit after every "50"
row. And method-3 is doing a true batch insert i.e. combining all the 50
values in one insert statement and submitting to the database in oneshot
and then COMMIT it, so the context switching will be a lot less. So I was
expecting Method-3 to be the fastest way to insert the rows here, but the
results show the same response time for Method-2 and Method-3.
Method-1 is the slowest, though.


Re: IO related waits

2024-09-19 Thread veem v
On Thu, 19 Sept 2024 at 17:54, Greg Sabino Mullane 
wrote:

> On Thu, Sep 19, 2024 at 5:17 AM veem v  wrote:
>
>> 2024-09-18 17:05:56 UTC:100.72.10.66(54582):USER1@TRANDB:[14537]:DETAIL:
>>  Process 14537 waits for ShareLock on transaction 220975629; blocked by
>> process 14548.
>>
>
> You need to find out exactly what commands, and in what order, all these
> processes are doing. Deadlocks can be avoided by rearranging your
> application logic.
>
>
>> 2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:ERROR:
>>  current transaction is aborted, commands ignored until end of transaction
>> block
>> 2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:STATEMENT:
>>  INSERT INTO TRANDB.EXCEP_TAB (...)
>> 2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:ERROR:
>>  current transaction is aborted, commands ignored until end of transaction
>> block
>> 2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:STATEMENT:
>>
>> 2024-09-18 17:05:56 UTC:100.72.22.33(36096):USER1@TRANDB:[14551]:ERROR:
>>  current transaction is aborted, commands ignored until end of transaction
>> block
>>
>
> Fix your application. It should be checking that each command completed
> and not just blindly pushing on to the next statement while ignoring the
> error.
>
> This is really difficult to diagnose from afar with only snippets of logs
> and half-complete descriptions of your business logic. Pull everyone
> involved into a room with a whiteboard, and produce a document describing
> exactly what your application does, and how it is doing it. Switch from
> reactive to proactive.
>
>
>
Thank you Greg.

I was thinking there might be some oddity or difference in the behaviour
here in postgres as compared to others, because I have seen deadlock due to
UPDATES but never seen deadlocks with INSERT queries before in other
databases (though here we have "insert on conflict do nothing"). But I am
now thinking , here we have foreign keys and primary keys exist and if the
same PK gets inserted from multiple sessions then one will wait if the
other has not been committed and that might be creating a situation of
locking first and subsequently deadlock.

But also we are doing batch inserts from multiple sessions but each session
will first insert into the parent and then into the child table for those
related to PK and FK and it should not overlap across sessions. So I will
check if there is a loophole there.

Also another thing which we encountered here , if the session gets errors
out with any error(may be deadlock etc) , it's not executing any further
transactions and erroring out with "*current transaction aborted, command
ignored until end of transaction block*". And it seems it will need
explicit "rollback" and will not be the default rollback, which I was
expecting it to do.
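(For reference: once any statement in a transaction fails, PostgreSQL puts the
whole transaction into an aborted state, and only ROLLBACK, or ROLLBACK TO a
savepoint, clears it. A minimal sketch with a hypothetical table named demo:

CREATE TABLE demo (id int PRIMARY KEY);   -- hypothetical table

BEGIN;
INSERT INTO demo VALUES (1);        -- ok
INSERT INTO demo VALUES (1);        -- ERROR: duplicate key
INSERT INTO demo VALUES (2);        -- ERROR: current transaction is aborted ...
ROLLBACK;                           -- explicit rollback required; nothing is kept

-- to keep earlier work and skip only the failing row, use savepoints:
BEGIN;
INSERT INTO demo VALUES (1);
SAVEPOINT sp;
INSERT INTO demo VALUES (1);        -- fails
ROLLBACK TO SAVEPOINT sp;           -- clears just the failed statement
INSERT INTO demo VALUES (2);
COMMIT;                             -- rows 1 and 2 are committed

Most drivers can do the savepoint dance for you; the PostgreSQL JDBC driver,
for example, has an "autosave" connection parameter for exactly this.)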

Regards
Veem


Re: IO related waits

2024-09-19 Thread Greg Sabino Mullane
On Thu, Sep 19, 2024 at 5:17 AM veem v  wrote:

> 2024-09-18 17:05:56 UTC:100.72.10.66(54582):USER1@TRANDB:[14537]:DETAIL:
>  Process 14537 waits for ShareLock on transaction 220975629; blocked by
> process 14548.
>

You need to find out exactly what commands, and in what order, all these
processes are doing. Deadlocks can be avoided by rearranging your
application logic.
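For instance, one common mitigation (sketched here with a hypothetical staging
table) is to have every session insert its batch in the same key order, so
overlapping keys are always locked in the same sequence and a deadlock degrades
into an ordinary wait:

BEGIN;
INSERT INTO part_tab (id, part_date)
SELECT id, part_date
FROM   staging_batch            -- hypothetical: this session's batch of rows
ORDER  BY id, part_date         -- consistent ordering across sessions
ON CONFLICT (id, part_date) DO NOTHING;
COMMIT;

Consistent ordering does not make deadlocks impossible, but it removes the
classic "A waits for B while B waits for A" pattern between two overlapping
batches.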


> 2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:ERROR:
>  current transaction is aborted, commands ignored until end of transaction
> block
> 2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:STATEMENT:
>  INSERT INTO TRANDB.EXCEP_TAB (...)
> 2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:ERROR:
>  current transaction is aborted, commands ignored until end of transaction
> block
> 2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:STATEMENT:
>
> 2024-09-18 17:05:56 UTC:100.72.22.33(36096):USER1@TRANDB:[14551]:ERROR:
>  current transaction is aborted, commands ignored until end of transaction
> block
>

Fix your application. It should be checking that each command completed and
not just blindly pushing on to the next statement while ignoring the error.

This is really difficult to diagnose from afar with only snippets of logs
and half-complete descriptions of your business logic. Pull everyone
involved into a room with a whiteboard, and produce a document describing
exactly what your application does, and how it is doing it. Switch from
reactive to proactive.

Cheers,
Greg


Re: How batch processing works

2024-09-19 Thread Ron Johnson
On Thu, Sep 19, 2024 at 5:24 AM Lok P  wrote:

>
> On Thu, Sep 19, 2024 at 11:31 AM Ron Johnson 
> wrote:
>
>>
>> [snip]
>>
>>>
>>> Method-4
>>>
>>> INSERT INTO parent_table VALUES  (1, 'a'), (2, 'a');
>>> INSERT INTO child_table VALUES   (1,1, 'a'), (1,2, 'a');
>>> commit;
>>>
>>
>> If I knew that I had to load a structured input data file (even if it had
>> parent and child records), this is how I'd do it (but probably first try
>> and see if "in-memory COPY INTO" is such a thing).
>>
>>
>
> I was trying to reproduce this behaviour using row by row commit vs just
> batch commit vs true batch insert as you mentioned, i am not able to see
> any difference between "batch commit" and "true batch insert" response. Am
> I missing anything?
>
> [snip]

> DO $$
> DECLARE
> num_inserts INTEGER := 10;
> batch_size INTEGER := 50;
> start_time TIMESTAMP;
> end_time TIMESTAMP;
> elapsed_time INTERVAL;
> i INTEGER;
> BEGIN
> -- Method 1: Individual Inserts with Commit after every Row
> start_time := clock_timestamp();
>
> FOR i IN 1..num_inserts LOOP
> INSERT INTO parent_table VALUES (i, 'a');
> COMMIT;
> END LOOP;
>
> end_time := clock_timestamp();
> elapsed_time := end_time - start_time;
> INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
> VALUES ('Method 1: Individual Inserts with Commit after every Row',
> start_time, end_time, elapsed_time);
>
> -- Method 2: Individual Inserts with Commit after 100 Rows
> start_time := clock_timestamp();
>
> FOR i IN 1..num_inserts LOOP
> INSERT INTO parent_table2 VALUES (i, 'a');
> -- Commit after every 100 rows
> IF i % batch_size = 0 THEN
> COMMIT;
> END IF;
> END LOOP;
>
> -- Final commit if not already committed
>commit;
>
> end_time := clock_timestamp();
> elapsed_time := end_time - start_time;
> INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
> VALUES ('Method 2: Individual Inserts with Commit after 100 Rows',
> start_time, end_time, elapsed_time);
>
> -- Method 3: Batch Inserts with Commit after all
> start_time := clock_timestamp();
>
> FOR i IN 1..(num_inserts / batch_size) LOOP
> INSERT INTO parent_table3 VALUES
> (1 + (i - 1) * batch_size, 'a'),
>
[snip]

> (49 + (i - 1) * batch_size, 'a'),
> (50 + (i - 1) * batch_size, 'a'));
> COMMIT;
> END LOOP;
>
> COMMIT;  -- Final commit for all
> end_time := clock_timestamp();
> elapsed_time := end_time - start_time;
> INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
> VALUES ('Method 3: Batch Inserts with Commit after All', start_time,
> end_time, elapsed_time);
>
> END $$;
>

Reproduce what behavior?

Anyway, plpgsql functions (including anonymous DO statements) are -- to
Postgresql -- single statements.  Thus, they'll be faster than
individual calls..

An untrusted language like plpython3u might speed things up even more, if
you have to read a heterogeneous external file and insert all the records
into the db.



Re: How batch processing works

2024-09-19 Thread Lok P
On Thu, Sep 19, 2024 at 11:31 AM Ron Johnson 
wrote:

>
> [snip]
>
>>
>> Method-4
>>
>> INSERT INTO parent_table VALUES  (1, 'a'), (2, 'a');
>> INSERT INTO child_table VALUES   (1,1, 'a'), (1,2, 'a');
>> commit;
>>
>
> If I knew that I had to load a structured input data file (even if it had
> parent and child records), this is how I'd do it (but probably first try
> and see if "in-memory COPY INTO" is such a thing).
>
>

I was trying to reproduce this behaviour using row by row commit vs just
batch commit vs true batch insert as you mentioned, i am not able to see
any difference between "batch commit" and "true batch insert" response. Am
I missing anything?

CREATE TABLE debug_log (
method1 TEXT,
start_time TIMESTAMP,
end_time TIMESTAMP,
elapsed_time INTERVAL
);

CREATE TABLE parent_table (
id SERIAL PRIMARY KEY,
name TEXT
);


CREATE TABLE parent_table2 (
id SERIAL PRIMARY KEY,
name TEXT
);


CREATE TABLE parent_table3 (
id SERIAL PRIMARY KEY,
name TEXT
);
DO $$
DECLARE
num_inserts INTEGER := 10;
batch_size INTEGER := 50;
start_time TIMESTAMP;
end_time TIMESTAMP;
elapsed_time INTERVAL;
i INTEGER;
BEGIN
-- Method 1: Individual Inserts with Commit after every Row
start_time := clock_timestamp();

FOR i IN 1..num_inserts LOOP
INSERT INTO parent_table VALUES (i, 'a');
COMMIT;
END LOOP;

end_time := clock_timestamp();
elapsed_time := end_time - start_time;
INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
VALUES ('Method 1: Individual Inserts with Commit after every Row',
start_time, end_time, elapsed_time);

-- Method 2: Individual Inserts with Commit after 100 Rows
start_time := clock_timestamp();

FOR i IN 1..num_inserts LOOP
INSERT INTO parent_table2 VALUES (i, 'a');
-- Commit after every 100 rows
IF i % batch_size = 0 THEN
COMMIT;
END IF;
END LOOP;

-- Final commit if not already committed
   commit;

end_time := clock_timestamp();
elapsed_time := end_time - start_time;
INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
VALUES ('Method 2: Individual Inserts with Commit after 100 Rows',
start_time, end_time, elapsed_time);

-- Method 3: Batch Inserts with Commit after all
start_time := clock_timestamp();

FOR i IN 1..(num_inserts / batch_size) LOOP
INSERT INTO parent_table3 VALUES
(1 + (i - 1) * batch_size, 'a'),
(2 + (i - 1) * batch_size, 'a'),
(3 + (i - 1) * batch_size, 'a'),
(4 + (i - 1) * batch_size, 'a'),
(5 + (i - 1) * batch_size, 'a'),
(6 + (i - 1) * batch_size, 'a'),
(7 + (i - 1) * batch_size, 'a'),
(8 + (i - 1) * batch_size, 'a'),
(9 + (i - 1) * batch_size, 'a'),
(10 + (i - 1) * batch_size, 'a'),
(11 + (i - 1) * batch_size, 'a'),
(12 + (i - 1) * batch_size, 'a'),
(13 + (i - 1) * batch_size, 'a'),
(14 + (i - 1) * batch_size, 'a'),
(15 + (i - 1) * batch_size, 'a'),
(16 + (i - 1) * batch_size, 'a'),
(17 + (i - 1) * batch_size, 'a'),
(18 + (i - 1) * batch_size, 'a'),
(19 + (i - 1) * batch_size, 'a'),
(20 + (i - 1) * batch_size, 'a'),
(21 + (i - 1) * batch_size, 'a'),
(22 + (i - 1) * batch_size, 'a'),
(23 + (i - 1) * batch_size, 'a'),
(24 + (i - 1) * batch_size, 'a'),
(25 + (i - 1) * batch_size, 'a'),
(26 + (i - 1) * batch_size, 'a'),
(27 + (i - 1) * batch_size, 'a'),
(28 + (i - 1) * batch_size, 'a'),
(29 + (i - 1) * batch_size, 'a'),
(30 + (i - 1) * batch_size, 'a'),
(31 + (i - 1) * batch_size, 'a'),
(32 + (i - 1) * batch_size, 'a'),
(33 + (i - 1) * batch_size, 'a'),
(34 + (i - 1) * batch_size, 'a'),
(35 + (i - 1) * batch_size, 'a'),
(36 + (i - 1) * batch_size, 'a'),
(37 + (i - 1) * batch_size, 'a'),
(38 + (i - 1) * batch_size, 'a'),
(39 + (i - 1) * batch_size, 'a'),
(40 + (i - 1) * batch_size, 'a'),
(41 + (i - 1) * batch_size, 'a'),
(42 + (i - 1) * batch_size, 'a'),
(43 + (i - 1) * batch_size, 'a'),
(44 + (i - 1) * batch_size, 'a'),
(45 + (i - 1) * batch_size, 'a'),
(46 + (i - 1) * batch_size, 'a'),
(47 + (i - 1) * batch_size, 'a'),
(48 + (i - 1) * batch_size, 'a'),
(49 + (i - 1) * batch_size, 'a'),
(50 + (i - 1) * batch_size, 'a'));
COMMIT;
END LOOP;

COMMIT;  -- Final commit for all
end_time := clock_timestamp();
elapsed_time := end_time - start_time;
INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
VALUES ('Method 3: Batch Inserts with Commit after All', start_time,
end_time, elapsed_time);

END $$;

Re: IO related waits

2024-09-19 Thread veem v
On Thu, 19 Sept 2024 at 03:02, Adrian Klaver 
wrote:

>
>
> This needs clarification.
>
> 1) To be clear when you refer to parent and child that is:
>  FK
> parent_tbl.fld <--> child_tbl.fld_fk
>
> not parent and child tables in partitioning scheme?
>
> 2) What are the table schemas?
>
> 3) What is the code that is generating the error?
>
>
> Overall it looks like this process needs a top to bottom audit to map
> out what is actually being done versus what needs to be done.
>
>
>
Yes the tables were actually having parent and child table relationships,
not the child/parent table in partitioning scheme.  And the PK and FK are
on columns - (ID, part_date) .The table is the daily range partitioned on
column part_date.

*The error we are seeing is as below in logs:-*

deadlock detected
2024-09-18 17:05:56 UTC:100.72.10.66(54582):USER1@TRANDB:[14537]:DETAIL:
 Process 14537 waits for ShareLock on transaction 220975629; blocked by
process 14548.
Process 14548 waits for ShareLock on transaction 220975630; blocked by
process 14537.
Process 14537: INSERT INTO TRANDB.PART_TAB (ID, part_date)  VALUES ($1,
$2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14) ON CONFLICT (ID,
part_date) DO NOTHING
Process 14548: INSERT INTO TRANDB.PART_TAB (ID, part_date, ...)  VALUES
($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14) ON CONFLICT
(ID, part_date) DO NOTHING

2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:HINT:  See
server log for query details.
2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:CONTEXT:
 while inserting index tuple (88814,39) in relation "PART_TAB_p2024_08_29"

2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:STATEMENT:
 INSERT INTO TRANDB.PART_TAB (ID, part_date, ...)  VALUES ($1, $2, $3, $4,
$5, $6, $7, $8, $9, $10, $11, $12, $13, $14) ON CONFLICT (ID, part_date) DO
NOTHING

2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:ERROR:
 current transaction is aborted, commands ignored until end of transaction
block
2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:STATEMENT:
 INSERT INTO TRANDB.EXCEP_TAB (...)
2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:ERROR:
 current transaction is aborted, commands ignored until end of transaction
block
2024-09-18 17:05:56 UTC:100.72.22.33(54582):USER1@TRANDB:[14537]:STATEMENT:

2024-09-18 17:05:56 UTC:100.72.22.33(36096):USER1@TRANDB:[14551]:ERROR:
 current transaction is aborted, commands ignored until end of transaction
block

*

2024-09-18 17:41:05 UTC:100.72.39.47(58778):USER1@TRANDB:[17456]:ERROR:
 deadlock detected
2024-09-18 17:41:05 UTC:100.72.39.47(58778):USER1@TRANDB:[17456]:DETAIL:
 Process 17456 waits for ShareLock on transaction 220978890; blocked by
process 17458.
Process 17458 waits for ShareLock on transaction 220978889; blocked by
process 17456.
Process 17456: INSERT INTO TRANDB.PART_TAB (ID, part_date, ...)  VALUES
($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14) ON CONFLICT
(ID, part_date) DO NOTHING
Process 17458: INSERT INTO TRANDB.PART_TAB (ID, part_date, ..)  VALUES ($1,
$2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14) ON CONFLICT (ID,
part_date) DO NOTHING
2024-09-18 17:41:05 UTC:100.72.39.47(58778):USER1@TRANDB:[17456]:HINT:  See
server log for query details.
2024-09-18 17:41:05 UTC:100.72.39.47(58778):USER1@TRANDB:[17456]:CONTEXT:
 while inserting index tuple (88875,13) in relation "PART_TAB_p2024_08_29"
2024-09-18 17:41:05 UTC:100.72.39.47(58778):USER1@TRANDB:[17456]:STATEMENT:
 INSERT INTO TRANDB.PART_TAB (ID, part_date,..)  VALUES ($1, $2, $3, $4,
$5, $6, $7, $8, $9, $10, $11, $12, $13, $14) ON CONFLICT (ID, part_date) DO
NOTHING
2024-09-18 17:41:05 UTC:100.72.39.47(58778):USER1@TRANDB:[17456]:LOG:
 could not receive data from client: Connection reset by peer
2024-09-18 17:41:05 UTC:100.72.39.47(58778):USER1@TRANDB:[17456]:LOG:
 disconnection: session time: 0:08:37.154 user=USER1 database=TRANDB
host=X port=58778


Re: How batch processing works

2024-09-18 Thread Ron Johnson
On Thu, Sep 19, 2024 at 1:31 AM Lok P  wrote:

> Hello,
> Saw multiple threads around the same , so I want some clarification. As we
> know row by row is slow by slow processing , so in heavy write systems(say
> the client app is in Java) , people asked to do DMLS in batches rather in a
> row by row fashion to minimize the chatting or context switches between
> database and client which is resource intensive. What I understand is that
> , a true batch processing means the client has to collect all the input
> bind values and  prepare the insert statement and submit to the database at
> oneshot and then commit.
>
> What it means actually and if we divide the option as below, which method
> truly does batch processing or there exists some other method for doing the
> batch processing considering postgres as backend database?
>
> I understand, the first method below is truly a row by row processing in
> which context switches happen between client and database with each row,
> whereas the second method is just batching the commits but not a true batch
> processing as it will do the same amount of context switching between the
> database and client. But regarding the third and fourth method, will both
> execute in a similar fashion in the database with the same number of
> context switches? If any other better method exists to do these inserts in
> batches? Appreciate your guidance.
>
>
> CREATE TABLE parent_table (
> id SERIAL PRIMARY KEY,
> name TEXT
> );
>
> CREATE TABLE child_table (
> id SERIAL PRIMARY KEY,
> parent_id INT REFERENCES parent_table(id),
> value TEXT
> );
>
>
> Method-1
>
> insert into parent_table values(1,'a');
> commit;
> insert into parent_table values(2,'a');
> commit;
> insert into child_table values(1,1,'a');
> Commit;
> insert into child_table values(1,2,'a');
> commit;
>
> VS
>
> Method-2
>
> insert into parent_table values(1,'a');
> insert into parent_table values(2,'a');
> insert into child_table values(1,1,'a');
> insert into child_table values(1,2,'a');
> Commit;
>

As a former "DP" programmer, from an application point of view, this is
absolutely batch programming.

My experience was with COBOL and C, though, which were low overhead.  From
what I've seen in PG log files, JDBC is astoundingly chatty.

[snip]

>
> Method-4
>
> INSERT INTO parent_table VALUES  (1, 'a'), (2, 'a');
> INSERT INTO child_table VALUES   (1,1, 'a'), (1,2, 'a');
> commit;
>

If I knew that I had to load a structured input data file (even if it had
parent and child records), this is how I'd do it (but probably first try
and see if "in-memory COPY INTO" is such a thing).
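The closest thing to an "in-memory COPY INTO" is COPY ... FROM STDIN, which
streams rows over the existing connection with no server-side file; shown here
as a sketch the way it looks in a psql script (most drivers expose the same
protocol through a copy API):

COPY parent_table (id, name) FROM STDIN WITH (FORMAT csv);
1,a
2,a
\.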



Re: CREATE DATABASE command concurrency

2024-09-18 Thread Muhammad Usman Khan
Hi,
In PostgreSQL, it's safe to run CREATE DATABASE at the same time from
different places. If two commands try to create the same database, one will
succeed, and the other will safely fail without causing any problems or
incomplete database creation.
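If the goal is "create the database only if it is missing", the client still
has to be prepared for either outcome under concurrency. A common psql-side
sketch (the database name is just an example):

SELECT 'CREATE DATABASE mydb'           -- 'mydb' is a placeholder name
WHERE  NOT EXISTS (SELECT FROM pg_database WHERE datname = 'mydb')\gexec

This narrows the race but does not remove it: two sessions can both see
"not exists", in which case one CREATE DATABASE will still fail and the error
path needs handling.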

On Wed, 18 Sept 2024 at 19:08, Wizard Brony  wrote:

> What are the concurrency guarantees of the CREATE DATABASE command? For
> example, is the CREATE DATABASE command safe to be called concurrently such
> that one command succeeds and the other reliably fails without corruption?
>
>
>


Re: IO related waits

2024-09-18 Thread Adrian Klaver




On 9/18/24 1:40 PM, veem v wrote:





You were spot on. When we turned off the "auto commit" we started
seeing less number of commits as per the number of batches.

However we also started seeing deadlock issues. We have foreign key
relationships between the tables and during the batch we do insert
into the parent first and then to the child , but this does happen
from multiple sessions for different batches. So why do we see below
error, as we ensure in each batch we first insert into parent and
then into the child tables?

caused by: org.postgresql.util.PSQLException: ERROR: deadlock detected
   Detail: Process 10443 waits for ShareLock on transaction
220972157; blocked by process 10454.
Process 10454 waits for ShareLock on transaction 220972155; blocked
by process 10443.
   Hint: See server log for query details.
   Where: while inserting index tuple (88736,28) in relation
"TAB1_p2024_08_29"


As we are able to get hold of one session, we see "insert into <child 
partition table>" was blocked by "insert into <parent partition table>". 
And the "insert into <parent partition table>" was experiencing a 
"client read" wait event. Still unable to understand why it's happening 
and how to fix it?




This needs clarification.

1) To be clear when you refer to parent and child that is:
FK
parent_tbl.fld <--> child_tbl.fld_fk

not parent and child tables in partitioning scheme?

2) What are the table schemas?

3) What is the code that is generating the error?


Overall it looks like this process needs a top to bottom audit to map 
out what is actually being done versus what needs to be done.





--
Adrian Klaver
adrian.kla...@aklaver.com




Re: IO related waits

2024-09-18 Thread veem v
On Thu, 19 Sept 2024 at 02:01, veem v  wrote:

>
> On Wed, 18 Sept 2024 at 05:07, Adrian Klaver 
> wrote:
>
>> On 9/17/24 12:34, veem v wrote:
>> >
>>
>> It does if autocommit is set in the client, that is common to other
>> databases also:
>>
>> https://dev.mysql.com/doc/refman/8.4/en/commit.html
>>
>>
>> https://docs.oracle.com/en/database/oracle/developer-tools-for-vscode/getting-started/disabling-and-enabling-auto-commit.html
>>
>>
>> https://learn.microsoft.com/en-us/sql/t-sql/statements/set-implicit-transactions-transact-sql?view=sql-server-ver16
>>
>> You probably need to take a closer look at the client/driver you are
>> using and the code that interacting with it.
>>
>> In fact I would say you need to review the entire data transfer process
>> to see if there are performance gains that can be obtained without
>> adding an entirely new async component.
>>
>>
>>
> You were spot on. When we turned off the "auto commit" we started seeing
> less number of commits as per the number of batches.
>
> However we also started seeing deadlock issues. We have foreign key
> relationships between the tables and during the batch we do insert into the
> parent first and then to the child , but this does happen from multiple
> sessions for different batches. So why do we see below error, as we
> ensure in each batch we first insert into parent and then into the child
> tables?
>
> caused by: org.postgresql.util.PSQLException: ERROR: deadlock detected
>   Detail: Process 10443 waits for ShareLock on transaction 220972157;
> blocked by process 10454.
> Process 10454 waits for ShareLock on transaction 220972155; blocked by
> process 10443.
>   Hint: See server log for query details.
>   Where: while inserting index tuple (88736,28) in relation
> "TAB1_p2024_08_29"
>

As we are able to get hold of one session, we see "insert into <child
partition table>" was blocked by "insert into <parent partition table>". And
the "insert into <parent partition table>" was experiencing a "client
read" wait event. Still unable to understand why it's happening and how to
fix it?


Re: IO related waits

2024-09-18 Thread veem v
On Wed, 18 Sept 2024 at 05:07, Adrian Klaver 
wrote:

> On 9/17/24 12:34, veem v wrote:
> >
>
> It does if autocommit is set in the client, that is common to other
> databases also:
>
> https://dev.mysql.com/doc/refman/8.4/en/commit.html
>
>
> https://docs.oracle.com/en/database/oracle/developer-tools-for-vscode/getting-started/disabling-and-enabling-auto-commit.html
>
>
> https://learn.microsoft.com/en-us/sql/t-sql/statements/set-implicit-transactions-transact-sql?view=sql-server-ver16
>
> You probably need to take a closer look at the client/driver you are
> using and the code that interacting with it.
>
> In fact I would say you need to review the entire data transfer process
> to see if there are performance gains that can be obtained without
> adding an entirely new async component.
>
>
>
You were spot on. When we turned off the "auto commit" we started seeing
less number of commits as per the number of batches.

However we also started seeing deadlock issues. We have foreign key
relationships between the tables and during the batch we do insert into the
parent first and then to the child , but this does happen from multiple
sessions for different batches. So why do we see below error, as we
ensure in each batch we first insert into parent and then into the child
tables?

caused by: org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 10443 waits for ShareLock on transaction 220972157;
blocked by process 10454.
Process 10454 waits for ShareLock on transaction 220972155; blocked by
process 10443.
  Hint: See server log for query details.
  Where: while inserting index tuple (88736,28) in relation
"TAB1_p2024_08_29"


Re: CREATE DATABASE command concurrency

2024-09-18 Thread Tom Lane
Christophe Pettus  writes:
>> On Sep 17, 2024, at 14:52, Wizard Brony  wrote:
>> What are the concurrency guarantees of the CREATE DATABASE command? For 
>> example, is the CREATE DATABASE command safe to be called concurrently such 
>> that one command succeeds and the other reliably fails without corruption?

> The concern is that two different sessions issue a CREATE DATABASE command 
> using the same name?  In that case, it can be relied upon that one will 
> succeed (unless it fails for some other reason, like lacking permissions), 
> and the other will receive an error that the database already exists.

This is true, but it's possibly worth noting that the specific error
message you get could vary.  Normally it'd be something like

regression=# create database postgres;
ERROR:  database "postgres" already exists

but in a race condition it might look more like "duplicate key value
violates unique constraint".  In the end we rely on the system
catalogs' unique indexes to detect and prevent race conditions of
this sort.

regards, tom lane




Re: load fom csv

2024-09-18 Thread Adrian Klaver

On 9/18/24 06:29, Rob Sargent wrote:




On Sep 18, 2024, at 6:39 AM, Andy Hartman  wrote:


psql -h $pgServer -d $pgDatabase -U $pgUser -c $copyCommand

I'm wondering if it's waiting on P/w ?


In a previous post I suggested:

"
To work through this you need to try what I call the crawl/walk/run 
process. In this case that is:


1) Crawl. Connect using psql and run the \copy in it with hard coded values.

2) Walk. Use psql with the -c argument and supply the command again with
hard coded values

3) Run. Then use PowerShell and do the variable substitution.
"

Did you do this with the same command at each stage? If so at either 1) 
or 2) where you asked for a password?


In a later posted I asked:

"1) Are you logging connections/disconnection per?:


https://www.postgresql.org/docs/current/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHAT


If not do so as it will show you if a connection is being attempted.
"

Did you enable connection logging?

Did you look at the Postgres log?

If both the answers are yes you should see something like the below in 
case of password authentication:


2024-09-18 07:47:38.692 PDT [8090] [unknown]@[unknown] LOG:  connection 
received: host=127.0.0.1 port=44840
2024-09-18 07:47:42.410 PDT [8095] [unknown]@[unknown] LOG:  connection 
received: host=127.0.0.1 port=44848
2024-09-18 07:47:42.414 PDT [8095] aklaver@test LOG:  connection 
authenticated: identity="aklaver" method=md5 
(/etc/postgresql/16/main/pg_hba.conf:125)
2024-09-18 07:47:42.414 PDT [8095] aklaver@test LOG:  connection 
authorized: user=aklaver database=test application_name=psql SSL enabled 
(protocol=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384, bits=256)
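If connection logging is not on yet, it can be enabled without a restart, for
example:

ALTER SYSTEM SET log_connections = on;
ALTER SYSTEM SET log_disconnections = on;
SELECT pg_reload_conf();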





Thanks.


Very likely.  Can you show the authentication
mechanisms used (pg_hba)?


On Tue, Sep 17, 2024 at 7:10 PM Andy Hartman > wrote:


I'll echo vars and see if something looks strange.

THanks.

On Tue, Sep 17, 2024 at 3:46 PM Rob Sargent <robjsarg...@gmail.com> wrote:



> On Sep 17, 2024, at 12:25 PM, Adrian Klaver <adrian.kla...@aklaver.com> wrote:
>
> On 9/17/24 09:21, Andy Hartman wrote:
>> The command work outside of powershell  yes
>
> Then you are going to need to use whatever debugging tools
PowerShell has available to step through the script to figure
out where the problem is.
>
>

Visual Studio can run/debug PS 1 files. I am not at my desk
but have done ps1 oneliner queries against mssql

Suggest echoing ALL vars used in psql command

>
>



--
Adrian Klaver
adrian.kla...@aklaver.com





Re: CREATE DATABASE command concurrency

2024-09-18 Thread Christophe Pettus



> On Sep 17, 2024, at 14:52, Wizard Brony  wrote:
> 
> What are the concurrency guarantees of the CREATE DATABASE command? For 
> example, is the CREATE DATABASE command safe to be called concurrently such 
> that one command succeeds and the other reliably fails without corruption?

The concern is that two different sessions issue a CREATE DATABASE command 
using the same name?  In that case, it can be relied upon that one will succeed 
(unless it fails for some other reason, like lacking permissions), and the 
other will receive an error that the database already exists.



Re: load fom csv

2024-09-18 Thread Rob Sargent
> On Sep 18, 2024, at 6:39 AM, Andy Hartman  wrote:
>
> psql -h $pgServer -d $pgDatabase -U $pgUser -c $copyCommand
>
> I'm wondering if it's waiting on P/w ?
>
> Thanks.

Very likely.  Can you show the authentication mechanisms used (pg_hba)?

> On Tue, Sep 17, 2024 at 7:10 PM Andy Hartman  wrote:
>
> I'll echo vars and see if something looks strange.
>
> THanks.
>
> On Tue, Sep 17, 2024 at 3:46 PM Rob Sargent  wrote:

> On Sep 17, 2024, at 12:25 PM, Adrian Klaver  wrote:
> 
> On 9/17/24 09:21, Andy Hartman wrote:
>> The command work outside of powershell  yes
> 
> Then you are going to need to use whatever debugging tools PowerShell has available to step through the script to figure out where the problem is.
> 
> 

Visual Studio can run/debug PS 1 files. I am not at my desk but have done ps1 oneliner queries against mssql

Suggest echoing ALL vars used in psql command

> 
> 




Re: load fom csv

2024-09-18 Thread Andy Hartman
psql -h $pgServer -d $pgDatabase -U $pgUser -c $copyCommand

I'm wondering if it's waiting on P/w ?

Thanks.

On Tue, Sep 17, 2024 at 7:10 PM Andy Hartman 
wrote:

> I'll echo vars and see if something looks strange.
>
> THanks.
>
> On Tue, Sep 17, 2024 at 3:46 PM Rob Sargent  wrote:
>
>>
>>
>> > On Sep 17, 2024, at 12:25 PM, Adrian Klaver 
>> wrote:
>> >
>> > On 9/17/24 09:21, Andy Hartman wrote:
>> >> The command work outside of powershell  yes
>> >
>> > Then you are going to need to use whatever debugging tools PowerShell
>> has available to step through the script to figure out where the problem is.
>> >
>> >
>>
>> Visual Studio can run/debug PS 1 files. I am not at my desk but have done
>> ps1 oneliner queries against mssql
>>
>> Suggest echoing ALL vars used in psql command
>>
>> >
>> >
>>
>


Re: IO related waits

2024-09-17 Thread Adrian Klaver

On 9/17/24 12:34, veem v wrote:


On Tue, 17 Sept 2024 at 21:24, Adrian Klaver > wrote:



Which means you need to on Flink end:

1) Use Flink async I/O .

2) Find a client that supports async or fake it by using multiple
synchronous clients.

On Postgres end there is this:

https://www.postgresql.org/docs/current/wal-async-commit.html


That will return a success signal to the client quicker if
synchronous_commit is set to off. Though the point of the Flink async
I/O is not to wait for the response before moving on, so I am not sure
how much synchronous_commit = off would help.


  Got it. So it means their suggestion was to set the asynch_io at flink 
level but not DB level, so that the application will not wait for the 
commit response from the database. But in that case , won't it overload 
the DB with more and more requests if database will keep doing the 
commit ( with synchronous_commit=ON)  and waiting for getting the 
response back from its storage for the WAL's to be flushed to the disk, 
while the application will not wait for its response back(for those 
inserts) and keep flooding the database with more and more incoming 
Insert requests?


My point is this is a multi-layer cake with layers:

1) Flink asycnc io

2) Database client async/sync

3) Postgres sync status.

That is a lot of moving parts and determining whether it is suitable is 
going to require rigorous testing over a representative data load.



See more below.



Additionally as I mentioned before, we see that from "pg_stat_database" 
from the column "xact_commit" , it's almost matching with the sum of 
"tup_inserted", "tup_updated", "tup_deleted" column. And also we 
verified in pg_stats_statements the  "calls" column is same as the 
"rows" column for the INSERT queries, so it means also we are inserting 
exactly same number of rows as the number of DB calls, so doesn't it 
suggest that we are doing row by row operations/dmls.


Also after seeing above and asking application team to do the batch 
commit ,we are still seeing the similar figures from pg_stat_database 
and pg_stat_statements, so does it mean that we are looking into wrong 
stats? or the application code change has not been done accurately? and 
we see even when no inserts are running from the application side, we do 
see "xact_commit" keep increasing along with "tup_fetched" , why so?


Finally we see in postgres here, even if we just write a DML statement 
it does commit that by default, until we explicitly put it in a 
"begin... end" block. Can that be the difference between how a "commit" 
gets handled in postgres vs other databases?


It does if autocommit is set in the client, that is common to other 
databases also:


https://dev.mysql.com/doc/refman/8.4/en/commit.html

https://docs.oracle.com/en/database/oracle/developer-tools-for-vscode/getting-started/disabling-and-enabling-auto-commit.html

https://learn.microsoft.com/en-us/sql/t-sql/statements/set-implicit-transactions-transact-sql?view=sql-server-ver16

You probably need to take a closer look at the client/driver you are 
using and the code that interacting with it.


In fact I would say you need to review the entire data transfer process 
to see if there are performance gains that can be obtained without 
adding an entirely new async component.







--
Adrian Klaver
adrian.kla...@aklaver.com





Re: load fom csv

2024-09-17 Thread Andy Hartman
I'll echo vars and see if something looks strange.

THanks.

On Tue, Sep 17, 2024 at 3:46 PM Rob Sargent  wrote:

>
>
> > On Sep 17, 2024, at 12:25 PM, Adrian Klaver 
> wrote:
> >
> > On 9/17/24 09:21, Andy Hartman wrote:
> >> The command work outside of powershell  yes
> >
> > Then you are going to need to use whatever debugging tools PowerShell
> has available to step through the script to figure out where the problem is.
> >
> >
>
> Visual Studio can run/debug PS 1 files. I am not at my desk but have done
> ps1 oneliner queries against mssql
>
> Suggest echoing ALL vars used in psql command
>
> >
> >
>


Re: load fom csv

2024-09-17 Thread Rob Sargent



> On Sep 17, 2024, at 12:25 PM, Adrian Klaver  wrote:
> 
> On 9/17/24 09:21, Andy Hartman wrote:
>> The command work outside of powershell  yes
> 
> Then you are going to need to use whatever debugging tools PowerShell has 
> available to step through the script to figure out where the problem is.
> 
> 

Visual Studio can run/debug PS 1 files. I am not at my desk but have done ps1 
oneliner queries against mssql

Suggest echoing ALL vars used in psql command

> 
> 




Re: IO related waits

2024-09-17 Thread veem v
On Tue, 17 Sept 2024 at 21:24, Adrian Klaver 
wrote:

>
> Which means you need to on Flink end:
>
> 1) Use Flink async I/O .
>
> 2) Find a client that supports async or fake it by using multiple
> synchronous clients.
>
> On Postgres end there is this:
>
> https://www.postgresql.org/docs/current/wal-async-commit.html
>
> That will return a success signal to the client quicker if
> synchronous_commit is set to off. Though the point of the Flink async
> I/O is not to wait for the response before moving on, so I am not sure
> how much synchronous_commit = off would help.
>
>
 Got it. So it means their suggestion was to set the asynch_io at flink
level but not DB level, so that the application will not wait for the
commit response from the database. But in that case , won't it overload the
DB with more and more requests if database will keep doing the commit (
with synchronous_commit=ON)  and waiting for getting the response back from
its storage for the WAL's to be flushed to the disk, while the application
will not wait for its response back(for those inserts) and keep flooding
the database with more and more incoming Insert requests?

Additionally as I mentioned before, we see that from "pg_stat_database"
from the column "xact_commit" , it's almost matching with the sum of
"tup_inserted", "tup_updated", "tup_deleted" column. And also we verified
in pg_stats_statements the  "calls" column is same as the "rows" column for
the INSERT queries, so it means also we are inserting exactly same number
of rows as the number of DB calls, so doesn't it suggest that we are
doing row by row operations/dmls.

Also after seeing above and asking application team to do the batch commit
,we are still seeing the similar figures from pg_stat_database and
pg_stat_statements, so does it mean that we are looking into wrong stats?
or the application code change has not been done accurately? and we see
even when no inserts are running from the application side, we do see
"xact_commit" keep increasing along with "tup_fetched" , why so?

Finally we see in postgres here, even if we just write a DML statement it
does commit that by default, until we explicitly put it in a "begin... end"
block. Can that be the difference between how a "commit" gets handled in
postgres vs other databases?
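(For reference: there is no server-side autocommit setting; it is the client
that decides whether each statement becomes its own transaction. A small
sketch of the difference as seen from psql, with a hypothetical table demo:

INSERT INTO demo VALUES (1);   -- no explicit BEGIN: committed immediately

BEGIN;
INSERT INTO demo VALUES (2);
INSERT INTO demo VALUES (3);
COMMIT;                        -- one transaction, one commit for both rows

-- psql itself can emulate a non-autocommit client:
-- \set AUTOCOMMIT off
)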


Re: Failing to allocate memory when I think it shouldn't

2024-09-17 Thread Christoph Moench-Tegeder
Hi,

## Thomas Ziegler (thomas.zieg...@holmsecurity.com):

> Except for pgAudit, I don't have any extensions, so it is probably the
> JIT. I had no idea there was a JIT, even it should have been obvious.
> Thanks for pointing this out!

There is - it even has it's own chapter in the documentation:
https://www.postgresql.org/docs/current/jit.html
Most importantly, you can disable JIT per session ("SET jit=off")
or globally in the configuration file (jit=off, reload is
sufficient) or with any of the other usual configuration mechanisms.
If that fixes your problem, congratulations (and the problem is
somewhere down between bytecode generation and what and how llvm
(in its particular version) generates from that).
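For reference, the knobs look like this (session level first, then
cluster-wide; the EXPLAIN line is only a way to confirm whether JIT was used
for a given plan, and the table name is hypothetical):

SET jit = off;                       -- current session only

ALTER SYSTEM SET jit = off;          -- whole cluster
SELECT pg_reload_conf();             -- reload is sufficient, no restart

EXPLAIN (ANALYZE) SELECT count(*) FROM some_table;  -- prints a "JIT:" block when JIT ran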

> Is the memory the JIT takes limited by 'work_mem' or will it just take
> as much memory as it needs?

The latter.

Regards,
Christoph

-- 
Spare Space




Re: load fom csv

2024-09-17 Thread Andy Hartman
Still when I try to run from my powershell script it hangs...

On Tue, Sep 17, 2024 at 8:31 AM Andy Hartman 
wrote:

> I have bad data in an Int field...
>
>  Thanks for your help.
>
> On Tue, Sep 17, 2024 at 1:55 AM Muhammad Usman Khan 
> wrote:
>
>> Hi,
>> Try the following options:
>>
>>
>>- Check if psql is working independently:
>>psql -h $pgServer -d $pgDatabase -U $pgUser -c "SELECT 1;"
>>- Check for permission issues on the CSV file
>>- Run the command manually without variables
>>psql -h your_host -d your_db -U your_user -c "\COPY your_table FROM
>>'C:/path/to/your/file.csv' DELIMITER ',' CSV HEADER;"
>>- set a timeout using the PGCONNECT_TIMEOUT environment variable:
>>$env:PGCONNECT_TIMEOUT=30
>>
>>
>>
>> On Mon, 16 Sept 2024 at 20:35, Andy Hartman 
>> wrote:
>>
>>> I'm trying to run this piece of code from Powershell and it just sits
>>> there and never comes back. There are only 131 records in the csv.
>>>
>>> $connectionString =
>>> "Host=$pgServer;Database=$pgDatabase;Username=$pgUser;Password=$pgPassword"
>>> $copyCommand = "\COPY $pgTable FROM '$csvPath' DELIMITER ',' CSV HEADER;"
>>>
>>> psql -h $pgServer -d $pgDatabase -U $pgUser -c $copyCommand
>>>
>>>
>>> how can I debug this?
>>>
>>> Table layout
>>>
>>> [image: image.png]
>>>
>>


Re: load fom csv

2024-09-17 Thread Adrian Klaver

On 9/17/24 09:21, Andy Hartman wrote:

The command work outside of powershell  yes


Then you are going to need to use whatever debugging tools PowerShell 
has available to step through the script to figure out where the problem is.



--
Adrian Klaver
adrian.kla...@aklaver.com





Re: load fom csv

2024-09-17 Thread Andy Hartman
The command work outside of powershell  yes

On Tue, Sep 17, 2024 at 11:39 AM Adrian Klaver 
wrote:

> On 9/17/24 08:35, Andy Hartman wrote:
> > I don't see anything in LOG
>
> 1) Are you logging connections/disconnection per?:
>
>
> https://www.postgresql.org/docs/current/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHAT
>
> If not do so as it will show you if a connection is being attempted.
>
> 2) Do the commands work when run in psql or supplied directly to psql
> outside of PowerShell?
>
>
> >
> > On Tue, Sep 17, 2024 at 11:23 AM Adrian Klaver <adrian.kla...@aklaver.com> wrote:
> >
> > On 9/17/24 08:13, Andy Hartman wrote:
> >  > Still when I try to run from my powershell script it hangs...
> >  >
> >
> > And the Postgres log shows?
> >
> > --
> > Adrian Klaver
> > adrian.kla...@aklaver.com 
> >
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>


Re: IO related waits

2024-09-17 Thread Adrian Klaver

On 9/16/24 20:55, veem v wrote:



On Tue, 17 Sept 2024 at 03:41, Adrian Klaver > wrote:



Are you referring to this?:


https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/datastream/operators/asyncio/
 


If not then you will need to be more specific.


Yes, I was referring to this one. So what can be the caveats in this 
approach, considering transactions meant to be ACID compliant as 
financial transactions.Additionally I was not aware of the parameter 
"synchronous_commit" in DB side which will mimic the synchronous commit.


Would both of these mimic the same asynchronous behaviour and achieves 
the same, which means the client data load throughput will increase 
because the DB will not wait for those data to be written to the WAL and 
give a confirmation back to the client and also the client will not wait 
for the DB to give a confirmation back on the data to be persisted in 
the DB or not?. Also, as in the backend the flushing of the WAL to the 
disk has to happen anyway(just that it will be delayed now), so can this 
method cause contention in the database storage side if the speed in 
which the data gets ingested from the client is not getting written to 
the disk , and if it can someway impact the data consistency for the 
read queries?


This is not something that I am that familiar with. I suspect though 
this is more complicated then you think. From the link above:


" Prerequisites #

As illustrated in the section above, implementing proper asynchronous 
I/O to a database (or key/value store) requires a client to that 
database that supports asynchronous requests. Many popular databases 
offer such a client.


In the absence of such a client, one can try and turn a synchronous 
client into a limited concurrent client by creating multiple clients and 
handling the synchronous calls with a thread pool. However, this 
approach is usually less efficient than a proper asynchronous client.

"

Which means you need to on Flink end:

1) Use Flink async I/O .

2) Find a client that supports async or fake it by using multiple 
synchronous clients.


On Postgres end there is this:

https://www.postgresql.org/docs/current/wal-async-commit.html

That will return a success signal to the client quicker if 
synchronous_commit is set to off. Though the point of the Flink async 
I/O is not to wait for the response before moving on, so I am not sure 
how much synchronous_commit = off would help.
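If it is tried at all, synchronous_commit does not have to be changed
cluster-wide; it can be relaxed per session or even per transaction, so only
the ingest path gives up the flush-before-acknowledge guarantee (table name
hypothetical):

SET synchronous_commit = off;        -- this session

BEGIN;
SET LOCAL synchronous_commit = off;  -- this transaction only
INSERT INTO demo VALUES (1);         -- demo is a hypothetical table
COMMIT;                              -- acknowledged before the WAL is flushed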


--
Adrian Klaver
adrian.kla...@aklaver.com





Re: IO related waits

2024-09-17 Thread veem v
On Tue, 17 Sept 2024 at 18:43, Greg Sabino Mullane 
wrote:

>
> This is a better place to optimize. Batch many rows per transaction.
> Remove unused indexes.
>
> flushing of the WAL to the disk has to happen anyway(just that it will be
>> delayed now), so can this method cause contention in the database storage
>> side if the speed in which the data gets ingested from the client is not
>> getting written to the disk , and if it can someway impact the data
>> consistency for the read queries?
>>
>
> Not quite clear what you are asking here re data consistency. The data
> will always be consistent, even if synchronous_commit is disabled. The only
> danger window is on a server crash.
>
> (Keep in mind that RDS is not Postgres, so take tuning recommendations and
> advice with a grain of salt.)
>
>
>
Thank you Greg.

Yes, our Java application was doing row by row commit and we saw that from
pg_stat_database from the column "xact_commit" which was closely the same
as the sum of tup_inserted, tup_updated, tup_deleted column. And also we
verified in pg_stats_statements the number against the "calls" column were
matching to the "rows" column for the INSERT queries, so it means also we
are inserting exactly same number of rows as the number of DB calls, so it
also suggest that we are doing row by row operations/dmls.

 And we then asked the application team to make the inserts in batches, but
still seeing those figures in these above two views are not changing much
the number "xact_commit" is staying almost same and also even the "calls"
and the "rows" column in pg_stats_statements also staying almost same. So
does it mean that the application code is somehow still doing the same row
by row processing or we are doing something wrong in the above analysis?
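One way to sanity-check that from the database side (a sketch; the
pg_stat_statements extension must be installed) is to compare calls against
rows for the insert statements. If they are roughly equal, the client is still
sending one row per execution:

SELECT query, calls, rows,
       round(rows::numeric / NULLIF(calls, 0), 1) AS rows_per_call
FROM   pg_stat_statements
WHERE  query ILIKE 'insert%'
ORDER  BY calls DESC
LIMIT  20;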

And another thing we noticed , even after the data load finished , even
then the "xact_commit" was keep increasing along with "tup_fetched", so
does it mean that its doing some kind of implicit commit even for the fetch
type queries which must be "select" queries i believe? Also not sure if its
expected, but here in postgres i have seen unless we put a code within
begin and end block , it's default gets committed even we just run it on
the console , it doesn't ask for a explicit commit/rollback ,so not sure if
that is someway playing a role here.

Regards
Veem



>
>


Re: load fom csv

2024-09-17 Thread Adrian Klaver

On 9/17/24 08:35, Andy Hartman wrote:

I don't see anything in LOG


1) Are you logging connections/disconnection per?:

https://www.postgresql.org/docs/current/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHAT

If not do so as it will show you if a connection is being attempted.

2) Do the commands work when run in psql or supplied directly to psql 
outside of PowerShell?





On Tue, Sep 17, 2024 at 11:23 AM Adrian Klaver <adrian.kla...@aklaver.com> wrote:


On 9/17/24 08:13, Andy Hartman wrote:
 > Still when I try to run from my powershell script it hangs...
 >

And the Postgres log shows?

-- 
Adrian Klaver

adrian.kla...@aklaver.com 



--
Adrian Klaver
adrian.kla...@aklaver.com





Re: load fom csv

2024-09-17 Thread Andy Hartman
I don't see anything in LOG

On Tue, Sep 17, 2024 at 11:23 AM Adrian Klaver 
wrote:

> On 9/17/24 08:13, Andy Hartman wrote:
> > Still when I try to run from my powershell script it hangs...
> >
>
> And the Postgres log shows?
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>


Re: load fom csv

2024-09-17 Thread Adrian Klaver

On 9/17/24 08:13, Andy Hartman wrote:

Still when I try to run from my powershell script it hangs...



And the Postgres log shows?

--
Adrian Klaver
adrian.kla...@aklaver.com





Re: question on plain pg_dump file usage

2024-09-17 Thread Adrian Klaver

On 9/17/24 05:43, Ron Johnson wrote:
On Tue, Sep 17, 2024 at 8:22 AM Zwettler Markus (OIZ) <markus.zwett...@zuerich.ch> wrote:





Why must it be a plain text dump instead of a custom or directory dump?  
Restoring to a new (and differently-named( database is perfectly doable.


Because of this:

sed -i "s/old_name/new_name/g"





--
Adrian Klaver
adrian.kla...@aklaver.com





Re: question on plain pg_dump file usage

2024-09-17 Thread Tom Lane
"Zwettler Markus (OIZ)"  writes:
> I have to do an out-of-place Postgres migration from PG12 to PG16 using:
> pg_dump -F p -f dump.sql ...
> sed -i "s/old_name/new_name/g"
> psql -f dump.sql ...

> Both databases are on UTF-8.

> I wonder if there could be character set conversion errors here, as the data 
> is temporarily written to a plain text file.

The dump script will include a "SET client_encoding" command to
prevent that.

regards, tom lane




Re: question on plain pg_dump file usage

2024-09-17 Thread Greg Sabino Mullane
On Tue, Sep 17, 2024 at 8:22 AM Zwettler Markus (OIZ) <
markus.zwett...@zuerich.ch> wrote:

> pg_dump -F p -f dump.sql …
>
> sed -i "s/old_name/new_name/g"
>
> psql -f dump.sql …
>

Why not rename afterwards? Just "pg_dump mydb | psql -h newhost -f -" and
rename things via ALTER. Certainly much safer than a global replace via sed.
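In other words (all names here are placeholders), restore first and rename in
the new cluster afterwards:

-- run while connected to a different database, with no other sessions
-- connected to the one being renamed
ALTER DATABASE old_name RENAME TO new_name;
ALTER SCHEMA old_schema RENAME TO new_schema;
ALTER TABLE new_schema.old_table RENAME TO new_table;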

Cheers,
Greg


WG: [Extern] Re: question on plain pg_dump file usage

2024-09-17 Thread Zwettler Markus (OIZ)

Von: Ron Johnson 
Gesendet: Dienstag, 17. September 2024 14:44
An: PG-General Mailing List 
Betreff: [Extern] Re: question on plain pg_dump file usage

On Tue, Sep 17, 2024 at 8:22 AM Zwettler Markus (OIZ) <markus.zwett...@zuerich.ch> wrote:
I have to do an out-of-place Postgres migration from PG12 to PG16 using:

pg_dump -F p -f dump.sql …
sed -i "s/old_name/new_name/g"
psql -f dump.sql …

Both databases are on UTF-8.

I wonder if there could be character set conversion errors here, as the data is 
temporarily written to a plain text file.

Why must it be a plain text dump instead of a custom or directory dump?  
Restoring to a new (and differently-named( database is perfectly doable.


Because I can simply change the application from the old to the new structure 
then.

-Regards, Markus


Re: IO related waits

2024-09-17 Thread Greg Sabino Mullane
On Mon, Sep 16, 2024 at 11:56 PM veem v  wrote:

> So what can be the caveats in this approach, considering transactions
> meant to be ACID compliant as financial transactions.
>

Financial transactions need to be handled with care. Only you know your
business requirements, but as Christophe pointed out, disabling
synchronous commit means your application may think a particular
transaction has completed when it has not. Usually that's a big red flag
for financial applications.

we are using row by row transaction processing for inserting data into the
> postgres database and commit is performed for each row.


This is a better place to optimize. Batch many rows per transaction. Remove
unused indexes.

flushing of the WAL to the disk has to happen anyway(just that it will be
> delayed now), so can this method cause contention in the database storage
> side if the speed in which the data gets ingested from the client is not
> getting written to the disk , and if it can someway impact the data
> consistency for the read queries?
>

Not quite clear what you are asking here re data consistency. The data will
always be consistent, even if synchronous_commit is disabled. The only
danger window is on a server crash.

(Keep in mind that RDS is not Postgres, so take tuning recommendations and
advice with a grain of salt.)

Cheers,
Greg


Re: question on plain pg_dump file usage

2024-09-17 Thread Ron Johnson
On Tue, Sep 17, 2024 at 8:22 AM Zwettler Markus (OIZ) <
markus.zwett...@zuerich.ch> wrote:

> I have to do an out-of-place Postgres migration from PG12 to PG16 using:
>
>
>
> pg_dump -F p -f dump.sql …
>
> sed -i "s/old_name/new_name/g"
>
> psql -f dump.sql …
>
>
>
> Both databases are on UTF-8.
>
>
>
> I wonder if there could be character set conversion errors here, as the
> data is temporarily written to a plain text file.
>
>
Why must it be a plain text dump instead of a custom or directory dump?
Restoring to a new (and differently-named( database is perfectly doable.



Re: load fom csv

2024-09-17 Thread Andy Hartman
I have bad data in an Int field...

 Thanks for your help.
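For what it's worth, a staging-table pattern makes this kind of bad value easy
to isolate (a sketch with made-up table and column names):

CREATE TABLE stage_raw (id_txt text, name text);   -- hypothetical all-text staging table

\copy stage_raw FROM 'file.csv' DELIMITER ',' CSV HEADER

SELECT * FROM stage_raw WHERE id_txt !~ '^[0-9]+$';   -- the rows that won't cast to int

INSERT INTO target_table (id, name)                   -- target_table is hypothetical
SELECT id_txt::int, name
FROM   stage_raw
WHERE  id_txt ~ '^[0-9]+$';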

On Tue, Sep 17, 2024 at 1:55 AM Muhammad Usman Khan 
wrote:

> Hi,
> Try the following options:
>
>
>- Check if psql is working independently:
>psql -h $pgServer -d $pgDatabase -U $pgUser -c "SELECT 1;"
>- Check for permission issues on the CSV file
>- Run the command manually without variables
>psql -h your_host -d your_db -U your_user -c "\COPY your_table FROM
>'C:/path/to/your/file.csv' DELIMITER ',' CSV HEADER;"
>- set a timeout using the PGCONNECT_TIMEOUT environment variable:
>$env:PGCONNECT_TIMEOUT=30
>
>
>
> On Mon, 16 Sept 2024 at 20:35, Andy Hartman 
> wrote:
>
>> I'm trying to run this piece of code from Powershell and it just sits
>> there and never comes back. There are only 131 records in the csv.
>>
>> $connectionString =
>> "Host=$pgServer;Database=$pgDatabase;Username=$pgUser;Password=$pgPassword"
>> $copyCommand = "\COPY $pgTable FROM '$csvPath' DELIMITER ',' CSV HEADER;"
>>
>> psql -h $pgServer -d $pgDatabase -U $pgUser -c $copyCommand
>>
>>
>> how can I debug this?
>>
>> Table layout
>>
>> [image: image.png]
>>
>


Re: update faster way

2024-09-17 Thread Alvaro Herrera
On 2024-Sep-14, yudhi s wrote:

> Hello,
> We have to update a column value (from numbers like '123' to codes like
> 'abc' by looking into reference table data) in a partitioned table with
> billions of rows in it, with each partition having hundreds of millions of rows.

Another option is to not update anything, and instead create a view on
top of the partitioned table (joined to the reference table) that
returns the reference value instead of the original number value from
the column; when the application wants to receive those reference
values, it queries the view instead of the partitioned table directly.
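
A minimal sketch, reusing the table names from the original post and assuming
reference_tab(column1, code) maps the old numbers to the new codes:

CREATE VIEW tab_part1_translated AS
SELECT t.*,                    -- all original columns
       r.code AS column1_code  -- the translated value the application reads
FROM tab_part1 t
LEFT JOIN reference_tab r ON r.column1 = t.column1;

The LEFT JOIN keeps rows whose value has no entry in the reference table.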

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"I think my standards have lowered enough that now I think 'good design'
is when the page doesn't irritate the living f*ck out of me." (JWZ)




Re: Failing to allocate memory when I think it shouldn't

2024-09-17 Thread Thomas Ziegler

Hello Christoph,

Thanks for your answer and the suggestions, it already helped me out a lot!

On 2024-09-14 22:11, Christoph Moench-Tegeder wrote:

Hi,

## Thomas Ziegler (thomas.zieg...@holmsecurity.com):

There's a lot of information missing here. Let's start from the top.


I have had my database killed by the kernel oom-killer. After that I
turned off memory over-committing and that is where things got weird.

What exactly did you set? When playing with vm.overcommit, did you
understand "Committed Address Space" and the workings of the
overcommit accounting? This is the document:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/mm/overcommit-accounting.rst
Hint: when setting overcommit_memory=2 you might end up with way
less available adress space than you thought you would. Also keep
an eye on /proc/meminfo - it's sometimes hard to estimate "just off
your cuff" what's in memory and how it's mapped. (Also, anything
else on that machine which might hog memory?).


I set overcommit_memory=2, but completely missed 'overcommit_ratio'. 
That is most probably why the database got denied the RAM a lot sooner 
than I expected.



Finally, there's this:

2024-09-12 05:18:36.073 UTC [1932776] LOG:  background worker "parallel worker" 
(PID 3808076) exited with exit code 1
terminate called after throwing an instance of 'std::bad_alloc'
   what():  std::bad_alloc
2024-09-12 05:18:36.083 UTC [1932776] LOG:  background worker "parallel worker" 
(PID 3808077) was terminated by signal 6: Aborted

That "std::bad_alloc" sounds a lot like C++ and not like the C our
database is written in. My first suspicion would be that you're using
LLVM-JIT (unless you have other - maybe even your own - C++ extensions
in the database?) and that in itself can use a good chunk of memory.
And it looks like that exception bubbled up as a signal 6 (SIGABRT)
which made the process terminate immediately without any cleanup,
and after that the server has no other chance than to crash-restart.


Except for pgAudit, I don't have any extensions, so it is probably the 
JIT. I had no idea there was a JIT, even though it should have been obvious. 
Thanks for pointing this out!


Is the memory the JIT takes limited by 'work_mem' or will it just take 
as much memory as it needs?



I recommend starting with understanding the actual memory limits
as set by your configuration (personally I believe that memory
overcommit is less evil than some people think). Have a close look
at /proc/meminfo and if possible disable JIT and check if it changes
anything. Also if possible try starting with only a few active
connections and increase load carefully once a steady state (in
terms of memory usage) has been reached.


Yes, understanding the memory limits is what I was trying to do.
I was questioning my understanding, but it seems it was Linux that tripped
me up (or rather my lack of understanding there), not the database.
Memory management and /proc/meminfo still manage to confuse me.

Again, thanks for your help!

Cheers,
Thomas

p.s.: To anybody who stumbles upon this in the future,
if you set `overcommit_memory=2`, don't forget `overcommit_ratio`.





Re: load fom csv

2024-09-16 Thread Muhammad Usman Khan
Hi,
Try the following options:


   - Check if psql is working independently:
   psql -h $pgServer -d $pgDatabase -U $pgUser -c "SELECT 1;"
   - Check for permission issues on the CSV file
   - Run the command manually without variables
   psql -h your_host -d your_db -U your_user -c "\COPY your_table FROM
   'C:/path/to/your/file.csv' DELIMITER ',' CSV HEADER;"
   - set a timeout using the PGCONNECT_TIMEOUT environment variable:
   $env:PGCONNECT_TIMEOUT=30



On Mon, 16 Sept 2024 at 20:35, Andy Hartman  wrote:

> I'm trying to run this piece of code from Powershell and it just sits
> there and never comes back. There are only 131 records in the csv.
>
> $connectionString =
> "Host=$pgServer;Database=$pgDatabase;Username=$pgUser;Password=$pgPassword"
> $copyCommand = "\COPY $pgTable FROM '$csvPath' DELIMITER ',' CSV HEADER;"
>
> psql -h $pgServer -d $pgDatabase -U $pgUser -c $copyCommand
>
>
> how can I debug this?
>
> Table layout
>
> [image: image.png]
>


Re: update faster way

2024-09-16 Thread Muhammad Usman Khan
Hi,

You can solve this problem using Citus in PostgreSQL, which is specifically
designed for parallelism

SELECT create_distributed_table('tab_part1', 'partition_key');
SELECT create_distributed_table('reference_tab', 'reference_key');

UPDATE tab_part1
SET column1 = reftab.code
FROM reference_tab reftab
WHERE tab_part1.column1 = reftab.column1;


On Sat, 14 Sept 2024 at 08:22, yudhi s  wrote:

> Hello,
> We have to update a column value (from numbers like '123' to codes like
> 'abc' by looking into reference table data) in a partitioned table with
> billions of rows in it, with each partition having hundreds of millions of
> rows. As we tested, ~30 million rows take ~20 minutes to update. So by that
> calculation, it's going to take days to update all the values.
> So my question is
>
> 1) If there is any inbuilt way of running the update query in parallel
> (e.g. using parallel hints etc) to make it run faster?
> 2) should we run each individual partition in a separate session (e.g.
> five partitions will have the updates done at same time from 5 different
> sessions)? And will it have any locking effect or we can just start the
> sessions and let them run without impacting our live transactions?
>
> UPDATE tab_part1
> SET column1 = reftab.code
> FROM reference_tab reftab
> WHERE tab_part1.column1 = subquery.column1;
>
> Regards
> Yudhi
>


Re: IO related waits

2024-09-16 Thread veem v
On Tue, 17 Sept 2024 at 03:41, Adrian Klaver 
wrote:

>
> Are you referring to this?:
>
>
> https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/datastream/operators/asyncio/
>
> If not then you will need to be more specific.
>
>
Yes, I was referring to this one. So what can be the caveats in this
approach, considering the transactions are meant to be ACID compliant, as
financial transactions? Additionally, I was not aware of the
"synchronous_commit" parameter on the DB side, which will mimic asynchronous
commit.

Would both of these provide the same asynchronous behaviour and achieve the
same result, i.e. the client data load throughput increases because the DB
will not wait for the data to be written to the WAL before giving a
confirmation back to the client, and the client will not wait for the DB to
confirm whether the data has been persisted? Also, as in the backend the
flushing of the WAL to disk has to happen anyway (just that it will be
delayed now), so can this method cause contention on the database storage
side if the data ingested from the client is not getting written to disk
fast enough, and can it in some way impact the data consistency for the
read queries?


Re: IO related waits

2024-09-16 Thread Adrian Klaver

On 9/16/24 13:24, veem v wrote:

Hi,
One of our applications is using RDS Postgres. In one of our streaming 
applications (using Flink), which processes hundreds of millions of 
transactions each day, we are using row-by-row transaction processing 
for inserting data into the Postgres database, and a commit is performed 
for each row. We are seeing heavy IO:XactSynch wait events during the 
data load and also high overall response time.


The architecture team is suggesting enabling async I/O if possible, so that 
the streaming client will not wait for the commit confirmation from the 
database. So I want to understand how async I/O can be enabled and whether 
there are any downsides to doing this?


Are you referring to this?:

https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/datastream/operators/asyncio/

If not then you will need to be more specific.



Regards
Veem


--
Adrian Klaver
adrian.kla...@aklaver.com





Re: IO related waits

2024-09-16 Thread Christophe Pettus



> On Sep 16, 2024, at 13:24, veem v  wrote:
> The architecture team is suggesting enabling async I/O if possible, so that 
> the streaming client will not wait for the commit confirmation from the 
> database. So I want to understand how async I/O can be enabled and whether 
> there are any downsides to doing this? 

"Async I/O" has a specific meaning that's not currently applicable to 
PostgreSQL.  What is available is "synchronous_commit".  This setting is by 
default on.  When it's on, each commit waits until the associated WAL 
information has been flushed to disk, and then returns.  If it is turned off, 
the commit returns more or less immediately, and the WAL flush happens 
asynchronously from the commit.

The upside is that the session can proceed without waiting for the WAL flush.  
The downside is that on a server crash, some transactions may not have been 
fully committed to the database, and will be missing when the database 
restarts.  The database won't be corrupted (as in, you try to use it and get 
errors), but it will be "earlier in time" than the application might expect.  
It's pretty common to turn it off for high-ingestion-rate situations, 
especially where the application can detect and replay missing transactions on 
a crash.
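
For reference, a minimal sketch of how the setting is usually scoped, either
per session or only for transactions the application can replay (some_table
is just a stand-in):

-- for the whole session:
SET synchronous_commit = off;

-- or only inside a transaction the application can detect and replay:
BEGIN;
SET LOCAL synchronous_commit = off;
INSERT INTO some_table VALUES (1);
COMMIT;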



Re: load fom csv

2024-09-16 Thread Adrian Klaver

On 9/16/24 10:00, Andy Hartman wrote:

in LOG

2024-09-16 12:55:37.295 EDT [428] ERROR:  invalid byte sequence for 
encoding "UTF8": 0x00
2024-09-16 12:55:37.295 EDT [428] CONTEXT:  COPY 
image_classification_master, line 1, column spoolstarttime




I'm heading out the door I can't step you through the process, I can 
point you at:


https://www.postgresql.org/docs/current/multibyte.html#MULTIBYTE-AUTOMATIC-CONVERSION

Others will be able to answer the specifics.


--
Adrian Klaver
adrian.kla...@aklaver.com





Re: load fom csv

2024-09-16 Thread Andy Hartman
in LOG

2024-09-16 12:55:37.295 EDT [428] ERROR:  invalid byte sequence for
encoding "UTF8": 0x00
2024-09-16 12:55:37.295 EDT [428] CONTEXT:  COPY
image_classification_master, line 1, column spoolstarttime

On Mon, Sep 16, 2024 at 12:56 PM Adrian Klaver 
wrote:

> On 9/16/24 09:46, Andy Hartman wrote:
> >
> > It Looks correct.
> >
> > $pgTable = "image_classification_master"
>
> Connect to the database with psql and look at the table name. I'm
> betting it is not image_classification_master. Instead some mixed or all
> upper case version of the name.
>
> I don't use PowerShell or Windows for that matter these days so I can't
> be of much use on the script. I do suspect you will need to do some
> escaping to get the table name properly quoted in the script. To work
> through this you need to try what I call the crawl/walk/run process. In
> this case that is:
>
> 1) Crawl. Connect using psql and run the \copy in it with hard coded
> values.
>
> 2) Walk. Use psql with the -c argument and supply the command again with
> hard coded values
>
> 3) Run. Then use PowerShell and do the variable substitution.
>
> >
> >
> >
> >
> > On Mon, Sep 16, 2024 at 12:17 PM Adrian Klaver
> > mailto:adrian.kla...@aklaver.com>> wrote:
> >
> > On 9/16/24 09:12, Andy Hartman wrote:
> >  > 2024-09-16 12:06:00.968 EDT [4968] ERROR:  relation
> >  > "image_classification_master" does not exist
> >  > 2024-09-16 12:06:00.968 EDT [4968] STATEMENT:  COPY
> >  > Image_Classification_Master FROM STDIN DELIMITER ',' CSV HEADER;
> >
> > I'm assuming this is from the Postgres log.
> >
> > Best guess is the table name in the database is mixed case and needs
> to
> > be double quoted in the command to preserve the casing.
> >
> > See:
> >
> >
> https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS
> <
> https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS
> >
> >
> > for why.
> >
> >  >
> >  > On Mon, Sep 16, 2024 at 11:52 AM Francisco Olarte
> >  > mailto:fola...@peoplecall.com>
> > >>
> wrote:
> >  >
> >  >
> >  >
> >  > On Mon, 16 Sept 2024 at 17:36, Andy Hartman
> > mailto:hartman60h...@gmail.com>
> >  >  > >> wrote:
> >  >
> >  > I'm trying to run this piece of code from Powershell and
> > it just
> >  > sits there and never comes back. There are only 131
> > records in
> >  > the csv.
> >  > $connectionString =
> >  >
> >
>  "Host=$pgServer;Database=$pgDatabase;Username=$pgUser;Password=$pgPassword"
> >  > $copyCommand = "\COPY $pgTable FROM '$csvPath' DELIMITER
> > ',' CSV
> >  > HEADER;"
> >  > psql -h $pgServer -d $pgDatabase -U $pgUser -c
> $copyCommand
> >  > how can I debug this?
> >  >
> >  >
> >  > I would start by adding -a and -e after "psql".
> >  >
> >  > IIRC Powershell is windows, and in windows shell do not pass
> > command
> >  > words preparsed as in *ix to the executable, but a single
> command
> >  > line with the executable must parse. Given the amount of
> > quoting, -a
> >  > and -e will let you see the commands are properly sent, and
> > if it is
> >  > trying to read something what it is.
> >  >
> >  > I will also try to substitute the -c with a pipe. If it
> > heals, it is
> >  > probably a quoting issue.
> >  >
> >  > Also, I just caught Ron's message, and psql might be waiting
> > for a
> >  > password.
> >  >
> >  > Francisco Olarte.
> >  >
> >
> > --
> > Adrian Klaver
> > adrian.kla...@aklaver.com 
> >
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>


Re: load fom csv

2024-09-16 Thread Adrian Klaver

On 9/16/24 09:46, Andy Hartman wrote:


It Looks correct.

$pgTable = "image_classification_master"


Connect to the database with psql and look at the table name. I'm 
betting it is not image_classification_master. Instead some mixed or all 
upper case version of the name.


I don't use PowerShell or Windows for that matter these days so I can't 
be of much use on the script. I do suspect you will need to do some 
escaping to get the table name properly quoted in the script. To work 
through this you need to try what I call the crawl/walk/run process. In 
this case that is:


1) Crawl. Connect using psql and run the \copy in it with hard coded values.

2) Walk. Use psql with the -c argument and supply the command again with 
hard coded values


3) Run. Then use PowerShell and do the variable substitution.






On Mon, Sep 16, 2024 at 12:17 PM Adrian Klaver 
mailto:adrian.kla...@aklaver.com>> wrote:


On 9/16/24 09:12, Andy Hartman wrote:
 > 2024-09-16 12:06:00.968 EDT [4968] ERROR:  relation
 > "image_classification_master" does not exist
 > 2024-09-16 12:06:00.968 EDT [4968] STATEMENT:  COPY
 > Image_Classification_Master FROM STDIN DELIMITER ',' CSV HEADER;

I'm assuming this is from the Postgres log.

Best guess is the table name in the database is mixed case and needs to
be double quoted in the command to preserve the casing.

See:


https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS
 


for why.

 >
 > On Mon, Sep 16, 2024 at 11:52 AM Francisco Olarte
 > mailto:fola...@peoplecall.com>
>> wrote:
 >
 >
 >
 >     On Mon, 16 Sept 2024 at 17:36, Andy Hartman
mailto:hartman60h...@gmail.com>
 >     >> wrote:
 >
 >         I'm trying to run this piece of code from Powershell and
it just
 >         sits there and never comes back. There are only 131
records in
 >         the csv.
 >         $connectionString =
 >   
  "Host=$pgServer;Database=$pgDatabase;Username=$pgUser;Password=$pgPassword"

 >         $copyCommand = "\COPY $pgTable FROM '$csvPath' DELIMITER
',' CSV
 >         HEADER;"
 >         psql -h $pgServer -d $pgDatabase -U $pgUser -c $copyCommand
 >         how can I debug this?
 >
 >
 >     I would start by adding -a and -e after "psql".
 >
 >     IIRC Powershell is windows, and in windows shell do not pass
command
 >     words preparsed as in *ix to the executable, but a single command
 >     line with the executable must parse. Given the amount of
quoting, -a
 >     and -e will let you see the commands are properly sent, and
if it is
 >     trying to read something what it is.
 >
 >     I will also try to substitute the -c with a pipe. If it
heals, it is
 >     probably a quoting issue.
 >
 >     Also, I just caught Ron's message, and psql might be waiting
for a
 >     password.
 >
 >     Francisco Olarte.
 >

-- 
Adrian Klaver

adrian.kla...@aklaver.com 



--
Adrian Klaver
adrian.kla...@aklaver.com





Re: load fom csv

2024-09-16 Thread Andy Hartman
It Looks correct.

$pgTable = "image_classification_master"




On Mon, Sep 16, 2024 at 12:17 PM Adrian Klaver 
wrote:

> On 9/16/24 09:12, Andy Hartman wrote:
> > 2024-09-16 12:06:00.968 EDT [4968] ERROR:  relation
> > "image_classification_master" does not exist
> > 2024-09-16 12:06:00.968 EDT [4968] STATEMENT:  COPY
> > Image_Classification_Master FROM STDIN DELIMITER ',' CSV HEADER;
>
> I'm assuming this is from the Postgres log.
>
> Best guess is the table name in the database is mixed case and needs to
> be double quoted in the command to preserve the casing.
>
> See:
>
>
> https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS
>
> for why.
>
> >
> > On Mon, Sep 16, 2024 at 11:52 AM Francisco Olarte
> > mailto:fola...@peoplecall.com>> wrote:
> >
> >
> >
> > On Mon, 16 Sept 2024 at 17:36, Andy Hartman  > > wrote:
> >
> > I'm trying to run this piece of code from Powershell and it just
> > sits there and never comes back. There are only 131 records in
> > the csv.
> > $connectionString =
> >
>  "Host=$pgServer;Database=$pgDatabase;Username=$pgUser;Password=$pgPassword"
> > $copyCommand = "\COPY $pgTable FROM '$csvPath' DELIMITER ',' CSV
> > HEADER;"
> > psql -h $pgServer -d $pgDatabase -U $pgUser -c $copyCommand
> > how can I debug this?
> >
> >
> > I would start by adding -a and -e after "psql".
> >
> > IIRC Powershell is windows, and in windows shell do not pass command
> > words preparsed as in *ix to the executable, but a single command
> > line with the executable must parse. Given the amount of quoting, -a
> > and -e will let you see the commands are properly sent, and if it is
> > trying to read something what it is.
> >
> > I will also try to substitute the -c with a pipe. If it heals, it is
> > probably a quoting issue.
> >
> > Also, I just caught Ron's message, and psql might be waiting for a
> > password.
> >
> > Francisco Olarte.
> >
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>


Re: load fom csv

2024-09-16 Thread Adrian Klaver

On 9/16/24 09:12, Andy Hartman wrote:
2024-09-16 12:06:00.968 EDT [4968] ERROR:  relation 
"image_classification_master" does not exist
2024-09-16 12:06:00.968 EDT [4968] STATEMENT:  COPY  
Image_Classification_Master FROM STDIN DELIMITER ',' CSV HEADER;


I'm assuming this is from the Postgres log.

Best guess is the table name in the database is mixed case and needs to 
be double quoted in the command to preserve the casing.


See:

https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS

for why.
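
A quick way to check, sketched here (the mixed-case name is only an assumption):

-- how is the name actually stored? unquoted identifiers fold to lower case
SELECT relname FROM pg_class WHERE relname ILIKE 'image_classification_master';

-- if it comes back mixed case, the COPY target must be double-quoted, e.g.
-- \COPY "Image_Classification_Master" FROM 'file.csv' DELIMITER ',' CSV HEADER;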



On Mon, Sep 16, 2024 at 11:52 AM Francisco Olarte 
mailto:fola...@peoplecall.com>> wrote:




On Mon, 16 Sept 2024 at 17:36, Andy Hartman mailto:hartman60h...@gmail.com>> wrote:

I'm trying to run this piece of code from Powershell and it just
sits there and never comes back. There are only 131 records in
the csv.
$connectionString =

"Host=$pgServer;Database=$pgDatabase;Username=$pgUser;Password=$pgPassword"
$copyCommand = "\COPY $pgTable FROM '$csvPath' DELIMITER ',' CSV
HEADER;"
psql -h $pgServer -d $pgDatabase -U $pgUser -c $copyCommand
how can I debug this? 



I would start by adding -a and -e after "psql".

IIRC Powershell is windows, and in windows shell do not pass command
words preparsed as in *ix to the executable, but a single command
line with the executable must parse. Given the amount of quoting, -a
and -e will let you see the commands are properly sent, and if it is
trying to read something what it is.

I will also try to substitute the -c with a pipe. If it heals, it is
probably a quoting issue.

Also, I just caught Ron's message, and psql might be waiting for a
password.

Francisco Olarte.



--
Adrian Klaver
adrian.kla...@aklaver.com





Re: load fom csv

2024-09-16 Thread Andy Hartman
2024-09-16 12:06:00.968 EDT [4968] ERROR:  relation
"image_classification_master" does not exist
2024-09-16 12:06:00.968 EDT [4968] STATEMENT:  COPY
Image_Classification_Master FROM STDIN DELIMITER ',' CSV HEADER;

On Mon, Sep 16, 2024 at 11:52 AM Francisco Olarte 
wrote:

>
>
> On Mon, 16 Sept 2024 at 17:36, Andy Hartman 
> wrote:
>
>> I'm trying to run this piece of code from Powershell and it just sits
>> there and never comes back. There are only 131 records in the csv.
>> $connectionString =
>> "Host=$pgServer;Database=$pgDatabase;Username=$pgUser;Password=$pgPassword"
>> $copyCommand = "\COPY $pgTable FROM '$csvPath' DELIMITER ',' CSV HEADER;"
>> psql -h $pgServer -d $pgDatabase -U $pgUser -c $copyCommand
>> how can I debug this?
>>
>
> I would start by adding -a and -e after "psql".
>
> IIRC Powershell is windows, and in windows shell do not pass command words
> preparsed as in *ix to the executable, but a single command line with the
> executable must parse. Given the amount of quoting, -a and -e will let you
> see the commands are properly sent, and if it is trying to read something
> what it is.
>
> I will also try to substitute the -c with a pipe. If it heals, it is
> probably a quoting issue.
>
> Also, I just caught Ron's message, and psql might be waiting for a
> password.
>
> Francisco Olarte.
>
>


Re: load fom csv

2024-09-16 Thread Francisco Olarte
On Mon, 16 Sept 2024 at 17:36, Andy Hartman  wrote:

> I'm trying to run this piece of code from Powershell and it just sits
> there and never comes back. There are only 131 records in the csv.
> $connectionString =
> "Host=$pgServer;Database=$pgDatabase;Username=$pgUser;Password=$pgPassword"
> $copyCommand = "\COPY $pgTable FROM '$csvPath' DELIMITER ',' CSV HEADER;"
> psql -h $pgServer -d $pgDatabase -U $pgUser -c $copyCommand
> how can I debug this?
>

I would start by adding -a and -e after "psql".

IIRC PowerShell is Windows, and Windows shells do not pass command words
pre-parsed as in *ix to the executable; instead they pass a single command
line which the executable must parse. Given the amount of quoting, -a and -e
will let you see whether the commands are properly sent, and if it is trying
to read something, what it is.

I will also try to substitute the -c with a pipe. If it heals, it is
probably a quoting issue.

Also, I just caught Ron's message, and psql might be waiting for a password.

Francisco Olarte.


Re: load fom csv

2024-09-16 Thread Ron Johnson
On Mon, Sep 16, 2024 at 11:36 AM Andy Hartman 
wrote:

> I'm trying to run this piece of code from Powershell and it just sits
> there and never comes back. There are only 131 records in the csv.
>
> $connectionString =
> "Host=$pgServer;Database=$pgDatabase;Username=$pgUser;Password=$pgPassword"
> $copyCommand = "\COPY $pgTable FROM '$csvPath' DELIMITER ',' CSV HEADER;"
>
> psql -h $pgServer -d $pgDatabase -U $pgUser -c $copyCommand
>
> how can I debug this?
>

Maybe I'm missing something obvious, but where in the psql command are you
using $connectionString?

-- 
Death to , and butter sauce.
Don't boil me, I'm still alive.
 crustacean!


Re: load fom csv

2024-09-16 Thread Adrian Klaver

On 9/16/24 08:35, Andy Hartman wrote:
I'm trying to run this piece of code from Powershell and it just sits 
there and never comes back. There are only 131 records in the csv.


$connectionString = 
"Host=$pgServer;Database=$pgDatabase;Username=$pgUser;Password=$pgPassword"

$copyCommand = "\COPY $pgTable FROM '$csvPath' DELIMITER ',' CSV HEADER;"

psql -h $pgServer -d $pgDatabase -U $pgUser -c $copyCommand


how can I debug this?


1) Look at the Postgres log.

2) Run the psql command outside PowerShell with hard coded connection 
values and -c command.




Table layout

image.png


--
Adrian Klaver
adrian.kla...@aklaver.com





Re: update faster way

2024-09-15 Thread Peter J. Holzer
On 2024-09-14 20:26:32 +0530, yudhi s wrote:
> 
> 
> On Sat, Sep 14, 2024 at 4:55 PM Peter J. Holzer  wrote:
> 
> 
> Which in turn means that you want as little overhead as possible per
> batch which means finding those 5000 rows should be quick. Which brings
> us back to Igor's question: Do you have any indexes in place which speed
> up finding those 5000 rows (the primary key almost certainly won't help
> with that). EXPLAIN (ANALYZE) (as suggested by Laurenz) will certainly
> help answering that question.
> 
> > And also those rows will not collide with each other. So do you think
> > that approach can anyway cause locking issues?
> 
> No, I don't think so. With a batch size that small I wouldn't expect
> problems even on the live partition. But of course many busy parallel
> sessions will put additional load on the system which may or may not be
> noticeable by users (you might saturate the disks writing WAL entries
> for example, which would slow down other sessions trying to commit).
> 
> 
> > Regarding batch update with batch size of 1000, do we have any method
> exists in
> > postgres (say like forall statement in Oracle) which will do the batch
> dml. Can
> > you please guide me here, how we can do it in postgres.
> 
> Postgres offers several server side languages. As an Oracle admin you
> will probably find PL/pgSQL most familiar. But you could also use Perl
> or Python or several others. And of course you could use any
> programming/scripting language you like on the client side.
> 
> 
> 
>  When you said "(the primary key almost certainly won't help with that)", I am
> trying to understand why it is so ? 

I was thinking that you would do something like 

begin;

with a as (
    select id from the_table
    where :part_lower <= id and id < :part_upper and col_x = :old
    limit 5000
)
update the_table set col_x = :new
from a where the_table.id = a.id;

commit; 

in a loop until you update 0 rows and then switch to the next
partition. That pretty much requires an index on col_x or you will need
a sequential scan to find the next 5000 rows to update.
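
For example, something like this (an assumption on my part, and an index that
could be dropped again once the migration is done):

CREATE INDEX CONCURRENTLY the_table_col_x_idx ON the_table (col_x);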

Even if you return the ids and feed the last updated id back into the
loop like this:

with a as (
    select id from the_table
    where id > :n and col_x = :old
    order by id
    limit 5000
)
update the_table set col_x = :new
from a where the_table.id = a.id;

that may lead to a lot of extra reads from the heap or the optimizer
might even decide it's better to go for a sequential scan.

The latter is pretty unlikely if you restrict the range of ids:

update the_table set col_x = :new
where :n <= id and id < :n + 5000 and col_x = :old;

but that will possibly result in a lot of queries which don't update
anything at all but still need to read 5000 rows each.


> I was thinking of using that column as an incrementing filter and driving the
> eligible rows based on that filter. And if it would have been a sequence. I
> think it would have helped but in this case it's UUID , so I may not be able 
> to
> do the batch DML using that as filter criteria.

You can order by uuid or compare them to other uuids. So my first two
approaches above would still work.

> but in that case will it be fine to drive the update based on ctid
> something as below? Each session will have the range of 5 days of data
> or five partition data and will execute a query something as below
> which will update in the batches of 10K and then commit. Is this fine?
> Or is there some better way of doing the batch DML in postgres
> plpgsql?
> 
> DO $$
> DECLARE
>     l_rowid_array ctid[];
>     l_ctid ctid;
>     l_array_size INT := 1;
>     l_processed INT := 0;
> BEGIN
>  
>     FOR l_cnt IN 0..(SELECT COUNT(*) FROM part_tab WHERE   part_date >
> '1-sep-2024' and part_date < '5-sep-2024'
> ) / l_array_size LOOP
>         l_rowid_array := ARRAY(
>             SELECT ctid
>             FROM part_tab
>             WHERE part_date   > '1-sep-2024' and part_date < '5-sep-2024'
>             LIMIT l_array_size OFFSET l_cnt * l_array_size

Never use LIMIT and OFFSET without an ORDER BY, especially not when you
are updating the table. You may get some rows twice and some never.
Also OFFSET means you are reading all those rows and then ignoring
them. I expect this to be O(n²).

>         );
>  
>         FOREACH l_ctid IN ARRAY l_rowid_array LOOP
>             update  part_tab
>             SET column1 = reftab.code
>            FROM reference_tab reftab

I see you are using a reference table and I think you mentioned that you
will be updating most rows. So that alleviates my concerns that you may
read lots of rows without updating them. But you still need an efficient
way to get at the next rows to update.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) |  

Re: Manual query vs trigger during data load

2024-09-15 Thread Peter J. Holzer
On 2024-09-14 21:21:45 +0530, yudhi s wrote:
> On Sat, Sep 14, 2024 at 4:17 PM Peter J. Holzer  wrote:
> On 2024-09-14 00:54:49 +0530, yudhi s wrote:
> > As "thiemo" mentioned , it can be done as below method, but if
> > we have multiple lookup tables to be populated for multiple
> > columns , then , how can the INSERT query be tweaked to cater
> > the need here?
> 
> Just use a join:
>     insert into target(val1, val2, val3, val4)
>     select :param1, cfgA.substA, :param3, cfgB.substB
>     from cfgA, cfgB
>     where cfgA.keyA = :param2 and cfgB.keyB = :param4
> 
> Or use a CTE per lookup which might be more readable:
> 
>     with cA as ( select substA from cfgA where keyA = :param2 ),
>          cB as ( select substB from cfgB where keyB = :param4 )
>     insert into target(val1, val2, val3, val4)
>     select :param1, cA.substA, :param3, cB.substB
>     from cA, cB
> 
> 
> 
> Thank you. I will try these options. 
> Also, we are trying to do something as below, which will separate the lookups
> per target column and thus look simpler, rather than using those reference
> tables in the FROM clause, which may cause some confusion when reading the
> code, or (we're not sure) a cartesian product. Please correct me if I'm wrong.

My examples do form a cartesian product, but as long as the keys are
unique, that's 1 * 1 * 1 ... * 1 = 1 row. So that should not be a
problem in case of simple lookup tables.

That may not be immediately apparent to someone reading the code,
though. And it might fail horribly if the lookups aren't guaranteed to
return a single row.
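
If that is a concern, unique constraints on the lookup keys make the
single-row property explicit (a sketch, using the names from the example
above):

ALTER TABLE cfgA ADD CONSTRAINT cfga_keya_key UNIQUE (keyA);
ALTER TABLE cfgB ADD CONSTRAINT cfgb_keyb_key UNIQUE (keyB);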


> INSERT INTO tab_part1 (column1, column2, column3, column4, column5, part_date)
> VALUES ( :v_col1, (SELECT lookup_value FROM reference_tab1 WHERE lookup_key =
> :v_col2), :v_col3, :v_col4, :v_col5, CURRENT_DATE ); 

Your approach is safer in that it will abort with an error if the
subquery ever returns more than one value. It will also still insert a
row (with null in column2) if the subquery returns no rows, which may or
may not be what you want (and if you don't want it you can probably
prevent it with a not null constraint). Looks good to me.
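
For example (a sketch, using the column name from your snippet):

ALTER TABLE tab_part1 ALTER COLUMN column2 SET NOT NULL;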

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: re-novice coming back to pgsql: porting an SQLite update statement to postgres

2024-09-15 Thread Willow Chargin
On Sun, Sep 15, 2024 at 4:23 AM Alban Hertroys  wrote:
>
> > On 15 Sep 2024, at 11:07, Dan Kortschak  wrote:
> >
> > I have come to hopefully my last stumbling point.
> >
> > I am unable to see a way to express something like this SQLite syntax
> >
> > select json_group_array(json_replace(value,
> >  '$.a', case
> >when json_extract(value, '$.a') > 2 then
> >  2
> >else
> >  json_extract(value, '$.a')
> >end,
> >  '$.b', case
> >when json_extract(value, '$.b') < -2 then
> >  -2
> >else
> >  json_extract(value, '$.b')
> >end
> > ))
> > from
> >  json_each('[{"a":1, "b":-3},{"a":2, "b":-2},{"a":3, "b":-1}]');
>
> [...]
>
> I see basically two approaches. One is to take the objects apart [...]
>
> with t as (
> select jsonb($$[{"a":1, "b":-3, "c":1},{"a":2, "b":-2, "c":2},{"a":3, 
> "b":-1, "c":3},{"a":3, "b":-3, "c":4}]$$) arr
> )
> select jsonb_agg(jsonb_build_object(
> 'a', case when records.a > 2 then 2 else records.a end
> ,   'b', case when records.b < -2 then -2 else records.b end
> ,   'c', c
> ))
> from t
> cross join lateral jsonb_to_recordset(t.arr) records(a int, b int, c int)
> ;
>
> [...]
>
> The drawback is that you have to specify all fields and types, but you don’t 
> need to cast the values all the time either.

Here is a variant of Alban's first method that does not require
specifying all fields and types, and so works with heterogeneous values:

WITH t AS (
SELECT jsonb($$[
{"a": 1, "b": -3, "c": 1},
{"a": 2, "b": -2, "c": 2},
{"a": 3, "b": -1, "c": 3},
{"a": 3, "b": -3, "c": 4}
]$$) arr
)
SELECT
jsonb_agg(new_element ORDER BY idx) new_arr
FROM t, LATERAL (
SELECT idx, jsonb_object_agg(key, CASE
WHEN key = 'a'
    THEN least(old_value::numeric, 2)::text::jsonb
WHEN key = 'b'
THEN greatest(old_value::numeric, -2)::text::jsonb
ELSE old_value
END)
FROM
jsonb_array_elements(arr)
WITH ORDINALITY old_elements(old_element, idx),
jsonb_each(old_element) each(key, old_value)
GROUP BY idx
) new_elements(idx, new_element)

I also took the liberties of using `least` / `greatest` to simplify the
clamping operations, and using `WITH ORDINALITY` / `ORDER BY` on the
array scan and re-aggregation to make the element ordering explicit
rather than relying on the query engine to not re-order the rows.

https://www.postgresql.org/docs/16/functions-conditional.html#FUNCTIONS-GREATEST-LEAST
https://www.postgresql.org/docs/16/queries-table-expressions.html#QUERIES-TABLEFUNCTIONS




Re: re-novice coming back to pgsql: porting an SQLite update statement to postgres

2024-09-15 Thread Alban Hertroys


> On 15 Sep 2024, at 11:07, Dan Kortschak  wrote:
> 
> I have come to hopefully my last stumbling point.
> 
> I am unable to see a way to express something like this SQLite syntax
> 
> select json_group_array(json_replace(value,
>  '$.a', case
>when json_extract(value, '$.a') > 2 then
>  2
>else
>  json_extract(value, '$.a')
>end,
>  '$.b', case
>when json_extract(value, '$.b') < -2 then
>  -2
>else
>  json_extract(value, '$.b')
>end
> ))
> from
>  json_each('[{"a":1, "b":-3},{"a":2, "b":-2},{"a":3, "b":-1}]');

What’s the result of that query in SQLite?

I’m guessing it would be: [{"a":1, "b":-2},{"a":2, "b":-2},{"a":2, "b":-1}]


I see basically two approaches. One is to take the objects apart and build them 
back together again, the other is to attempt to only replace the values that 
need replacing.

For the sake of showing how both approaches modify the original, I added an 
extra field “c” to your objects that should be in the result unmodified.

The first approach rebuilds the objects:

with t as (
select jsonb($$[{"a":1, "b":-3, "c":1},{"a":2, "b":-2, "c":2},{"a":3, 
"b":-1, "c":3},{"a":3, "b":-3, "c":4}]$$) arr
)
select jsonb_agg(jsonb_build_object(
'a', case when records.a > 2 then 2 else records.a end
,   'b', case when records.b < -2 then -2 else records.b end
,   'c', c
))
from t
cross join lateral jsonb_to_recordset(t.arr) records(a int, b int, c int)
;
                                                  jsonb_agg
--------------------------------------------------------------------------------------------------------------
 [{"a": 1, "b": -2, "c": 1}, {"a": 2, "b": -2, "c": 2}, {"a": 2, "b": -1, "c": 3}, {"a": 2, "b": -2, "c": 4}]
(1 row)


The drawback is that you have to specify all fields and types, but you don’t 
need to cast the values all the time either.



The replacement approach gets a bit trickier. I don’t see any good method that 
would replace both ‘a’ and ‘b’ values if they both go outside bounds in the 
same object. 

The jsonb_set function in PG doesn’t seem to be able to handle setting a value 
conditionally, let alone setting multiple values conditionally in one call, so 
I ended up replacing either ‘a’ or ‘b’. I did include a case where both 
‘a’ and ‘b’ go out of bounds, replacing both values with their respective 
replacements, but the syntax for that approach doesn’t scale well to more 
combinations of fields and boundaries to check and replace.

Hence I added the problematic case to the test string. As you can see from the 
previous query, that one does handle that case properly without much extra 
hassle.

with t as (
select jsonb($$[{"a":1, "b":-3, "c":1},{"a":2, "b":-2, "c":2},{"a":3, 
"b":-1, "c":3},{"a":3, "b":-3, "c":4}]$$) arr
)
select jsonb_agg(
case
when (obj->>'a')::INTEGER > 2 and (obj->>'b')::INTEGER < -2
then jsonb_set(jsonb_set(obj, '{a}', '2') ,'{b}', '-2')
when (obj->>'a')::INTEGER > 2
then jsonb_set(obj, '{a}', '2')
when (obj->>'b')::INTEGER < -2
then jsonb_set(obj, '{b}', '-2')
else obj
end) newArr
from (
select jsonb_array_elements(arr) obj from t
) elements;
                                                   newarr
--------------------------------------------------------------------------------------------------------------
 [{"a": 1, "b": -2, "c": 1}, {"a": 2, "b": -2, "c": 2}, {"a": 2, "b": -1, "c": 3}, {"a": 2, "b": -2, "c": 4}]
(1 row)


For understanding both queries better, it probably helps to take out the 
jsonb_agg calls to see the separate objects from the array. Add the original 
obj back in for comparison, if you like.


I typically use the documentation pages for the JSON functions and the one on 
aggregate functions, where the JSONB aggregates are located:

https://www.postgresql.org/docs/16/functions-json.html
https://www.postgresql.org/docs/16/functions-aggregate.html

And if you’re not familiar with dollar quoting:
https://www.postgresql.org/docs/16/sql-syntax-lexical.html#SQL-SYNTAX-DOLLAR-QUOTING



Alban Hertroys
--
There is always an exception to always.








Re: re-novice coming back to pgsql: porting an SQLite update statement to postgres

2024-09-15 Thread Dan Kortschak
I have come to hopefully my last stumbling point.

I am unable to see a way to express something like this SQLite syntax

select json_group_array(json_replace(value,
  '$.a', case
when json_extract(value, '$.a') > 2 then
  2
else
  json_extract(value, '$.a')
end,
  '$.b', case
when json_extract(value, '$.b') < -2 then
  -2
else
  json_extract(value, '$.b')
end
))
from
  json_each('[{"a":1, "b":-3},{"a":2, "b":-2},{"a":3, "b":-1}]');

(in the repro above, the values are integers, but in the real case,
they are timestamps)

I have worked on multiple statements around the theme of 

with t as (
  select jsonb($$[{"a":1, "b":-3},{"a":2, "b":-2},{"a":3, "b":-1}]$$)
arr
)
select
  jsonb_array_elements(arr) as arr
from
  t;

The closest that I have come is 

with t as (
  select jsonb($$[{"a":1, "b":-3},{"a":2, "b":-2},{"a":3, "b":-1}]$$)
arr
)
select jsonb_set(arr, '{a}', case
  when (arr->>'a')::INTEGER > 2 then
2
  else
(arr->>'a')::INTEGER
  end
)
from (
  select
jsonb_array_elements(arr) as arr
  from
t
) elements;

but this is a million miles from where I want to be (it doesn't work,
but I think the shape of the things it's working with is maybe
heading in the right direction). I've read through the docs, but I just
don't seem able to get my head around this.

Any help would be gratefully appreciated (also some reading direction, so
that I'm not floundering so much).

thanks







Re: update faster way

2024-09-14 Thread Juan Rodrigo Alejandro Burgos Mella
The only way I see it as plausible to use a subquery, both in the query
and in setting the value, is if the relationship is one to one,
and there is an index that matches the predicate:

UPDATE table1 t1
SET column_value = (SELECT  FROM table2 t2 WHERE t2.column_relation
= t1.column_relation)
WHERE (column_relation) IN (SELECT column_relation FROM table2)

P.S.: the index should be on table2

Atte
JRBM

El sáb, 14 sept 2024 a las 0:22, yudhi s ()
escribió:

> Hello,
> We have to update a column value (from numbers like '123' to codes like
> 'abc' by looking into reference table data) in a partitioned table with
> billions of rows in it, with each partition having hundreds of millions of
> rows. As we tested, ~30 million rows take ~20 minutes to update. So by that
> calculation, it's going to take days to update all the values.
> So my question is
>
> 1) If there is any inbuilt way of running the update query in parallel
> (e.g. using parallel hints etc) to make it run faster?
> 2) should we run each individual partition in a separate session (e.g.
> five partitions will have the updates done at same time from 5 different
> sessions)? And will it have any locking effect or we can just start the
> sessions and let them run without impacting our live transactions?
>
> UPDATE tab_part1
> SET column1 = reftab.code
> FROM reference_tab reftab
> WHERE tab_part1.column1 = subquery.column1;
>
> Regards
> Yudhi
>


Re: Reg: Size difference

2024-09-14 Thread Adrian Klaver

On 9/14/24 11:10, Vinay Oli wrote:

Hi

  I've checked the database size with the \l+ meta-command, and I also checked 
at the file system level with du -sh on the 49181 folder. 49181 is the db OID.


Minimal information is not going to solve this issue.

Provide the exact output of commands from:

1) The Primary database.

2) The Standby database.

3) Per post from Laurenz Albe, any objects in 49181/ that differ in size 
between the primary and standby.
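
A minimal sketch of such a comparison, to be run while connected to that
database on both primary and standby and then diffed:

SELECT n.nspname, c.relname,
       pg_total_relation_size(c.oid) AS total_bytes
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r', 'i', 't', 'm')
ORDER BY total_bytes DESC
LIMIT 20;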


Also:

1) Describe the exact type of replication you are using?

2) The replication settings you are using.

3) The exact Postgres versions on both sides.

4) The OS and version on both sides.

5) Have you checked the Postgres logs on the standby side to see if the 
server is showing any relevant errors?





The pg_wal directory is the same 40 GB on the primary and standby servers.

All the directories are the same size; only the 49181 folder (the db OID) 
differs in size.




Thanks,
Vinay kumar

On Sat, Sep 14, 2024, 10:59 PM Adrian Klaver > wrote:


On 9/14/24 10:19, Vinay Oli wrote:
 > Hi Team
 >
 > I have been using PostgreSQL for the past 6 years. PostgreSQL has
 > significantly impacted my life, providing me with great
opportunities
 > for knowledge and self-development.
 >
 > I'm currently facing a strange issue with PostgreSQL 15.0. I have a
 > primary-standby setup that is in sync, with a replication slot in
place.
 > There are 18 databases, and one of the databases on the primary
side is
 > 104 GB, while the same database on the standby side is 216 GB.
Both are
 > in sync with zero delay.

How are you measuring the size?

If you are measuring on the files system what is the size of the base/
and pg_wal/ sub-directories in the %PG_DATA directoty?

 >
 > Could this be a bug? If so, has it been resolved in newer
releases? If
 > it is not a bug, how can this issue be fixed? Is there a solution
or any
 > supporting documentation available?
 >
 > WAL and log files are being rotated properly. The issue is with a
 > database named services_mfs. On the primary cluster, the
services_mfs
 > database is 104GB, but on the standby cluster, it is 216GB, even
though
 > both cluster are in sync. The standby database is only used in
case of a
 > crash, which is managed by a Patroni cluster with etcd.
 >
 >
 >
 > Thanks,
 >
 > Vinay Kumar
 >
 >
 >

-- 
Adrian Klaver

adrian.kla...@aklaver.com 



--
Adrian Klaver
adrian.kla...@aklaver.com





Re: Failing to allocate memory when I think it shouldn't

2024-09-14 Thread Christoph Moench-Tegeder
Hi,

## Thomas Ziegler (thomas.zieg...@holmsecurity.com):

There's a lot of information missing here. Let's start from the top.

> I have had my database killed by the kernel oom-killer. After that I
> turned off memory over-committing and that is where things got weird.

What exactly did you set? When playing with vm.overcommit, did you
understand "Committed Address Space" and the workings of the
overcommit accounting? This is the document:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/mm/overcommit-accounting.rst
Hint: when setting overcommit_memory=2 you might end up with way
less available adress space than you thought you would. Also keep
an eye on /proc/meminfo - it's sometimes hard to estimate "just off
your cuff" what's in memory and how it's mapped. (Also, anything
else on that machine which might hog memory?).

> I have `shared_buffers` at `16000MB`, `work_mem` at `80MB`, `temp_buffers`
> at `8MB`, `max_connections` at `300` and `maintenance_work_mem` at `1GB`.
> So all in all, I get to roughly 42GB of max memory usage
> (`16000+(80+8)*300=42400`).

That work_mem is "per query operation", you can have multiple of
those in a single query:
https://www.postgresql.org/docs/current/runtime-config-resource.html#GUC-WORK-MEM
Also, there's hash_mem_multiplier:
https://www.postgresql.org/docs/current/runtime-config-resource.html#GUC-HASH-MEM-MULTIPLIER
Then I've seen that your query uses parallel workers, remember that
each worker requests memory.
Next, maintenance_work_mem is a per process limit, and depending on
what's running at any given time, that can add up.
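
A quick way to review the effective values in one place (just a sketch, pick
whichever settings are relevant to you):

SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem', 'hash_mem_multiplier',
               'maintenance_work_mem', 'max_parallel_workers_per_gather',
               'max_connections');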

Finally, there's this:
> 2024-09-12 05:18:36.073 UTC [1932776] LOG:  background worker "parallel 
> worker" (PID 3808076) exited with exit code 1
> terminate called after throwing an instance of 'std::bad_alloc'
>   what():  std::bad_alloc
> 2024-09-12 05:18:36.083 UTC [1932776] LOG:  background worker "parallel 
> worker" (PID 3808077) was terminated by signal 6: Aborted

That "std::bad_alloc" sounds a lot like C++ and not like the C our
database is written in. My first suspicion would be that you're using
LLVM-JIT (unless you have other - maybe even your own - C++ extensions
in the database?) and that in itself can use a good chunk of memory.
And it looks like that exception bubbled up as a signal 6 (SIGABRT)
which made the process terminate immediately without any cleanup,
and after that the server has no other chance than to crash-restart.

I recommend starting with understanding the actual memory limits
as set by your configuration (personally I believe that memory
overcommit is less evil than some people think). Have a close look
at /proc/meminfo and if possible disable JIT and check if it changes
anything. Also if possible try starting with only a few active
connections and increase load carefully once a steady state (in
terms of memory usage) has been reached.
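
A minimal sketch of how JIT could be switched off for such a test:

-- for a single session:
SET jit = off;

-- or instance-wide until you change it back:
ALTER SYSTEM SET jit = off;
SELECT pg_reload_conf();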

Regards,
Christoph

-- 
Spare Space




Re: Reg: Size difference

2024-09-14 Thread Vinay Oli
Hi ,


I've verified there's no unrelated crap sitting there. I've checked the
database size with the \l+ meta-command, and I also checked at the file
system level with du -sh on the 49181 folder. 49181 is the db OID.



Thanks,
Vinay kumar

On Sat, Sep 14, 2024, 11:00 PM Laurenz Albe 
wrote:

> On Sat, 2024-09-14 at 22:49 +0530, Vinay Oli wrote:
> > I'm currently facing a strange issue with PostgreSQL 15.0. I have a
> > primary-standby setup that is in sync, with a replication slot in place.
> > There are 18 databases, and one of the databases on the primary side
> > is 104 GB, while the same database on the standby side is 216 GB.
> > Both are in sync with zero delay.
>
> Try and identify if any of the database objects are different in size.
> That shouldn't happen.
>
> If all the database objects have the same size on both systems, the
> explanation is likely some unrelated crap sitting in the data directory
> on the standby.  Try to identify files that exist on one system, but
> not on the other.
>
> Yours,
> Laurenz Albe
>


  1   2   3   4   5   6   7   8   9   10   >