Re: Sequence generating negative numbers

2020-08-19 Thread Adrian Klaver

On 8/19/20 3:24 PM, Shantanu Shekhar wrote:
Please reply to list also.
Ccing list.


Thanks Adrian,

I will reach out to the ORM team and see if they can help me understand 
this behavior.


I should have asked earlier, is this sequence set as a DEFAULT on the PK 
field or is it just being used by code to get numbers?




Shantanu
On Wednesday, August 19, 2020, 06:20:27 PM EDT, Adrian Klaver 
 wrote:



On 8/19/20 3:15 PM, Shantanu Shekhar wrote:
 > Team,
 >
 > I have a sequence definition in Postgres 9.6.11 like so:
 >
 > CREATE SEQUENCE IF NOT EXISTS org.my_seq
 >    INCREMENT 1
 >    MINVALUE 1
 >    NO MAXVALUE
 >    START 1
 >    CACHE 20;
 >
 > This sequence is used by a Java ORM framework to generate primary keys
 > for one of our tables. The initial numbers generated by this sequence
 > are as shown below:
 >
 > -28 -27 -26 -25 -8 -7 1 2 52 53 72 92 93 94 112 113 132 133 152 172 192
 > 193 212
 >
 > I am unable to understand why the sequence would start with a negative
 > number, particularly when the definition explicitly asks the sequence to
 > start at 1. This has happened consistently in all of our environments.

Because something in the ORM is generating negative numbers and
supplying them directly to the PK field.


 >
 > Thanks,
 >
 > Shantanu



--
Adrian Klaver
adrian.kla...@aklaver.com 






--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Understanding EXPLAIN ANALYZE estimates when loops != 1

2020-08-19 Thread David Rowley
On Thu, 20 Aug 2020 at 09:55, Philip Semanchuk
 wrote:
> I could use some help interpreting EXPLAIN ANALYZE output.
>
> ->  Index Scan using ix_foo on t1 (cost=0.69..68.57 rows=3283 width=105) 
> (actual time=0.006..0.918 rows=3760 loops=94)
>
> The actual rows returned by this plan node ~= 3760 * 94 = 353,440.

Yes.  It's total rows / loops, rounded to the nearest integer.

> Did postgres expect (estimate) 3283 rows from this join, or 3283 * 94 = 
> 308,602?

Yes - like the actual count, the rows estimate is per loop, at least
when the node is not a Parallel node.  If this index scan is on the
inside of a parameterized nested loop, then you'll see the estimate of
the number of expected loops on the outer side of the join.

> Same question for this node.
>
> ->  Parallel Index Scan using pk_xyz on xyz  (cost=0.29..2354.67 rows=54285 
> width=25) (actual time=0.049..6.326 rows=14864 loops=5)
>
> Actual rows ~= 14864 * 5 = 74,320, estimate = 54285 or 54285 * 5?

So parallel plans are a bit more complex.  The row estimates are the
total estimated rows divided by the number of workers we expect to do
useful work.  You might expect the divisor to be an integer, since you
can't really have 0.5 workers.  However, it's more complex than that:
the leader has other tasks to take care of, such as pulling tuples from
workers, so it's not fully dedicated to helping out.

If you're into reading C code, then there's more information in
https://github.com/postgres/postgres/blob/master/src/backend/optimizer/path/costsize.c#L5699
If you hunt around for usages of that function, you'll see that the
estimated row counts are divided by its return value.

David
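For those who would rather not read the C, the arithmetic of that function (get_parallel_divisor() in costsize.c) can be sketched in SQL. This is an approximation, assuming parallel_leader_participation = on: the leader contributes a fraction that shrinks by 0.3 per planned worker and disappears entirely at four workers.

```sql
-- Sketch of get_parallel_divisor()'s arithmetic (approximation, assuming
-- parallel_leader_participation = on):
SELECT workers,
       workers + GREATEST(0, 1.0 - 0.3 * workers) AS parallel_divisor
FROM generate_series(1, 4) AS w(workers);
-- 1 worker -> 1.7, 2 -> 2.4, 3 -> 3.1, 4 -> 4.0
```

So an estimate like rows=54285 is the planner's total row estimate divided by this (usually non-integer) divisor, which is why estimate * loops and the true total rarely match exactly.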




Re: Sequence generating negative numbers

2020-08-19 Thread Adrian Klaver

On 8/19/20 3:15 PM, Shantanu Shekhar wrote:

Team,

I have a sequence definition in Postgres 9.6.11 like so:

CREATE SEQUENCE IF NOT EXISTS org.my_seq
   INCREMENT 1
   MINVALUE 1
   NO MAXVALUE
   START 1
   CACHE 20;

This sequence is used by a Java ORM framework to generate primary keys 
for one of our tables. The initial numbers generated by this sequence 
are as shown below:


-28 -27 -26 -25 -8 -7 1 2 52 53 72 92 93 94 112 113 132 133 152 172 192 
193 212


I am unable to understand why the sequence would start with a negative 
number, particularly when the definition explicitly asks the sequence to 
start at 1. This has happened consistently in all of our environments.


Because something in the ORM is generating negative numbers and 
supplying them directly to the PK field.
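Since the sequence is defined with MINVALUE 1, it cannot emit a negative number itself, which is easy to confirm by comparing the sequence's state with the stored keys. A minimal sketch (the table and column names below are placeholders, not from the original post):

```sql
-- Sequence state: in 9.6 a sequence relation can be selected from directly.
SELECT last_value, is_called FROM org.my_seq;

-- Compare with what the ORM actually stored (my_table / id are assumed names):
SELECT min(id), max(id) FROM org.my_table;
```

If last_value is still small while the table already holds negative ids, the keys are being generated client-side rather than drawn from the sequence.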




Thanks,

Shantanu



--
Adrian Klaver
adrian.kla...@aklaver.com




Sequence generating negative numbers

2020-08-19 Thread Shantanu Shekhar
Team,
I have a sequence definition in Postgres 9.6.11 like so:
CREATE SEQUENCE IF NOT EXISTS org.my_seq
  INCREMENT 1
  MINVALUE 1
  NO MAXVALUE
  START 1
  CACHE 20;
This sequence is used by a Java ORM framework to generate primary keys for one 
of our tables. The initial numbers generated by this sequence are as shown 
below:
-28 -27 -26 -25 -8 -7 1 2 52 53 72 92 93 94 112 113 132 133 152 172 192 193 212

I am unable to understand why the sequence would start with a negative number, 
particularly when the definition explicitly asks the sequence to start at 1. 
This has happened consistently in all of our environments. 
Thanks,
Shantanu

Understanding EXPLAIN ANALYZE estimates when loops != 1

2020-08-19 Thread Philip Semanchuk
Hi all,
I could use some help interpreting EXPLAIN ANALYZE output. 

->  Index Scan using ix_foo on t1 (cost=0.69..68.57 rows=3283 width=105) 
(actual time=0.006..0.918 rows=3760 loops=94)

The actual rows returned by this plan node ~= 3760 * 94 = 353,440. Did postgres 
expect (estimate) 3283 rows from this join, or 3283 * 94 = 308,602? 


Same question for this node.

->  Parallel Index Scan using pk_xyz on xyz  (cost=0.29..2354.67 rows=54285 
width=25) (actual time=0.049..6.326 rows=14864 loops=5)

Actual rows ~= 14864 * 5 = 74,320, estimate = 54285 or 54285 * 5?


Thanks
Philip



Re: Interpolation problem - pg 12.4 - full correct version!

2020-08-19 Thread Pól Ua Laoínecháin
Hi,

> Might want to consider using PL/R with something like the R imputeTS package:
> https://cran.r-project.org/web/packages/imputeTS/readme/README.html

Thanks for your input - looks interesting, but see my reply to David Johnston.

Rgs,

Pól

> Joe




Re: Interpolation problem - pg 12.4 - full correct version!

2020-08-19 Thread Pól Ua Laoínecháin
Hi,

>> So, my question is: Is there a recognised technique (using SQL only,
>> not PL/pgSQL - soutions based on the latter are easy to find) whereby
>> I can do a basic Linear Interpolation?

> I don't have a recognized technique, nor care to ponder one right now, but 
> what you've described would best be done in pure SQL using WITH RECURSIVE, 
> which provides an iterative approach to SQL result building.  Which is more 
> commonly done in a procedural language.  The algorithm you describe is an 
> iterative algorithm and so I'm wondering why place the arbitrary restriction 
> on using pure SQL when it likely wouldn't provide a very readable nor 
> performant solution relative to a procedural (pl/pgsql or otherwise) one?

Well, it would be trivial to do something with PL/pgSQL (just Google
and download the code - there are a few examples around).

I'm a big fan of this site https://explainextended.com/ - this guy
can do anything with SQL - I'd like to try and learn how to do the
same.

Your idea of RECURSIVE is exactly along the lines I was thinking of
- I'll persevere and see what I can come up with using WITH RECURSIVE
- I've used it before.

Thanks for your input and rgs,


Pól...


> David J.




Re: Window functions speed

2020-08-19 Thread Michael Lewis
Where's the query? Are you able to run explain analyze so we can see
estimates vs actual counts? What version are you using?

Can you share explain analyze before and after the window function?

I mean, that limit at the top makes me think it is doing a lot less work
without the window function, vs examining all candidate rows if we need to
count them all. What are you using that count for anyway? Pagination?
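If the count really is just for pagination, two common alternatives avoid the window aggregate entirely. Sketched here against a hypothetical jobs table (the names and filter are illustrative, not taken from the original query):

```sql
-- 1) Run the total as a separate, cacheable query:
SELECT count(*) FROM jobs WHERE status = 'active';

-- 2) Or skip the total: fetch one row more than the page size; if 25 rows
--    come back for a 24-row page, there is a next page, with no full count.
SELECT * FROM jobs WHERE status = 'active' ORDER BY id LIMIT 25;
```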



Re: Interpolation problem - pg 12.4 - full correct version!

2020-08-19 Thread Joe Conway
On 8/19/20 3:08 PM, David G. Johnston wrote:
> On Wed, Aug 19, 2020 at 11:51 AM Pól Ua Laoínecháin  > wrote:
> 
> 
> I think my *MAJOR* problem is that I've developed what is,
> essentially, a totally brute force approach - and this simply won't
> work as the scenario becomes more complex - take a look at the CASE
> statement - it's horrible and would only become exponentially worse as
> the number of NULLs rises.
> 
> So, my question is: Is there a recognised technique (using SQL only,
> not PL/pgSQL - solutions based on the latter are easy to find) whereby
> I can do a basic Linear Interpolation?
> 
> I don't have a recognized technique, nor care to ponder one right now, but 
> what
> you've described would best be done in pure SQL using WITH RECURSIVE, which
> provides an iterative approach to SQL result building.  Which is more commonly
> done in a procedural language.  The algorithm you describe is an iterative
> algorithm and so I'm wondering why place the arbitrary restriction on using 
> pure
> SQL when it likely wouldn't provide a very readable nor performant solution
> relative to a procedural (pl/pgsql or otherwise) one?


Might want to consider using PL/R with something like the R imputeTS package:

https://cran.r-project.org/web/packages/imputeTS/readme/README.html

HTH,

Joe
-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Window functions speed

2020-08-19 Thread Zahir Lalani

Hello

I have spent the last day optimising a critical query which suddenly started 
behaving very inefficiently. There were issues with the query which are now 
sorted. The base query is now working in a timeframe that is far better. 
However, as soon as I add a window count function to the column list, the 
performance drops significantly.

Happy to provide info as needed, not sure what is best to provide. Below is the 
explain for the query with the window function


Limit  (cost=49.44..4580.73 rows=24 width=567)
  CTE searched_jobs
    ->  Unique  (cost=0.04..0.05 rows=2 width=4)
          ->  Sort  (cost=0.04..0.05 rows=2 width=4)
                Sort Key: id
                ->  Append  (cost=0.00..0.03 rows=2 width=4)
                      ->  Result  (cost=0.00..0.00 rows=0 width=0)
                            One-Time Filter: false
                      ->  Result  (cost=0.00..0.01 rows=1 width=4)
  CTE workflow_and_parentmedia_local_id
    ->  Function Scan on c_type_by_key_get  (cost=0.25..10.25 rows=1000 width=8)
  ->  WindowAgg  (cost=39.14..4570.43 rows=24 width=567)
        ->  Hash Left Join  (cost=39.14..4569.77 rows=24 width=438)
              Hash Cond: (j.fk_production_status = wf.workflow_local_id)
              Join Filter: ((wf.media_local_id = j.fk_media_type_main) OR (wf.media_local_id = j.fk_media_type_sub))
              ->  Nested Loop Left Join  (cost=6.64..4532.53 rows=24 width=450)
                    Filter: ((j.fk_owning_agency_org = ANY ('{26803,34179,24711,24710,32369,21404,20220,13117,4145,4144,20238,32696,32372,21277,9,21261,7909,15068,14618,34469,34425,34471,34470,11121,1293,20,21384,13281,13438}'::integer[])) OR (j.fk_agency_org = ANY ('{26803,34179,24711,24710,32369,21404,20220,13117,4145,4144,20238,32696,32372,21277,9,21261,7909,15068,14618,34469,34425,34471,34470,11121,1293,20,21384,13281,13438}'::integer[])) OR (job_people.fk_child_id = ANY ('{15264,12190,12189,12191,12192,15262,15263,15265,19317,27243,31746,31773}'::integer[])))
                    ->  Nested Loop Left Join  (cost=6.21..4438.89 rows=58 width=454)
                          ->  Nested Loop  (cost=5.92..4420.75 rows=58 width=441)
                                ->  Nested Loop Left Join  (cost=5.63..4402.61 rows=58 width=428)
                                      ->  Nested Loop Left Join  (cost=5.35..4385.16 rows=58 width=416)
                                            ->  Nested Loop Left Join  (cost=5.07..4368.19 rows=58 width=406)
                                                  Join Filter: (production_colours.local_id = prod_status.colour_id)
                                                  ->  Nested Loop Left Join  (cost=4.80..4356.82 rows=58 width=397)
                                                        Join Filter: (prod_status.parent_id = media.id)
                                                        ->  Nested Loop Left Join  (cost=4.52..4325.64 rows=58 width=380)
                                                              ->  Nested Loop Left Join  (cost=3.98..4282.28 rows=58 width=376)
                                                                    Join Filter: (colours.local_id = js.colour_id)
                                                                    ->  Nested Loop Left Join  (cost=3.71..4270.92 rows=58 width=367)
                                                                          ->  Nested Loop Left Join  (cost=3.42..4252.78 rows=58 width=354)
                                                                                Join Filter: (sub_mt.parent_id = mt.id)
                                                                                ->  Nested Loop Left Join  (cost=3.15..4224.50 rows=58 width=341)
                                                                                      Join Filter: (mt.local_id = j.fk_media_type_main)
                                                                                      ->  Nested Loop  (cost=2.87..4205.54 rows=58 width=320)
                                                                                            ->  Nested Loop  (cost=2.58..4187.67 rows=58 width=320)
                                                                                                  ->  Nested Loop  (cost=2.29..4169.81 rows=58 width=320)
                                                                                                        ->  Nested Loop  (cost=2.00..4151.66 rows=58 width=301)
                                                                                                              ->  Nested Loop  (cost=1.71..4132.36 rows=58 width=282)
                                                                                                                    ->  Nested Loop  (cost=1.42..4114.21 rows=58 width=263)
                                                                                                                          ->  Nested Loop  

Re: Interpolation problem - pg 12.4 - full correct version!

2020-08-19 Thread David G. Johnston
On Wed, Aug 19, 2020 at 11:51 AM Pól Ua Laoínecháin  wrote:

>
> I think my *MAJOR* problem is that I've developed what is,
> essentially, a totally brute force approach - and this simply won't
> work as the scenario becomes more complex - take a look at the CASE
> statement - it's horrible and would only become exponentially worse as
> the number of NULLs rises.
>
> So, my question is: Is there a recognised technique (using SQL only,
> not PL/pgSQL - solutions based on the latter are easy to find) whereby
> I can do a basic Linear Interpolation?
>
>
I don't have a recognized technique, nor care to ponder one right now, but
what you've described would best be done in pure SQL using WITH RECURSIVE,
which provides an iterative approach to SQL result building.  Which is more
commonly done in a procedural language.  The algorithm you describe is an
iterative algorithm and so I'm wondering why place the arbitrary
restriction on using pure SQL when it likely wouldn't provide a very
readable nor performant solution relative to a procedural (pl/pgsql or
otherwise) one?

David J.


Interpolation problem - pg 12.4 - full correct version!

2020-08-19 Thread Pól Ua Laoínecháin
Hi all - here is the entire correct problem - apologies again!

Hi all,

I have an interpolation problem as follows - fiddle available here:

https://dbfiddle.uk/?rdbms=postgres_12=428aa76d49b37961088d3dfef190757f


A table:

CREATE TABLE data
(
  s SERIAL PRIMARY KEY,
  t TIMESTAMP,
  lat NUMERIC
);

and data:


INSERT INTO data (t, lat)
VALUES
('2019-01-01 00:00:00', 5.07),
('2019-01-01 01:00:00', 4.60),
('2019-01-01 02:00:00', NULL),
('2019-01-01 03:00:00', NULL),
('2019-01-01 04:00:00', 4.7),
('2019-01-01 05:00:00', 4.20),
('2019-01-01 06:00:00', NULL),
('2019-01-01 07:00:00', 4.98),
('2019-01-01 08:00:00', 4.50),
('2019-01-01 09:00:00', 4.7),
('2019-01-01 10:00:00', NULL),
('2019-01-01 11:00:00', NULL),
('2019-01-01 12:00:00', NULL),
('2019-01-01 13:00:00', 6.45),
('2019-01-01 14:00:00', 3.50);


There are gaps in the data as you can see - I'm trying to fill them
using the algorithm:

 - a sequence of 1 NULL - take the average of the reading above and
the reading below

- a sequence of 2 NULLs - the top assigned value is the average of the
two records above it and the bottom assigned one is the average of the
two records below.

So far, so good - I'm able to do this (but see discussion below)

 - a sequence of 3 NULLs - the middle one is assigned a value equal to
average of the non-NULL record above and the non-null record below,
and then the remaining NULLs above and below the average of the middle
one and the non-NULL ones above and below.

This is where it gets tricky - I'm getting answers, but I don't think
they're correct. The results of the massive SQL shown below are here
(also on the fiddle):

 s    lat    final_val
 1    5.07   5.07
 2    4.60   4.60
 3    NULL   4.84
 4    NULL   4.45
 5    4.7    4.7
 6    4.20   4.20
 7    NULL   4.59
 8    4.98   4.98
 9    4.50   4.50
10    4.7    4.7
11    NULL   4.60
12    NULL   5.58
13    NULL   4.98
14    6.45   6.45
15    3.50   3.50

The value for record 12 is correct, but not those above and below it.

I think my *MAJOR* problem is that I've developed what is,
essentially, a totally brute force approach - and this simply won't
work as the scenario becomes more complex - take a look at the CASE
statement - it's horrible and would only become exponentially worse as
the number of NULLs rises.

So, my question is: Is there a recognised technique (using SQL only,
not PL/pgSQL - solutions based on the latter are easy to find) whereby
I can do a basic Linear Interpolation?

Should you require any further information, please don't hesitate to contact me.

TIA and rgs,


Pól...


=

My mega SQL:

WITH cte1 AS
(
  SELECT d1.s,
d1.t AS t1, d1.lat AS l1,
LAG(d1.lat, 2)  OVER (ORDER BY t ASC) AS lag_t1_2,
LAG(d1.lat, 1)  OVER (ORDER BY t ASC) AS lag_t1,
LEAD(d1.lat, 1) OVER (ORDER BY t ASC) AS lead_t1,
LEAD(d1.lat, 2) OVER (ORDER BY t ASC) AS lead_t1_2
  FROM data d1
),
cte2 AS
(
  SELECT
d2.t AS t2, d2.lat AS l2,
LAG(d2.lat, 1) OVER(ORDER BY t DESC) AS lag_t2,
LEAD(d2.lat, 1) OVER(ORDER BY t DESC) AS lead_t2
  FROM data d2
),
cte3 AS
(
  SELECT t1.s,
t1.t1,  t1.lag_t1_2, t1.lag_t1, t2.lag_t2, t1.l1, t2.l2,
t1.lead_t1, t2.lead_t2, t1.lead_t1_2
  FROM cte1 t1
  JOIN cte2 t2
  ON t1.t1 = t2.t2
),
cte4 AS
(
  SELECT t1.s,
  t1.l1 AS lat,
CASE

  -- The WHEN for the middle of 3 NULLs has to be at the beginning
  -- of the CASE - if at the end, it remains NULL - why?

  WHEN (t1.lag_t1 IS NULL) AND (t1.lag_t2 IS NULL) AND (t1.l1 IS NULL)
AND (t1.lead_t1 IS NULL) AND (t1.lead_t2 IS NULL)
  THEN ROUND((t1.lag_t1_2 + t1.lead_t1_2)/2, 2)

  WHEN (t1.l1 IS NOT NULL) THEN t1.l1
  WHEN (t1.l1 IS NULL) AND (t1.l2) IS NULL AND (t1.lag_t1 IS NOT NULL)
AND (t1.lag_t2 IS NOT NULL) THEN ROUND((t1.lag_t1 + t1.lag_t2)/2, 2)
  WHEN (t1.lag_t2 IS NULL) AND (t1.l1 IS NULL) AND (t1.l2 IS NULL)
AND (t1.lead_t1 IS NULL) THEN ROUND((t1.lag_t1 + t1.lag_t1_2)/2, 2)
  WHEN (t1.l1 IS NULL) AND (t1.l2 IS NULL)  AND (t1.lag_t1 IS NULL)
AND (t1.lead_t2 IS NULL) THEN ROUND((t1.lead_t1 + t1.lead_t1_2)/2, 2)
  ELSE 0
END AS final_val
  FROM cte3 t1
)
SELECT s, lat, final_val FROM cte4;
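For reference, straight linear interpolation (as opposed to the nested-averaging scheme described above) can be done without recursion, using only window functions over the data table from the post. A hedged sketch, intended as a starting point rather than a verified answer: count(lat) OVER (ORDER BY t) groups each non-NULL "anchor" row with the NULL rows that follow it, and each NULL is then placed proportionally between its anchor and the next group's anchor.

```sql
WITH g AS (
  SELECT s, t, lat,
         count(lat) OVER (ORDER BY t) AS grp  -- increments at each non-NULL row,
  FROM data                                   -- so an anchor owns its trailing NULLs
),
seg AS (
  SELECT g.*,
         first_value(lat) OVER w          AS prev_lat,  -- the group's anchor value
         row_number()     OVER w - 1      AS steps,     -- distance from the anchor
         count(*) OVER (PARTITION BY grp) AS seg_len    -- anchor + its NULLs
  FROM g
  WINDOW w AS (PARTITION BY grp ORDER BY t)
),
anchors AS (
  SELECT grp, prev_lat FROM seg WHERE steps = 0
)
SELECT seg.s, seg.lat,
       COALESCE(seg.lat,
                ROUND(seg.prev_lat
                      + (a.prev_lat - seg.prev_lat) * seg.steps / seg.seg_len,
                      2)) AS final_val
FROM seg
LEFT JOIN anchors a ON a.grp = seg.grp + 1  -- the next non-NULL reading
ORDER BY seg.s;
```

With the sample data this fills the single gap at s=7 with 4.59 (halfway between 4.20 and 4.98) and spaces the longer NULL runs evenly between their neighbours, and it does not grow with the length of the gaps the way the CASE approach does.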




Re: Interpolation problem - pg 12.4

2020-08-19 Thread Pól Ua Laoínecháin
Hi all,

Sorry - posted the wrong URL for my problem - Doh...

Here is the correct one!

https://dbfiddle.uk/?rdbms=postgres_12=428aa76d49b37961088d3dfef190757f

Again, apologies and  rgs,


Pól...

On Wed, 19 Aug 2020 at 19:16, Pól Ua Laoínecháin  wrote:
>
> Hi all,
>
> I have an interpolation problem as follows - fiddle available here:
>
> https://dbfiddle.uk/?rdbms=postgres_12=8d23925146ea11a904c454709b0026fd
>
> A table:
>
> CREATE TABLE data
> (
>   s SERIAL PRIMARY KEY,
>   t TIMESTAMP,
>   lat NUMERIC
> );
>
> and data:
>
>
> INSERT INTO data (t, lat)
> VALUES
> ('2019-01-01 00:00:00', 5.07),
> ('2019-01-01 01:00:00', 4.60),
> ('2019-01-01 02:00:00', NULL),
> ('2019-01-01 03:00:00', NULL),
> ('2019-01-01 04:00:00', 4.7),
> ('2019-01-01 05:00:00', 4.20),
> ('2019-01-01 06:00:00', NULL),
> ('2019-01-01 07:00:00', 4.98),
> ('2019-01-01 08:00:00', 4.50),
> ('2019-01-01 09:00:00', 4.7),
> ('2019-01-01 10:00:00', NULL),
> ('2019-01-01 11:00:00', NULL),
> ('2019-01-01 12:00:00', NULL),
> ('2019-01-01 13:00:00', 6.45),
> ('2019-01-01 14:00:00', 3.50);
>
>
> There are gaps in the data as you can see - I'm trying to fill them
> using the algorithm:
>
>  - a sequence of 1 NULL - take the average of the reading above and
> the reading below
>
> - a sequence of 2 NULLs - the top assigned value is the average of the
> two records above it and the bottom assigned one is the average of the
> two records below.
>
> So far, so good - I'm able to do this (but see discussion below)
>
>  - a sequence of 3 NULLs - the middle one is assigned a value equal to
> average of the non-NULL record above and the non-null record below,
> and then the remaining NULLs above and below the average of the middle
> one and the non-NULL ones above and below.
>
> This is where it gets tricky - I'm getting answers, but I don't think
> they're correct. The results of the massive SQL shown below are here
> (also on the fiddle):
>
>  s    lat    final_val
>  1    5.07   5.07
>  2    4.60   4.60
>  3    NULL   4.84
>  4    NULL   4.45
>  5    4.7    4.7
>  6    4.20   4.20
>  7    NULL   4.59
>  8    4.98   4.98
>  9    4.50   4.50
> 10    4.7    4.7
> 11    NULL   4.60
> 12    NULL   5.58
> 13    NULL   4.98
> 14    6.45   6.45
> 15    3.50   3.50
>
> The value for record 12 is correct, but not those above and below it.
>
> I think my *MAJOR* problem is that I've developed what is,
> essentially, a totally brute force approach - and this simply won't
> work as the scenario becomes more complex - take a look at the CASE
> statement - it's horrible and would only become exponentially worse as
> the number of NULLs rises.
>
> So, my question is: Is there a recognised technique (using SQL only,
> not PL/pgSQL - solutions based on the latter are easy to find) whereby
> I can do a basic Linear Interpolation?
>
> Should you require any further information, please don't hesitate to contact 
> me.
>
> TIA and rgs,
>
>
> Pól...
>
>
> =
>
> My mega SQL:
>
> WITH cte1 AS
> (
>   SELECT d1.s,
> d1.t AS t1, d1.lat AS l1,
> LAG(d1.lat, 2)  OVER (ORDER BY t ASC) AS lag_t1_2,
> LAG(d1.lat, 1)  OVER (ORDER BY t ASC) AS lag_t1,
> LEAD(d1.lat, 1) OVER (ORDER BY t ASC) AS lead_t1,
> LEAD(d1.lat, 2) OVER (ORDER BY t ASC) AS lead_t1_2
>   FROM data d1
> ),
> cte2 AS
> (
>   SELECT
> d2.t AS t2, d2.lat AS l2,
> LAG(d2.lat, 1) OVER(ORDER BY t DESC) AS lag_t2,
> LEAD(d2.lat, 1) OVER(ORDER BY t DESC) AS lead_t2
>   FROM data d2
> ),
> cte3 AS
> (
>   SELECT t1.s,
> t1.t1,  t1.lag_t1_2, t1.lag_t1, t2.lag_t2, t1.l1, t2.l2,
> t1.lead_t1, t2.lead_t2, t1.lead_t1_2
>   FROM cte1 t1
>   JOIN cte2 t2
>   ON t1.t1 = t2.t2
> ),
> cte4 AS
> (
>   SELECT t1.s,
>   t1.l1 AS lat,
> CASE
>
>   -- The WHEN for the middle of 3 NULLs has to be at the beginning
>   -- of the CASE - if at the end, it remains NULL - why?
>
>   WHEN (t1.lag_t1 IS NULL) AND (t1.lag_t2 IS NULL) AND (t1.l1 IS NULL)
> AND (t1.lead_t1 IS NULL) AND (t1.lead_t2 IS NULL)
>   THEN ROUND((t1.lag_t1_2 + t1.lead_t1_2)/2, 2)
>
>   WHEN (t1.l1 IS NOT NULL) THEN t1.l1
>   WHEN (t1.l1 IS NULL) AND (t1.l2) IS NULL AND (t1.lag_t1 IS NOT NULL)
> AND (t1.lag_t2 IS NOT NULL) THEN ROUND((t1.lag_t1 + t1.lag_t2)/2, 2)
>   WHEN (t1.lag_t2 IS NULL) AND (t1.l1 IS NULL) AND (t1.l2 IS NULL)
> AND (t1.lead_t1 IS NULL) THEN ROUND((t1.lag_t1 + t1.lag_t1_2)/2, 2)
>   WHEN (t1.l1 IS NULL) AND (t1.l2 IS NULL)  AND (t1.lag_t1 IS NULL)
> AND (t1.lead_t2 IS NULL) THEN ROUND((t1.lead_t1 + t1.lead_t1_2)/2, 2)
>   ELSE 0
> END AS final_val
>   FROM cte3 t1
> )
> SELECT s, lat, final_val FROM cte4;




Interpolation problem - pg 12.4

2020-08-19 Thread Pól Ua Laoínecháin
Hi all,

I have an interpolation problem as follows - fiddle available here:

https://dbfiddle.uk/?rdbms=postgres_12=8d23925146ea11a904c454709b0026fd

A table:

CREATE TABLE data
(
  s SERIAL PRIMARY KEY,
  t TIMESTAMP,
  lat NUMERIC
);

and data:


INSERT INTO data (t, lat)
VALUES
('2019-01-01 00:00:00', 5.07),
('2019-01-01 01:00:00', 4.60),
('2019-01-01 02:00:00', NULL),
('2019-01-01 03:00:00', NULL),
('2019-01-01 04:00:00', 4.7),
('2019-01-01 05:00:00', 4.20),
('2019-01-01 06:00:00', NULL),
('2019-01-01 07:00:00', 4.98),
('2019-01-01 08:00:00', 4.50),
('2019-01-01 09:00:00', 4.7),
('2019-01-01 10:00:00', NULL),
('2019-01-01 11:00:00', NULL),
('2019-01-01 12:00:00', NULL),
('2019-01-01 13:00:00', 6.45),
('2019-01-01 14:00:00', 3.50);


There are gaps in the data as you can see - I'm trying to fill them
using the algorithm:

 - a sequence of 1 NULL - take the average of the reading above and
the reading below

- a sequence of 2 NULLs - the top assigned value is the average of the
two records above it and the bottom assigned one is the average of the
two records below.

So far, so good - I'm able to do this (but see discussion below)

 - a sequence of 3 NULLs - the middle one is assigned a value equal to
average of the non-NULL record above and the non-null record below,
and then the remaining NULLs above and below the average of the middle
one and the non-NULL ones above and below.

This is where it gets tricky - I'm getting answers, but I don't think
they're correct. The results of the massive SQL shown below are here
(also on the fiddle):

 s    lat    final_val
 1    5.07   5.07
 2    4.60   4.60
 3    NULL   4.84
 4    NULL   4.45
 5    4.7    4.7
 6    4.20   4.20
 7    NULL   4.59
 8    4.98   4.98
 9    4.50   4.50
10    4.7    4.7
11    NULL   4.60
12    NULL   5.58
13    NULL   4.98
14    6.45   6.45
15    3.50   3.50

The value for record 12 is correct, but not those above and below it.

I think my *MAJOR* problem is that I've developed what is,
essentially, a totally brute force approach - and this simply won't
work as the scenario becomes more complex - take a look at the CASE
statement - it's horrible and would only become exponentially worse as
the number of NULLs rises.

So, my question is: Is there a recognised technique (using SQL only,
not PL/pgSQL - solutions based on the latter are easy to find) whereby
I can do a basic Linear Interpolation?

Should you require any further information, please don't hesitate to contact me.

TIA and rgs,


Pól...


=

My mega SQL:

WITH cte1 AS
(
  SELECT d1.s,
d1.t AS t1, d1.lat AS l1,
LAG(d1.lat, 2)  OVER (ORDER BY t ASC) AS lag_t1_2,
LAG(d1.lat, 1)  OVER (ORDER BY t ASC) AS lag_t1,
LEAD(d1.lat, 1) OVER (ORDER BY t ASC) AS lead_t1,
LEAD(d1.lat, 2) OVER (ORDER BY t ASC) AS lead_t1_2
  FROM data d1
),
cte2 AS
(
  SELECT
d2.t AS t2, d2.lat AS l2,
LAG(d2.lat, 1) OVER(ORDER BY t DESC) AS lag_t2,
LEAD(d2.lat, 1) OVER(ORDER BY t DESC) AS lead_t2
  FROM data d2
),
cte3 AS
(
  SELECT t1.s,
t1.t1,  t1.lag_t1_2, t1.lag_t1, t2.lag_t2, t1.l1, t2.l2,
t1.lead_t1, t2.lead_t2, t1.lead_t1_2
  FROM cte1 t1
  JOIN cte2 t2
  ON t1.t1 = t2.t2
),
cte4 AS
(
  SELECT t1.s,
  t1.l1 AS lat,
CASE

  -- The WHEN for the middle of 3 NULLs has to be at the beginning
  -- of the CASE - if at the end, it remains NULL - why?

  WHEN (t1.lag_t1 IS NULL) AND (t1.lag_t2 IS NULL) AND (t1.l1 IS NULL)
AND (t1.lead_t1 IS NULL) AND (t1.lead_t2 IS NULL)
  THEN ROUND((t1.lag_t1_2 + t1.lead_t1_2)/2, 2)

  WHEN (t1.l1 IS NOT NULL) THEN t1.l1
  WHEN (t1.l1 IS NULL) AND (t1.l2) IS NULL AND (t1.lag_t1 IS NOT NULL)
AND (t1.lag_t2 IS NOT NULL) THEN ROUND((t1.lag_t1 + t1.lag_t2)/2, 2)
  WHEN (t1.lag_t2 IS NULL) AND (t1.l1 IS NULL) AND (t1.l2 IS NULL)
AND (t1.lead_t1 IS NULL) THEN ROUND((t1.lag_t1 + t1.lag_t1_2)/2, 2)
  WHEN (t1.l1 IS NULL) AND (t1.l2 IS NULL)  AND (t1.lag_t1 IS NULL)
AND (t1.lead_t2 IS NULL) THEN ROUND((t1.lead_t1 + t1.lead_t1_2)/2, 2)
  ELSE 0
END AS final_val
  FROM cte3 t1
)
SELECT s, lat, final_val FROM cte4;




Re: Inline count on a query

2020-08-19 Thread David G. Johnston
On Wed, Aug 19, 2020 at 8:19 AM Laura Smith <
n5d9xq3ti233xiyif...@protonmail.ch> wrote:

> On Wednesday, 19 August 2020 15:09, David G. Johnston <
> david.g.johns...@gmail.com> wrote:
>
> > On Wednesday, August 19, 2020, Laura Smith <
> n5d9xq3ti233xiyif...@protonmail.ch> wrote:
> >
> > > Hi,
> > >
> > > Let's say we've got a fairly basic table :
> > >
> > > create table networks (
> > > lan_id text not null,
> > > net_id text not null,
> > > port_id text not null
> > > );
> > > create index net_uniq on networks(lan_id,port_id);
> > >
> > > The query conundrum I am facing is that I need to add metadata to the
> output of the query that indicates the count of ports a given net has on a
> lan.
> > >
> > > So, for example, given :
> > > insert into networks(lan_id,net_id,port_id) values('L1','N1','P1');
> > > insert into networks(lan_id,net_id,port_id) values('L1','N1','P2');
> > >
> > > The metadata count would be 1, 2 (because 'N1' has 'P1' and 'P2' on
> 'L1').
> > >
> > > Is there a sensible way to query this without stressing out Postgres
> too much ?  I'm guessing a CTE of some sort ?
> >
> > Suggest you provide your desired output in table format, and show “the
> query” that you mention.
> >
> > David J.
>
>
> If I knew what "the query" was, I wouldn't be posting here.  ;-p
>

You implied that there was some existing query to which you are trying to
add metadata.

The only thing I've managed to come up with so far is adding count(*) over
> (partition by digest(lan_id||net_id,'sha256')) to my query, but that
> obviously gives the total count, not the ongoing incremental count.
>

If you want order to matter you need to add an ORDER BY to the window
specification, probably with a ROWS n PRECEDING / n FOLLOWING frame
modifier as well.  Though there is nothing in your original formulation
that suggests you cared about an "ongoing incremental count", so we're
back to my insistence that you formulate your problem statement more
precisely and/or actually provide the output needed for a given set of
inputs, even if you cannot put together a working query that at least
gets you close to that output.

David J.
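For what it's worth, the "1, 2" numbering from the original example can be produced with an ordered window partitioned directly on the columns (no digest() needed). This sketch assumes port_id order is the order that matters:

```sql
SELECT lan_id, net_id, port_id,
       count(*) OVER (PARTITION BY lan_id, net_id
                      ORDER BY port_id)            AS running_ports,
       count(*) OVER (PARTITION BY lan_id, net_id) AS total_ports
FROM networks
ORDER BY lan_id, net_id, port_id;
```

For the two sample rows ('L1','N1','P1') and ('L1','N1','P2'), running_ports is 1 and 2 while total_ports is 2 for both, which matches the "incremental count" described above.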


Re: Database logins taking longer and longer, showing up as "authentication" in ps(1)

2020-08-19 Thread Tom Lane
Adam Sjøgren  writes:
> Tom writes:
>> * Adam Sjøgren (a...@koldfront.dk) wrote:
>>> Sometimes new database logins slow down, from usually taking <0.05s to
>>> taking minutes. This is for psql as a normal user using Kerberos, for
>>> psql as the postgres superuser, for the web-application logging into the
>>> database, for everything.

>> Also, obtaining stack traces from a few of the stuck processes would
>> likely be quite informative.

> I will try to look at pg_stat_activity and pg_locks the next time - it
> just happened now and I focused on getting some stacktraces -
> unfortunately I do not have the -dbg package installed, so I don't know
> how useful they are.

Frequently, stack traces taken without debug symbols are pretty useless,
but you seem to have got lucky.  It seems clear that the unexpected wait
is here:

>   (gdb) bt
>   #0  0x7f8272cf2827 in futex_abstimed_wait_cancelable (private=128, 
> abstime=0x0, expected=0, futex_word=0x7f7a2e9f1d38) at 
> ../sysdeps/unix/sysv/linux/futex-internal.h:205
>   #1  do_futex_wait (sem=sem@entry=0x7f7a2e9f1d38, abstime=0x0) at 
> sem_waitcommon.c:111
>   #2  0x7f8272cf28d4 in __new_sem_wait_slow (sem=0x7f7a2e9f1d38, 
> abstime=0x0) at sem_waitcommon.c:181
>   #3  0x7f8272cf297a in __new_sem_wait (sem=) at 
> sem_wait.c:29
>   #4  0x55f60d36c252 in PGSemaphoreLock ()
>   #5  0x55f60d3e35cc in LWLockAcquire ()
>   #6  0x55f60d3d14ee in ProcArrayAdd ()
>   #7  0x55f60d3e01a3 in InitProcessPhase2 ()
>   #8  0x55f60d51c99b in InitPostgres ()
>   #9  0x55f60d3f2baf in PostgresMain ()
>   #10 0x55f60d102bf8 in ?? ()
>   #11 0x55f60d37e492 in PostmasterMain ()
>   #12 0x55f60d103fa5 in main ()

which implies that the problem is unexpectedly high contention for the
ProcArrayLock.  It's unsurprising that incoming processes would need
that lock, since they need to add themselves to the ProcArray.  What's
not so apparent is why there is so much activity as to block them
from doing so for a significant amount of time.  We can guess that
the "worker processes" you mentioned are ganging up on that lock
because they need to compute query snapshots, but that still doesn't
seem like enough of an explanation.

Can you show us exactly what the "worker processes" are doing while
this is going on?

regards, tom lane
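As an aside, on 9.6 and later the wait Tom identifies is also visible without gdb: pg_stat_activity exposes wait events, so while logins are stuck, a query along these lines should show a pile-up of LWLock waiters on ProcArrayLock:

```sql
-- Summarise what every backend is currently waiting on.
SELECT wait_event_type, wait_event, state, count(*)
FROM pg_stat_activity
GROUP BY 1, 2, 3
ORDER BY count(*) DESC;
```

During an incident like the one described, heavy contention would show up as many rows with wait_event_type = 'LWLock' and wait_event = 'ProcArrayLock'.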




Re: Inline count on a query

2020-08-19 Thread Laura Smith
On Wednesday, 19 August 2020 15:09, David G. Johnston 
 wrote:

> On Wednesday, August 19, 2020, Laura Smith 
>  wrote:
>
> > Hi,
> >
> > Let's say we've got a fairly basic table :
> >
> > create table networks (
> > lan_id text not null,
> > net_id text not null,
> > port_id text not null
> > );
> > create index net_uniq on networks(lan_id,port_id);
> >
> > The query conundrum I am facing is that I need to add metadata to the 
> > output of the query that indicates the count of ports a given net has on a 
> > lan.
> >
> > So, for example, given :
> > insert into networks(lan_id,net_id,port_id) values('L1','N1','P1');
> > insert into networks(lan_id,net_id,port_id) values('L1','N1','P2');
> >
> > The metadata count would be 1, 2 (because 'N1' has 'P1' and 'P2' on 'L1').
> >
> > Is there a sensible way to query this without stressing out Postgres too 
> > much ?  I'm guessing a CTE of some sort ?
>
> Suggest you provide your desired output in table format, and show “the query” 
> that you mention.
>
> David J.


If I knew what "the query" was, I wouldn't be posting here.  ;-p

The only thing I've managed to come up with so far is adding count(*) over 
(partition by digest(lan_id||net_id,'sha256')) to my query, but that obviously 
gives the total count, not the ongoing incremental count.
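If an incremental count per (lan_id, net_id) is what's wanted, a window function with an ORDER BY inside the OVER clause might be enough (untested sketch; ordering by port_id is an assumption about what "ongoing" should mean):

```sql
-- Adding ORDER BY to the window turns the per-partition total
-- into a running count: 1 for the first port, 2 for the second, ...
SELECT lan_id, net_id, port_id,
       count(*) OVER (PARTITION BY lan_id, net_id
                      ORDER BY port_id) AS running_port_count
FROM networks;
```

For the two example rows this would give 1 and 2, matching the expected metadata.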




Re: Database logins taking longer and longer, showing up as "authentication" in ps(1)

2020-08-19 Thread Adam Sjøgren
Adam writes:

> I will try to look at pg_stat_activity and pg_locks the next time

I did not have to wait long, so I have captured the output of
pg_stat_activity and pg_locks when everything is fine, by just doing
"time sudo -u postgres PAGER=cat psql template1 -c 'SELECT * FROM
pg_stat_activity'; time sudo -u postgres PAGER=cat psql template1 -c
'SELECT * FROM pg_locks'":

 · https://koldfront.dk/misc/postgresql/pg_stat_activity_pg_locks-OK.txt.gz

And here is the same output while it was bad, just before I shut down the
workers (notice how the first psql took 23s compared to 0.112s when everything
was fine):

 · https://koldfront.dk/misc/postgresql/pg_stat_activity_pg_locks-BAD.txt.gz

(I have "sanitized" both files for usernames, application names,
IP-addresses and queries.)

I also attached gdb to one of the processes, got a backtrace, continued,
pressed ^C after ~5s, got another backtrace, and kept doing that for a while;
this is the result:

  postgres  79572  0.0  0.0 34686088 8412 ?   Ss   15:15   0:00  \_ 
postgres: 11/main: workqueue XXdb 127.0.0.1(53126) authentication
  postgres  79573  0.0  0.0 34686088 6476 ?   Ss   15:15   0:00  \_ 
postgres: 11/main: anonymous XXdb 127.0.0.1(53128) authentication
  $ sudo gdb -p 79572
  GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
 [...]
  Attaching to process 79572
 [...]
  Reading symbols from /usr/lib/postgresql/11/bin/postgres...(no debugging 
symbols found)...done.
 [...]
  0x7f8272cf2827 in futex_abstimed_wait_cancelable (private=128, 
abstime=0x0, expected=0, futex_word=0x7f7a2e9f6d38) at 
../sysdeps/unix/sysv/linux/futex-internal.h:205
  205   ../sysdeps/unix/sysv/linux/futex-internal.h: No such file or directory.
  (gdb) bt
  #0  0x7f8272cf2827 in futex_abstimed_wait_cancelable (private=128, 
abstime=0x0, expected=0, futex_word=0x7f7a2e9f6d38) at 
../sysdeps/unix/sysv/linux/futex-internal.h:205
  #1  do_futex_wait (sem=sem@entry=0x7f7a2e9f6d38, abstime=0x0) at 
sem_waitcommon.c:111
  #2  0x7f8272cf28d4 in __new_sem_wait_slow (sem=0x7f7a2e9f6d38, 
abstime=0x0) at sem_waitcommon.c:181
  #3  0x7f8272cf297a in __new_sem_wait (sem=) at 
sem_wait.c:29
  #4  0x55f60d36c252 in PGSemaphoreLock ()
  #5  0x55f60d3e35cc in LWLockAcquire ()
  #6  0x55f60d3d14ee in ProcArrayAdd ()
  #7  0x55f60d3e01a3 in InitProcessPhase2 ()
  #8  0x55f60d51c99b in InitPostgres ()
  #9  0x55f60d3f2baf in PostgresMain ()
  #10 0x55f60d102bf8 in ?? ()
  #11 0x55f60d37e492 in PostmasterMain ()
  #12 0x55f60d103fa5 in main ()
  (gdb) cont
  Continuing.
  ^C
  Program received signal SIGINT, Interrupt.
  0x7f8272cf2827 in futex_abstimed_wait_cancelable (private=128, 
abstime=0x0, expected=0, futex_word=0x7f7a2e9f6d38) at 
../sysdeps/unix/sysv/linux/futex-internal.h:205
  205   in ../sysdeps/unix/sysv/linux/futex-internal.h
  (gdb) bt
  #0  0x7f8272cf2827 in futex_abstimed_wait_cancelable (private=128, 
abstime=0x0, expected=0, futex_word=0x7f7a2e9f6d38) at 
../sysdeps/unix/sysv/linux/futex-internal.h:205
  #1  do_futex_wait (sem=sem@entry=0x7f7a2e9f6d38, abstime=0x0) at 
sem_waitcommon.c:111
  #2  0x7f8272cf28d4 in __new_sem_wait_slow (sem=0x7f7a2e9f6d38, 
abstime=0x0) at sem_waitcommon.c:181
  #3  0x7f8272cf297a in __new_sem_wait (sem=) at 
sem_wait.c:29
  #4  0x55f60d36c252 in PGSemaphoreLock ()
  #5  0x55f60d3e35cc in LWLockAcquire ()
  #6  0x55f60d3d14ee in ProcArrayAdd ()
  #7  0x55f60d3e01a3 in InitProcessPhase2 ()
  #8  0x55f60d51c99b in InitPostgres ()
  #9  0x55f60d3f2baf in PostgresMain ()
  #10 0x55f60d102bf8 in ?? ()
  #11 0x55f60d37e492 in PostmasterMain ()
  #12 0x55f60d103fa5 in main ()
  (gdb) cont
  Continuing.
  ^C
  Program received signal SIGINT, Interrupt.
  0x7f8272cf2827 in futex_abstimed_wait_cancelable (private=128, 
abstime=0x0, expected=0, futex_word=0x7f7a2e9f6d38) at 
../sysdeps/unix/sysv/linux/futex-internal.h:205
  205   in ../sysdeps/unix/sysv/linux/futex-internal.h
  (gdb) bt
  #0  0x7f8272cf2827 in futex_abstimed_wait_cancelable (private=128, 
abstime=0x0, expected=0, futex_word=0x7f7a2e9f6d38) at 
../sysdeps/unix/sysv/linux/futex-internal.h:205
  #1  do_futex_wait (sem=sem@entry=0x7f7a2e9f6d38, abstime=0x0) at 
sem_waitcommon.c:111
  #2  0x7f8272cf28d4 in __new_sem_wait_slow (sem=0x7f7a2e9f6d38, 
abstime=0x0) at sem_waitcommon.c:181
  #3  0x7f8272cf297a in __new_sem_wait (sem=) at 
sem_wait.c:29
  #4  0x55f60d36c252 in PGSemaphoreLock ()
  #5  0x55f60d3e35cc in LWLockAcquire ()
  #6  0x55f60d3d14ee in ProcArrayAdd ()
  #7  0x55f60d3e01a3 in InitProcessPhase2 ()
  #8  0x55f60d51c99b in InitPostgres ()
  #9  0x55f60d3f2baf in PostgresMain ()
  #10 0x55f60d102bf8 in ?? ()
  #11 0x55f60d37e492 in PostmasterMain ()
  #12 0x55f60d103fa5 in main ()
  (gdb) cont
  Continuing.
  ^C
  Program received signal SIGINT, Interrupt.
  0x7f8272cf2827 in futex_abstimed_wait_cancelable (private=128, 
abstime=0x0, 

Re: Inline count on a query

2020-08-19 Thread David G. Johnston
On Wednesday, August 19, 2020, Laura Smith <
n5d9xq3ti233xiyif...@protonmail.ch> wrote:

> Hi,
>
> Let's say we've got a fairly basic table :
>
> create table networks (
> lan_id text not null,
> net_id text not null,
> port_id text not null
> );
> create index net_uniq on networks(lan_id,port_id);
>
> The query conundrum I am facing is that I need to add metadata to the
> output of the query that indicates the count of ports a given net has on a
> lan.
>
> So, for example, given :
> insert into networks(lan_id,net_id,port_id) values('L1','N1','P1');
> insert into networks(lan_id,net_id,port_id) values('L1','N1','P2');
>
> The metadata count would be 1, 2 (because 'N1' has 'P1' and 'P2' on 'L1').
>
> Is there a sensible way to query this without stressing out Postgres too
> much ?  I'm guessing a CTE of some sort ?
>
>
Suggest you provide your desired output in table format, and show “the
query” that you mention.

David J.


Inline count on a query

2020-08-19 Thread Laura Smith
Hi,

Let's say we've got a fairly basic table :

create table networks (
lan_id text not null,
net_id text not null,
port_id text not null
);
create index net_uniq on networks(lan_id,port_id);

The query conundrum I am facing is that I need to add metadata to the output of 
the query that indicates the count of ports a given net has on a lan.

So, for example, given :
insert into networks(lan_id,net_id,port_id) values('L1','N1','P1');
insert into networks(lan_id,net_id,port_id) values('L1','N1','P2');

The metadata count would be 1, 2 (because 'N1' has 'P1' and 'P2' on 'L1').

Is there a sensible way to query this without stressing out Postgres too much ? 
 I'm guessing a CTE of some sort ?

Laura




Re: Database logins taking longer and longer, showing up as "authentication" in ps(1)

2020-08-19 Thread Adam Sjøgren
Tom writes:

> Stephen Frost  writes:
>> * Adam Sjøgren (a...@koldfront.dk) wrote:
>>> Sometimes new database logins slow down, from usually taking <0.05s to
>>> taking minutes. This is for psql as a normal user using Kerberos, for
>>> psql as the postgres superuser, for the web-application logging into the
>>> database, for everything.
>
> I think looking into pg_stat_activity and pg_locks might be interesting,
> although it's possible the delay is before the new session has made
> any entries there.
>
> Also, obtaining stack traces from a few of the stuck processes would
> likely be quite informative.

I will try to look at pg_stat_activity and pg_locks the next time - it
just happened now and I focused on getting some stacktraces -
unfortunately I do not have the -dbg package installed, so I don't know
how useful they are.

Here's the first one:

  $ ps faux | grep postgres | grep authentication
  postgres  16697  0.0  0.0 34686088 8352 ?   Ss   14:45   0:00  \_ 
postgres: 11/main: workqueue XXdb 127.0.0.1(42534) authentication
  postgres  16701  0.0  0.0 34686088 6476 ?   Ss   14:45   0:00  \_ 
postgres: 11/main: anonymous XXdb 127.0.0.1(42536) authentication
  postgres  16702  0.0  0.0 34686088 8348 ?   Ss   14:45   0:00  \_ 
postgres: 11/main: svc_user XXdb 127.0.0.1(42540) authentication
  postgres  16704  0.0  0.0 34686040 5376 ?   Ss   14:45   0:00  \_ 
postgres: 11/main: postgres postgres [local] authentication
  postgres  16705  0.0  0.0 34686088 8352 ?   Ss   14:45   0:00  \_ 
postgres: 11/main: svc_user XXdb 127.0.0.1(42542) authentication
  postgres  16708  0.0  0.0 34686088 6476 ?   Ss   14:45   0:00  \_ 
postgres: 11/main: svc_user XXdb 127.0.0.1(42544) authentication
  $ sudo gdb -p  16708
  GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
[...]
  Attaching to process 16708
  Reading symbols from /usr/lib/postgresql/11/bin/postgres...(no debugging 
symbols found)...done.
[...]
  0x7f8270a039f3 in delete_module () at 
../sysdeps/unix/syscall-template.S:86
  86   ../sysdeps/unix/syscall-template.S: No such file or directory.
  (gdb) bt
  #0  0x7f8270a039f3 in delete_module () at 
../sysdeps/unix/syscall-template.S:86
  #1  0x55f60d3d0101 in WaitEventSetWait ()
  #2  0x55f60d2dccff in secure_read ()
  #3  0x55f60d2e9138 in ?? ()
  #4  0x55f60d2e9e75 in pq_getbyte ()
  #5  0x55f60d3f3100 in PostgresMain ()
  #6  0x55f60d102bf8 in ?? ()
  #7  0x55f60d37e492 in PostmasterMain ()
  #8  0x55f60d103fa5 in main ()
  (gdb) 

I think I was too slow on this one, and this is what it looks like if
the process is just waiting for a connection (as you can guess, I'm not
used to looking at backtraces).

Here is another one, which hopefully is caught at a better moment, and
more useful:

  postgres  19495  0.0  0.0 34686088 8412 ?   Ss   14:48   0:00  \_ 
postgres: 11/main: workqueue XXdb 127.0.0.1(43390) authentication
  postgres  19497  0.0  0.0 34686088 6476 ?   Ss   14:48   0:00  \_ 
postgres: 11/main: XXX XXdb 127.0.0.1(43392) authentication
  postgres  19502  0.0  0.0 34686088 8412 ?   Ss   14:48   0:00  \_ 
postgres: 11/main: XXX XXdb 127.0.0.1(43394) authentication
  $ sudo gdb -p  19495
  GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
[...]
  Attaching to process 19495
[...]
  0x7f8272cf2827 in futex_abstimed_wait_cancelable (private=128, 
abstime=0x0, expected=0, futex_word=0x7f7a2e9f1d38) at 
../sysdeps/unix/sysv/linux/futex-internal.h:205
  205   ../sysdeps/unix/sysv/linux/futex-internal.h: No such file or directory.
  (gdb) bt
  #0  0x7f8272cf2827 in futex_abstimed_wait_cancelable (private=128, 
abstime=0x0, expected=0, futex_word=0x7f7a2e9f1d38) at 
../sysdeps/unix/sysv/linux/futex-internal.h:205
  #1  do_futex_wait (sem=sem@entry=0x7f7a2e9f1d38, abstime=0x0) at 
sem_waitcommon.c:111
  #2  0x7f8272cf28d4 in __new_sem_wait_slow (sem=0x7f7a2e9f1d38, 
abstime=0x0) at sem_waitcommon.c:181
  #3  0x7f8272cf297a in __new_sem_wait (sem=) at 
sem_wait.c:29
  #4  0x55f60d36c252 in PGSemaphoreLock ()
  #5  0x55f60d3e35cc in LWLockAcquire ()
  #6  0x55f60d3d14ee in ProcArrayAdd ()
  #7  0x55f60d3e01a3 in InitProcessPhase2 ()
  #8  0x55f60d51c99b in InitPostgres ()
  #9  0x55f60d3f2baf in PostgresMain ()
  #10 0x55f60d102bf8 in ?? ()
  #11 0x55f60d37e492 in PostmasterMain ()
  #12 0x55f60d103fa5 in main ()
  (gdb) cont
  Continuing.
  ^C
  Program received signal SIGINT, Interrupt.
  0x7f8272cf2827 in futex_abstimed_wait_cancelable (private=128, 
abstime=0x0, expected=0, futex_word=0x7f7a2e9f1d38) at 
../sysdeps/unix/sysv/linux/futex-internal.h:205
  205   in ../sysdeps/unix/sysv/linux/futex-internal.h
  (gdb) bt
  #0  0x7f8272cf2827 in futex_abstimed_wait_cancelable (private=128, 
abstime=0x0, expected=0, futex_word=0x7f7a2e9f1d38) at 
../sysdeps/unix/sysv/linux/futex-internal.h:205
  #1  do_futex_wait (sem=sem@entry=0x7f7a2e9f1d38, abstime=0x0) at 

Re: BUG? Slave don't reconnect to the master

2020-08-19 Thread Jehan-Guillaume de Rorthais
On Tue, 18 Aug 2020 13:48:41 +0300
Олег Самойлов  wrote:

> Hi all.
> 
> I found some strange behaviour of postgres, which I recognise as a bug. First
> of all, let me explain the situation.
> 
> I created a "test bed" (not sure what to call it), to test high
> availability clusters based on Pacemaker and PostgreSQL. The test bed consists
> of 12 virtual machines (on VirtualBox) running on a MacBook Pro, forming 4
> HA clusters with different structures. All 4 HA clusters are constantly tested
> in a loop: failures of different natures are simulated, fail-over is awaited,
> the cluster is fixed, and so on. For simplicity I'll explain only one HA
> cluster.
> It is 3 virtual machines, with the master on one, and sync and async
> slaves on the others. The PostgreSQL service is provided by float IPs pointing
> to the working master and slaves. Slaves are connected to the master float IP
> too. When the pacemaker detects a failure, for instance, on the master, it
> promotes a master on the other node with the lowest-latency WAL and switches
> the float IPs, so the third node keeps being a sync slave. My company decided
> to release this project as open source; I am now finishing the formalities.

As the maintainer of PAF[1], I'm looking forward to discovering it :)
Do not hesitate to ping me offlist as well with regard to Pacemaker and
resource agents.

> Almost everything works fine, but sometimes, rather rarely, I have seen that
> a slave doesn't reconnect to the new master after a failure. The first case
> is PostgreSQL-STOP, when I `kill` postgres on the master with a STOP signal
> to simulate a freeze. The slave doesn't reconnect to the new master, with
> errors in the log:
> 
> 18:02:56.236 [3154] FATAL:  terminating walreceiver due to timeout
> 18:02:56.237 [1421] LOG:  record with incorrect prev-link 0/1600DDE8 at
> 0/1A00DE10

Do you have more logs from both side of the replication?
How do you build your standbys?

> What is strange is that the error about incorrect WAL is raised after the
> termination of the connection.

This is because the first message comes from the walreceiver itself (3154),
which receives and writes WAL, and the other one comes from the startup process
(1421) which waits for and replays WAL.

> Well, this can be worked around by turning off the wal receiver timeout. Now
> PostgreSQL-STOP works fine, but the problem still exists with another test.
> ForkBomb simulates an out-of-memory situation. In this case a slave sometimes
> doesn't reconnect to the new master either, with errors in the log:
> 
> 10:09:43.99 [1417] FATAL:  could not receive data from WAL stream: server
> closed the connection unexpectedly This probably means the server terminated
> abnormally before or while processing the request.
> 10:09:43.992 [1413] LOG:  invalid record length at 0/D8014278: wanted 24, got
> 0

I suspect the problem is somewhere else. The first message here is probably
related to your primary being fenced, the second one is normal. After your
IP moved to the recently promoted primary, your standbys are supposed to
reconnect with no problem.

> The last error message (last row in the log) was observed to differ between
> runs, btw.
> 
> What I expect as the right behaviour: the PostgreSQL slave should reconnect
> to the master IP (float IP) after wal_retrieve_retry_interval.

In my own experience with PAF, it just works like what you describe.
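When a standby seems stuck like this, it can help to check on the standby whether
the walreceiver came back at all after the fail-over (available since 9.6):

```sql
-- Run on the standby: a non-empty result means a walreceiver process is
-- running; conninfo shows which host (it should be the float IP) it is
-- connected to, and latest_end_* shows how far replay feedback has gone.
SELECT status, conninfo, latest_end_lsn, latest_end_time
FROM pg_stat_wal_receiver;
```

If this returns no row while the standby logs nothing, the startup process is
not retrying, which would point the investigation in a different direction.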


Regards,

[1] https://clusterlabs.github.io/PAF/