Re: Could postgres12 support millions of sequences? (like 10 million)

2020-03-20 Thread John W Higgins
On Fri, Mar 20, 2020 at 8:13 PM pabloa98  wrote:

>
> I hope I described the problem completely.
>
>
1) What is a group - does it exist prior to records being inserted? How
many groups are you working with? How long do they live for?
2) #1 but for element
3) How many records per second per group per element are you anticipating?
Are there spikes?
4) How long does a group/element live for? You can put 10 records per second
over 3 years and still be under 100 million records (the size of your
seq_number)
5) How quickly do you need the seq_number - if you batch created them once
a minute would that work? Does it need to be quicker than that? Slower than
that? Or is it an immediate need?

That's a starting point.

John W Higgins


Re: Could postgres12 support millions of sequences? (like 10 million)

2020-03-20 Thread David G. Johnston
On Friday, March 20, 2020, pabloa98  wrote:

>
> If there is a solution other than millions of sequences that does not
>> block, generates few gaps (and only small ones), and never generates
>> collisions, then I will happily use it.
>
>
You are going to have to lose the not blocking requirement, or at least
define what blocking and non-blocking interactions look like.

What does it mean that bid# 123 exists?  Can you, by way of example, just
create a table of all possible bid numbers and update a null column with a
timestamp saying “this bid came into existence at this moment”?
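
A minimal sketch of that idea (the table and column names here are hypothetical,
not taken from the thread; SKIP LOCKED needs PostgreSQL 9.5 or later):

CREATE TABLE bid_number (
    group_id   int NOT NULL,
    element    int NOT NULL,
    code       int NOT NULL,       -- pre-generated 1..N per (group_id, element)
    created_at timestamptz,        -- NULL until the bid "comes into existence"
    PRIMARY KEY (group_id, element, code)
);

-- Claim the lowest unused code for a pair; SKIP LOCKED lets concurrent
-- sessions take different codes without blocking each other.
UPDATE bid_number b
   SET created_at = now()
 WHERE b.group_id = 1 AND b.element = 3
   AND b.code = (SELECT code
                   FROM bid_number
                  WHERE group_id = 1 AND element = 3
                    AND created_at IS NULL
                  ORDER BY code
                  LIMIT 1
                  FOR UPDATE SKIP LOCKED)
RETURNING b.code;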

David J.


Re: Could postgres12 support millions of sequences? (like 10 million)

2020-03-20 Thread pabloa98
> Nothing I saw that said int could not become bigint.
>
>
> My bad. The code cannot be a bigint. Or it could be a bigint between 1 to
 :)
I thought it was not important. The code could be a number from 1 to
 (so an int will be OK), assigned in order-ish. This is because of
business requirements. The code should not change in case a row is deleted.
That rules out using window functions, at least for this problem. Some gaps
are acceptable as long as they are not too big (similar to a serial column
when a transaction is rolled back). We are more concerned about code
generation collisions (for example, 2 transactions calculating the same code
value) than about gaps. For each (group, element) pair the code, once
assigned, should remain constant. This is because the group, the element,
and the code together will become a (business-related) id. So:

group, element, code = bid
1, 3, 1 = bid 131
2, 1, 1 = bid 211
etc

This calculated bid follows the convention described here and it is used as
a reference in other tables. Therefore it should not change.
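
For illustration only, assuming the bid is the plain concatenation of the three
numbers as in the examples above (the column names are assumptions, not confirmed):

SELECT group_id::text || element::text || code::text AS bid
  FROM event
 WHERE group_id = 1 AND element = 3 AND code = 1;   -- yields '131'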
All this weirdness is because of business requirements. I am fine with a
classic serial column, but that will not map to our business rules.

Regarding the implementation: our concern is to choose something that is not
slow (it does not need to be super fast, because there will be more
operations on other tables in the same transaction), and I thought that
creating/dropping sequences could be a solution. But I was not sure,
in particular about how it would behave with millions of sequences.

If there is a solution other than millions of sequences that does not block,
generates few gaps (and only small ones), and never generates collisions, then
I will happily use it.

I hope I described the problem completely.

Pablo


Re: Could postgres12 support millions of sequences? (like 10 million)

2020-03-20 Thread pabloa98
On Fri, Mar 20, 2020 at 3:59 PM Peter J. Holzer  wrote:

> On 2020-03-19 16:48:19 -0700, David G. Johnston wrote:
> > First, it sounds like you care about there being no gaps in the records
> you end
> > up saving.  If that is the case then sequences will not work for you.
>
> I think (but I would love to be proven wrong), that *nothing* will work
> reliably, if
>
> 1) you need gapless numbers which are strictly allocated in sequence
>
Small gaps are acceptable. We cannot afford a gap of 100, though.

> 2) you have transactions
> 3) you don't want to block
>
> Rationale:
>
> Regardless of how you get the next number, the following scenario is
> always possible:
>
> Session1: get next number
> Session2: get next number
> Session1: rollback
> Session2: commit
>
> At this point you have a gap.
>
> If you can afford to block, I think a simple approach like
>
> create table s(id int, counter int);
> ...
> begin;
> ...
> update s set counter = counter + 1 where id = $whatever returning
> counter;
> -- use counter
> commit;
>
> should work. But that effectively serializes your transactions and may
> cause some to be aborted to prevent deadlocks.
>
> hp
>
> --
>_  | Peter J. Holzer| Story must make more sense than reality.
> |_|_) ||
> | |   | h...@hjp.at |-- Charles Stross, "Creative writing
> __/   | http://www.hjp.at/ |   challenge!"
>


Re: Could postgres12 support millions of sequences? (like 10 million)

2020-03-20 Thread Adrian Klaver

On 3/20/20 2:13 PM, pabloa98 wrote:



On Fri, Mar 20, 2020 at 10:26 AM Adrian Klaver <adrian.kla...@aklaver.com> wrote:


On 3/20/20 9:59 AM, Adrian Klaver wrote:
 > On 3/19/20 10:31 PM, pabloa98 wrote:
 >> I see.
 >>
 >> Any suggestion? It should behave like a sequence in the sense that
 >> concurrent transitions will get different numbers from this
 >> alternative sequence like solution.
 >>
 >> In our case, we will need to do a call nextval('some_seq') (or
 >> similar) from different processes no more than twice every minute.
 >>
 >>
 >> It would be nice to have a sequence data type. Correct me if I am
 >> wrong, but It seems to me that a sequence data type would cost the
 >> same or less than the current sequence implementation.
 >>
 >> The schema would be more clear too. We could have a table like:
 >>
 >> CREATE TABLE pair(
 >> group INT NOT NULL,
 >> element INT NOT NULL,
 >> seq SEQUENCE INCREMENT 1
 >>                START 1
 >>                CACHE 1
 >>                MINVALUE 1
 >>                MAXVALUE 
 >>  NOT NULL,
 >> CONSTRAINT PRIMARY KEY (group, element)
 >> );
 >>
 >> And then:
 >>
 >> INSERT INTO event(group, element, code)
 >> VALUES (
 >>   1,
 >>   1,
 >>   ( SELECT seq.nextval('code_1_1_seq') FROM pair p WHERE
 >> p.group=1 and p.code=1 )
 >> );
 >>
 >> Or perhaps storing all the sequences in the same table as rows will
 >> have the same behavior.
 >
 > If code is just something to show the sequence of insertion for
group,
 > element combinations then maybe something like below:
 >
 > CREATE TABLE event(
 > group_id INT NOT NULL, --changed as group is reserved word
 > element INT NOT NULL,
 > insert_ts timestamptz NOT NULL DEFAULT clock_timestamp(),
 > PRIMARY KEY(group_id, element, insert_ts)
 > );
 >
 >
 > insert into event(group_id, element) VALUES
 > (1, 1),
 > (1, 1),
 > (1, 1),
 > (2, 1),
 > (1, 1),
 > (1, 3),
 > (1, 1),
 > (1, 3),
 > (2, 1),
 > (2, 1);
 >
 >
 > select * from event ;
 >   group_id | element |   insert_ts
 > --+-+
 >      1 |   1 | 03/20/2020 09:51:12.675926 PDT
 >      1 |   1 | 03/20/2020 09:51:12.675985 PDT
 >      1 |   1 | 03/20/2020 09:51:12.675991 PDT
 >      2 |   1 | 03/20/2020 09:51:12.675996 PDT
 >      1 |   1 | 03/20/2020 09:51:12.676 PDT
 >      1 |   3 | 03/20/2020 09:51:12.676004 PDT
 >      1 |   1 | 03/20/2020 09:51:12.676008 PDT
 >      1 |   3 | 03/20/2020 09:51:12.676012 PDT
 >      2 |   1 | 03/20/2020 09:51:12.676016 PDT
 >      2 |   1 | 03/20/2020 09:51:12.67602 PDT
 > (10 rows)
 >
 >
 > select group_id, element, row_number() OVER (partition by (group_id,
 > element) order by (group_id, element)) AS code from event;
 >   group_id | element | code
 > --+-+--
 >      1 |   1 |    1
 >      1 |   1 |    2
 >      1 |   1 |    3
 >      1 |   1 |    4
 >      1 |   1 |    5
 >      1 |   3 |    1
 >      1 |   3 |    2
 >      2 |   1 |    1
 >      2 |   1 |    2
 >      2 |   1 |    3
 > (10 rows)

Oops the above actually returned:

select group_id, element, row_number() OVER (partition by (group_id,
element) order by (group_id, element)) AS code, insert_ts from event;
   group_id | element | code |           insert_ts
--+-+--+
          1 |       1 |    1 | 03/20/2020 09:51:12.675926 PDT
          1 |       1 |    2 | 03/20/2020 09:51:12.675985 PDT
          1 |       1 |    3 | 03/20/2020 09:51:12.675991 PDT
          1 |       1 |    4 | 03/20/2020 09:51:12.676008 PDT
          1 |       1 |    5 | 03/20/2020 09:51:12.676 PDT
          1 |       3 |    1 | 03/20/2020 09:51:12.676004 PDT
          1 |       3 |    2 | 03/20/2020 09:51:12.676012 PDT
          2 |       1 |    1 | 03/20/2020 09:51:12.67602 PDT
          2 |       1 |    2 | 03/20/2020 09:51:12.676016 PDT
          2 |       1 |    3 | 03/20/2020 09:51:12.675996 PDT
(10 rows)


Needs to be:

select group_id, element, row_number() OVER (partition by (group_id,
element) order by (group_id, element, insert_ts)) AS code, insert_ts
from event;
   group_id | element | code |           insert_ts
--+-+--+
          1 |       1 |    1 | 03/20/2020 09:51:12.675926 PDT
          1 |       1 |    2 | 03/20/2020 

Re: Could postgres12 support millions of sequences? (like 10 million)

2020-03-20 Thread Adrian Klaver

On 3/20/20 4:29 PM, Peter J. Holzer wrote:

On 2020-03-20 17:11:42 -0600, Rob Sargent wrote:

On Mar 20, 2020, at 4:59 PM, Peter J. Holzer  wrote:

On 2020-03-19 16:48:19 -0700, David G. Johnston wrote:

First, it sounds like you care about there being no gaps in the records you end
up saving.  If that is the case then sequences will not work for you.


I think (but I would love to be proven wrong), that *nothing* will work
reliably, if

1) you need gapless numbers which are strictly allocated in sequence
2) you have transactions
3) you don't want to block

Rationale:

Regardless of how you get the next number, the following scenario is
always possible:

[...]

At this point you have a gap.

If you can afford to block, I think a simple approach like

[...]

should work. But that effectively serializes your transactions and may
cause some to be aborted to prevent deadlocks.


OP  has said small gaps are ok.


Yes. This wasn't a response to the OP's requirements, but to David's
(rather knee-jerk, IMHO) "don't use sequences" response. Very often the
requirements which would preclude sequences also preclude any other
solution.


I don't see  a knee-jerk reaction in this:

https://www.postgresql.org/message-id/CAKFQuwZ%3D%3Dri5_m2geFA-GPOdfnVggmJRu3zEi%2B1EwJdJA%3D9AeQ%40mail.gmail.com

The response was: if you cared about gaps (not something the OP had
specified at that point) then a sequence would not work. If not, then
they were something that would need testing to determine suitability.




(In the case of the OP's problem, I'd agree that sequences are probably
a bad idea for the reasons he anticipates)


To me that says the requirement


Which requirement? The OP's or the one I posed here?


is capricious but we haven’t heard the rationale for the requirement
yet (or I missed it)


The requirement is that (group, element) pairs have a sequence 
number/code so:


1,1,1
1,1,2
1,1,3
2,2,1
2,2,2



The OP gave a rationale: He has to fit the counter into an 8-digit
field, and a global counter would overflow that. So he needs per-element
counters.


I must have missed that post. There was this(and alternates):

CREATE TABLE counter(
group INT NOT NULL,
element INT NOT NULL,
seq_number INT NOT NULL default 0,
CONSTRAINT PRIMARY KEY (group, element)
);

Nothing I saw that said int could not become bigint.




 hp




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Could postgres12 support millions of sequences? (like 10 million)

2020-03-20 Thread Rob Sargent



> On Mar 20, 2020, at 5:29 PM, Peter J. Holzer  wrote:
> 
> 
>> 
>> 
>> OP  has said small gaps are ok.
> 
> Yes. This wasn't a response to the OP's requirements, but to David's
> (rather knee-jerk, IMHO) "don't use sequences" response. Very often the
> requirements which would preclude sequences also preclude any other
> solution.
> 
> (In the case of the OP's problem, I'd agree that sequences are probably
> a bad idea for the reasons he anticipates)
> 
>> To me that says the requirement
> 
> Which requirement? The OP's or the one I posed here?
> 
>> is capricious but we haven’t heard the rationale for the requirement
>> yet (or I missed it)
> 
> The OP gave a rationale: He has to fit the counter into an 8-digit
> field, and a global counter would overflow that. So he needs per-element
> counters.
> 
>hp

Isn’t that the implementation of a req along the lines of “we want to number 
the occurrences of these pairs because ...”?
> 
> -- 
>   _  | Peter J. Holzer| Story must make more sense than reality.
> |_|_) ||
> | |   | h...@hjp.at |-- Charles Stross, "Creative writing
> __/   | http://www.hjp.at/ |   challenge!"




Re: Could postgres12 support millions of sequences? (like 10 million)

2020-03-20 Thread Peter J. Holzer
On 2020-03-20 17:11:42 -0600, Rob Sargent wrote:
> On Mar 20, 2020, at 4:59 PM, Peter J. Holzer  wrote:
> > On 2020-03-19 16:48:19 -0700, David G. Johnston wrote:
> >> First, it sounds like you care about there being no gaps in the records 
> >> you end
> >> up saving.  If that is the case then sequences will not work for you.
> > 
> > I think (but I would love to be proven wrong), that *nothing* will work
> > reliably, if
> > 
> > 1) you need gapless numbers which are strictly allocated in sequence
> > 2) you have transactions
> > 3) you don't want to block
> > 
> > Rationale:
> > 
> > Regardless of how you get the next number, the following scenario is
> > always possible:
[...]
> > At this point you have a gap.
> > 
> > If you can afford to block, I think a simple approach like
[...]
> > should work. But that effectively serializes your transactions and may
> > cause some to be aborted to prevent deadlocks.
> 
> OP  has said small gaps are ok.

Yes. This wasn't a response to the OP's requirements, but to David's
(rather knee-jerk, IMHO) "don't use sequences" response. Very often the
requirements which would preclude sequences also preclude any other
solution.

(In the case of the OP's problem, I'd agree that sequences are probably
a bad idea for the reasons he anticipates)

> To me that says the requirement

Which requirement? The OP's or the one I posed here?

> is capricious but we haven’t heard the rationale for the requirement
> yet (or I missed it)

The OP gave a rationale: He has to fit the counter into an 8-digit
field, and a global counter would overflow that. So he needs per-element
counters.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Could postgres12 support millions of sequences? (like 10 million)

2020-03-20 Thread Rob Sargent


> On Mar 20, 2020, at 4:59 PM, Peter J. Holzer  wrote:
> 
> On 2020-03-19 16:48:19 -0700, David G. Johnston wrote:
>> First, it sounds like you care about there being no gaps in the records you 
>> end
>> up saving.  If that is the case then sequences will not work for you.
> 
> I think (but I would love to be proven wrong), that *nothing* will work
> reliably, if
> 
> 1) you need gapless numbers which are strictly allocated in sequence
> 2) you have transactions
> 3) you don't want to block
> 
> Rationale:
> 
> Regardless of how you get the next number, the following scenario is
> always possible:
> 
> Session1: get next number
> Session2: get next number
> Session1: rollback
> Session2: commit
> 
> At this point you have a gap.
> 
> If you can afford to block, I think a simple approach like
> 
>create table s(id int, counter int);
>...
>begin;
>...
>update s set counter = counter + 1 where id = $whatever returning counter;
>-- use counter
>commit;
> 
> should work. But that effectively serializes your transactions and may
> cause some to be aborted to prevent deadlocks.
> 
>hp

OP  has said small gaps are ok. To me that says the requirement is capricious 
but we haven’t heard the rationale for the requirement yet (or I missed it)

Aside: apologies for the empty message earlier





Re: Could postgres12 support millions of sequences? (like 10 million)

2020-03-20 Thread Peter J. Holzer
On 2020-03-19 16:48:19 -0700, David G. Johnston wrote:
> First, it sounds like you care about there being no gaps in the records you 
> end
> up saving.  If that is the case then sequences will not work for you.

I think (but I would love to be proven wrong), that *nothing* will work
reliably, if

1) you need gapless numbers which are strictly allocated in sequence
2) you have transactions
3) you don't want to block

Rationale:

Regardless of how you get the next number, the following scenario is
always possible:

Session1: get next number
Session2: get next number
Session1: rollback
Session2: commit

At this point you have a gap.

If you can afford to block, I think a simple approach like

create table s(id int, counter int);
...
begin;
...
update s set counter = counter + 1 where id = $whatever returning counter;
-- use counter
commit;

should work. But that effectively serializes your transactions and may
cause some to be aborted to prevent deadlocks.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Could postgres12 support millions of sequences? (like 10 million)

2020-03-20 Thread pabloa98
On Fri, Mar 20, 2020 at 10:26 AM Adrian Klaver 
wrote:

> On 3/20/20 9:59 AM, Adrian Klaver wrote:
> > On 3/19/20 10:31 PM, pabloa98 wrote:
> >> I see.
> >>
> >> Any suggestion? It should behave like a sequence in the sense that
> >> concurrent transitions will get different numbers from this
> >> alternative sequence like solution.
> >>
> >> In our case, we will need to do a call nextval('some_seq') (or
> >> similar) from different processes no more than twice every minute.
> >>
> >>
> >> It would be nice to have a sequence data type. Correct me if I am
> >> wrong, but It seems to me that a sequence data type would cost the
> >> same or less than the current sequence implementation.
> >>
> >> The schema would be more clear too. We could have a table like:
> >>
> >> CREATE TABLE pair(
> >> group INT NOT NULL,
> >> element INT NOT NULL,
> >> seq SEQUENCE INCREMENT 1
> >>START 1
> >>CACHE 1
> >>MINVALUE 1
> >>MAXVALUE 
> >>  NOT NULL,
> >> CONSTRAINT PRIMARY KEY (group, element)
> >> );
> >>
> >> And then:
> >>
> >> INSERT INTO event(group, element, code)
> >> VALUES (
> >>   1,
> >>   1,
> >>   ( SELECT seq.nextval('code_1_1_seq') FROM pair p WHERE
> >> p.group=1 and p.code=1 )
> >> );
> >>
> >> Or perhaps storing all the sequences in the same table as rows will
> >> have the same behavior.
> >
> > If code is just something to show the sequence of insertion for group,
> > element combinations then maybe something like below:
> >
> > CREATE TABLE event(
> > group_id INT NOT NULL, --changed as group is reserved word
> > element INT NOT NULL,
> > insert_ts timestamptz NOT NULL DEFAULT clock_timestamp(),
> > PRIMARY KEY(group_id, element, insert_ts)
> > );
> >
> >
> > insert into event(group_id, element) VALUES
> > (1, 1),
> > (1, 1),
> > (1, 1),
> > (2, 1),
> > (1, 1),
> > (1, 3),
> > (1, 1),
> > (1, 3),
> > (2, 1),
> > (2, 1);
> >
> >
> > select * from event ;
> >   group_id | element |   insert_ts
> > --+-+
> >  1 |   1 | 03/20/2020 09:51:12.675926 PDT
> >  1 |   1 | 03/20/2020 09:51:12.675985 PDT
> >  1 |   1 | 03/20/2020 09:51:12.675991 PDT
> >  2 |   1 | 03/20/2020 09:51:12.675996 PDT
> >  1 |   1 | 03/20/2020 09:51:12.676 PDT
> >  1 |   3 | 03/20/2020 09:51:12.676004 PDT
> >  1 |   1 | 03/20/2020 09:51:12.676008 PDT
> >  1 |   3 | 03/20/2020 09:51:12.676012 PDT
> >  2 |   1 | 03/20/2020 09:51:12.676016 PDT
> >  2 |   1 | 03/20/2020 09:51:12.67602 PDT
> > (10 rows)
> >
> >
> > select group_id, element, row_number() OVER (partition by (group_id,
> > element) order by (group_id, element)) AS code from event;
> >   group_id | element | code
> > --+-+--
> >  1 |   1 |1
> >  1 |   1 |2
> >  1 |   1 |3
> >  1 |   1 |4
> >  1 |   1 |5
> >  1 |   3 |1
> >  1 |   3 |2
> >  2 |   1 |1
> >  2 |   1 |2
> >  2 |   1 |3
> > (10 rows)
>
> Oops the above actually returned:
>
> select group_id, element, row_number() OVER (partition by (group_id,
> element) order by (group_id, element)) AS code, insert_ts from event;
>   group_id | element | code |   insert_ts
> --+-+--+
>  1 |   1 |1 | 03/20/2020 09:51:12.675926 PDT
>  1 |   1 |2 | 03/20/2020 09:51:12.675985 PDT
>  1 |   1 |3 | 03/20/2020 09:51:12.675991 PDT
>  1 |   1 |4 | 03/20/2020 09:51:12.676008 PDT
>  1 |   1 |5 | 03/20/2020 09:51:12.676 PDT
>  1 |   3 |1 | 03/20/2020 09:51:12.676004 PDT
>  1 |   3 |2 | 03/20/2020 09:51:12.676012 PDT
>  2 |   1 |1 | 03/20/2020 09:51:12.67602 PDT
>  2 |   1 |2 | 03/20/2020 09:51:12.676016 PDT
>  2 |   1 |3 | 03/20/2020 09:51:12.675996 PDT
> (10 rows)
>
>
> Needs to be:
>
> select group_id, element, row_number() OVER (partition by (group_id,
> element) order by (group_id, element, insert_ts)) AS code, insert_ts
> from event;
>   group_id | element | code |   insert_ts
> --+-+--+
>  1 |   1 |1 | 03/20/2020 09:51:12.675926 PDT
>  1 |   1 |2 | 03/20/2020 09:51:12.675985 PDT
>  1 |   1 |3 | 03/20/2020 09:51:12.675991 PDT
>  1 |   1 |4 | 03/20/2020 09:51:12.676 PDT
>  1 |   1 |5 | 03/20/2020 09:51:12.676008 PDT
>  1 |   3 |1 | 03/20/2020 09:51:12.676004 PDT
>  1 |   3 |2 | 03/20/2020 09:51:12.676012 PDT
>  2 |   1 |1 | 03/20/2020 09:51:12.675996 PDT
>  2 |   1 |2 | 

Re: Duplicate key violation on upsert

2020-03-20 Thread Matt Magoffin

> On 21/03/2020, at 8:10 AM, Adrian Klaver  wrote:
> 
>> The _hyper_1_1931_chunk_da_datum_x_acc_idx index has the same definition as 
>> the da_datum_x_acc_idx above (it is defined on a child table). That is, they 
>> are both essentially:
>> UNIQUE, btree (node_id, source_id, ts DESC, jdata_a) WHERE jdata_a IS NOT 
>> NULL
>> The da_datum_pkey index is what the ON CONFLICT clause refers to, so
>> (node_id, ts, source_id) is UNIQUE as well.
> 
> Hmm, wonder if you are getting bit by this?:
> 
> https://www.postgresql.org/docs/12/sql-insert.html#SQL-ON-CONFLICT 
> 
> 
> "INSERT with an ON CONFLICT DO UPDATE clause is a “deterministic” statement. 
> This means that the command will not be allowed to affect any single existing 
> row more than once; a cardinality violation error will be raised when this 
> situation arises. Rows proposed for insertion should not duplicate each other 
> in terms of attributes constrained by an arbiter index or constraint.”


I’m not sure I’m wrapping my head around this. The INSERT affects 1 row as the 
unique values (node_id, ts, source_id) are specified in the statement. Is it 
possible that da_datum_x_acc_idx is used as the arbiter index in this 
situation, rather than da_datum_pkey (that I intended), and you’re saying that 
the jdata_a column is getting updated twice, first in the INSERT and second in 
the DO UPDATE, triggering the duplicate key violation?

— m@



Re: PG12 autovac issues

2020-03-20 Thread Michael Lewis
>
> We haven't isolated *which* table it is blocked on (assuming it is),
> but all autovac's cease running until we manually intervene.
>
> When we get into this state again, is there some other information
> (other than what is in pg_stat_statement or pg_stat_activity) that
> would be useful for folks here to help understand what is going on?


https://www.postgresql.org/docs/current/progress-reporting.html#VACUUM-PROGRESS-REPORTING
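
As a concrete illustration (not part of the original reply): while a vacuum worker
is running, the view described there can be queried directly; relid only resolves
to a table name in the database the vacuum is running in, hence the join:

SELECT p.pid, p.datname, c.relname, p.phase,
       p.heap_blks_scanned, p.heap_blks_total
  FROM pg_stat_progress_vacuum p
  LEFT JOIN pg_class c ON c.oid = p.relid;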


Re: Duplicate key violation on upsert

2020-03-20 Thread Adrian Klaver

On 3/20/20 11:34 AM, Matt Magoffin wrote:



On 21/03/2020, at 4:00 AM, Adrian Klaver wrote:


On 3/20/20 2:17 AM, Matt Magoffin wrote:

Hello,
Indexes:
    "da_datum_pkey" UNIQUE, btree (node_id, ts, source_id) CLUSTER, 
tablespace "solarindex"
    "da_datum_reverse_pkey" UNIQUE, btree (node_id, ts DESC, 
source_id), tablespace "solarindex"
    "da_datum_x_acc_idx" UNIQUE, btree (node_id, source_id, ts DESC, 
jdata_a) WHERE jdata_a IS NOT NULL, tablespace "solarindex"

The error/query looks like:
ERROR: duplicate key value violates unique constraint 
“_hyper_1_1931_chunk_da_datum_x_acc_idx"


What is the above index UNIQUE over?

What is da_datum_x_acc_idx index below indexed over?


The _hyper_1_1931_chunk_da_datum_x_acc_idx index has the same definition 
as the da_datum_x_acc_idx above (it is defined on a child table). That 
is, they are both essentially:


UNIQUE, btree (node_id, source_id, ts DESC, jdata_a) WHERE jdata_a IS 
NOT NULL


The da_datum_pkey index is what the ON CONFLICT clause refers to, so
(node_id, ts, source_id) is UNIQUE as well.


Hmm, wonder if you are getting bit by this?:

https://www.postgresql.org/docs/12/sql-insert.html#SQL-ON-CONFLICT

"INSERT with an ON CONFLICT DO UPDATE clause is a “deterministic” 
statement. This means that the command will not be allowed to affect any 
single existing row more than once; a cardinality violation error will 
be raised when this situation arises. Rows proposed for insertion should 
not duplicate each other in terms of attributes constrained by an 
arbiter index or constraint."
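
For what it is worth, a minimal reproduction of the situation that paragraph
describes (a hypothetical table, unrelated to the poster's schema):

-- Two proposed rows share the same key, so DO UPDATE would have to touch the
-- same row twice; this fails with an error along the lines of
-- "ON CONFLICT DO UPDATE command cannot affect row a second time".
CREATE TABLE t (id int PRIMARY KEY, val text);
INSERT INTO t (id, val) VALUES (1, 'a'), (1, 'b')
    ON CONFLICT (id) DO UPDATE SET val = EXCLUDED.val;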





— m@




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: PG12 autovac issues

2020-03-20 Thread Andres Freund
Hi,

On 2020-03-20 12:42:31 -0500, Justin King wrote:
> When we get into this state again, is there some other information
> (other than what is in pg_stat_statement or pg_stat_activity) that
> would be useful for folks here to help understand what is going on?

If it's actually stuck on a single table, and that table is not large,
it would be useful to get a backtrace with gdb.

Greetings,

Andres Freund




Re: Could postgres12 support millions of sequences? (like 10 million)

2020-03-20 Thread pabloa98
On Fri, Mar 20, 2020 at 5:39 AM rob stone  wrote:

> Hello,
>
> On Thu, 2020-03-19 at 14:36 -0700, pabloa98 wrote:
> > Hello,
> >
> > My schema requires a counter for each combination of 2 values.
> > Something like:
> >
> > CREATE TABLE counter(
> > group INT NOT NULL,
> > element INT NOT NULL,
> > seq_number INT NOT NULL default 0,
> > CONSTRAINT PRIMARY KEY (group, element)
> > );
> >
> > For each entry in counter, aka for each (group, element) pair, the
> > model requires a seq_number.
> >
> > If I use a table "counter", I could still have counter collisions
> > between 2 transactions. I need true sequence behavior. Is that
> > possible by using a table like "counter" table, where the counter
> > could be increased out of the transaction so it performs as a
> > sequence without having race conditions between concurrent
> > transactions?
> >
> > The other option is to create sequences for each new pair of (group,
> > element) using triggers. There are millions of pairs. So this
> > approach will generate millions of sequences.
> >
> > How would a PostgreSQL database behave with millions of sequences
> > in a schema? Would it degrade its performance? Is there any negative
> > impact?
> >
> > Regards
> >
> > Pablo
> >
> >
> >
>
>
> Have you tried the following:-
>
> BEGIN;
> SELECT (seq_number + 1) FROM counter WHERE group = $1
> AND element = $2 FOR UPDATE;
>
> If that gives a NOTFOUND exception, then you need to insert into table
> counter using a seq_number = 1, or update it with (seq_number + 1).
> Then do the insert into whatever table needs the seq_number value.
>
> If no errors detected, then COMMIT else ROLLBACK.
> It is all in one transaction block.
> We use this approach to assign invoice numbers.
>
>
We do that in other parts and with another set of tables. They have higher
concurrency than the problem I bring to the list.
The issue with doing this is that the client and the PostgreSQL server talk too
much. Too many messages! That increases latency.

We replaced it with a lock on the row so we can increase the counter and
commit the transaction. The problem with this approach is that we are
serializing access to this table and that has performance implications. It is
basically a global serialization of an update on this table across all
the servers in the cluster.




> You could pre-seed table counter when a group/element pair is created
> for the first time.
>
> HTH,
> Robert
>
>
>


Re: unexpected chunk number 2 (expected 0) for toast value ... in pg_toast_18536

2020-03-20 Thread Karsten Hilbert
On Mon, Mar 16, 2020 at 12:38:35PM -0700, Andres Freund wrote:

> > We have ruled out (?) below-PG hardware problems by a
> > successful run of:
> >
> > cp -rv --preserve=all /var/lib/postgresql/9.6 /tmp/
>
> FWIW, I don't think that rules out hardware problems at all. In plenty
> cases of corruption you can just end up with corrupted on-disk data
> (swapped blocks, zeroed blocks, randomly different values ...).

... hence the (?) ...

> But obviously it is not at all guaranteed that is the case. Could you
> describe the "history" of the database? Replication set up, failovers,
> etc?

No replication, no failovers.

There may have been hard shutdowns as in power failure but
there's no history of that to relate.

> A plpgsql function that returns the rows one-by-one and catches the
> exception is probably your best bet.

We have done that (in Python) for good measure during
recovery procedures.

> It could roughly look something like:
>
> CREATE OR REPLACE FUNCTION salvage(p_tblname regclass)
> RETURNS SETOF text
> LANGUAGE plpgsql AS
> $$
> DECLARE
> v_row record;
> BEGIN
> FOR v_row IN EXECUTE 'SELECT * FROM '||p_tblname::text LOOP
>BEGIN
>-- this forces detoasting
>RETURN NEXT v_row::text;
>EXCEPTION WHEN internal_error OR data_corrupted OR index_corrupted THEN
>-- add pkey or something else
>RAISE NOTICE 'failed to return data';
>END;
> END LOOP;
> END
> $$
>
> should work. You can call it like
> SELECT (salvaged_rec.rec).* FROM (SELECT salvaged_text::salvage_me FROM 
> salvage('salvage_me') AS salvaged_text) AS salvaged_rec(rec)

Thanks for taking the time. Would something like this be a
useful addition to the adminpack extension?

Best,
Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B




Re: unexpected chunk number 2 (expected 0) for toast value ... in pg_toast_18536

2020-03-20 Thread Karsten Hilbert
On Sun, Mar 15, 2020 at 08:11:18PM -0400, Tom Lane wrote:

> Karsten Hilbert  writes:
> >>> According to
> >>> http://www.databasesoup.com/2013/10/de-corrupting-toast-tables.html
> >>> an UPDATE of the row is recommended -- should that work
> >>> better than a DELETE ?
>
> > OK, got that. What I now don't understand is how the UPDATE
> > won't have to touch the TOAST table when the TOASTed value
> > *is* UPDATEd:
> > update blobs.doc_obj set data = '' where pk = the_faulty_row;
> > (data is the BYTEA column)
>
> It makes no sense to me either; I wonder if Josh's recipe ever
> really worked?  But it's clearly not working now, and that's
> what I'd expect, because any mechanism for removing the busted
> toast reference is going to cause the system to try to mark
> the toast rows deleted.
>
> Since you reindexed the toast table and it still doesn't find
> the missing chunks,

The user has reported that gratuitous and repeated use of
REINDEX/VACUUM has eventually led to a consistent database.
That one row went missing but can be re-created.

Unfortunately, I neither have the original data for testing
(it is a medical record database and the client won't hand
out copies for obvious reasons) nor can I ascertain the exact
order of steps they eventually took.

For the record.

Thanks,
Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B




Re: Duplicate key violation on upsert

2020-03-20 Thread Matt Magoffin


> On 21/03/2020, at 4:00 AM, Adrian Klaver  wrote:
> 
> On 3/20/20 2:17 AM, Matt Magoffin wrote:
>> Hello,
>> Indexes:
>> "da_datum_pkey" UNIQUE, btree (node_id, ts, source_id) CLUSTER, 
>> tablespace "solarindex"
>> "da_datum_reverse_pkey" UNIQUE, btree (node_id, ts DESC, source_id), 
>> tablespace "solarindex"
>> "da_datum_x_acc_idx" UNIQUE, btree (node_id, source_id, ts DESC, 
>> jdata_a) WHERE jdata_a IS NOT NULL, tablespace "solarindex"
>> The error/query looks like:
>> ERROR: duplicate key value violates unique constraint 
>> “_hyper_1_1931_chunk_da_datum_x_acc_idx"
> 
> What is the above index UNIQUE over?
> 
> What is da_datum_x_acc_idx index below indexed over?

The _hyper_1_1931_chunk_da_datum_x_acc_idx index has the same definition as the 
da_datum_x_acc_idx above (it is defined on a child table). That is, they are 
both essentially:

UNIQUE, btree (node_id, source_id, ts DESC, jdata_a) WHERE jdata_a IS NOT NULL

The da_datum_pkey index is what the ON CONFLICT clause refers to, so (node_id,
ts, source_id) is UNIQUE as well.

— m@



Re: PG12 autovac issues

2020-03-20 Thread Justin King
On Thu, Mar 19, 2020 at 6:56 PM Andres Freund  wrote:
>
> Hi,
>
> On 2020-03-19 18:07:14 -0500, Justin King wrote:
> > On Thu, Mar 19, 2020 at 5:35 PM Andres Freund  wrote:
> > >
> > > Hi,
> > >
> > > On 2020-03-19 10:23:48 -0500, Justin King wrote:
> > > > > From a single stats snapshot we can't actually understand the actual 
> > > > > xid
> > > > > consumption - is it actually the xid usage that triggers the vacuums?
> > > >
> > > > We have looked at this and the xid consumption averages around 1250
> > > > xid/sec -- this is when we see the "aggressive" autovac kick off in
> > > > the logs.  What I don't understand is why these xid's are being
> > > > consumed at this rate on the databases with no activity (postgres,
> > > > template1).
> > >
> > > The xid counter is global across all databases.
> >
> > Then what does the "age" value represent for each database in this
> > case?  Perhaps I'm misunderstanding what I'm looking at?
> >
> > postgres=#  SELECT datname, age(datfrozenxid),
> > current_setting('autovacuum_freeze_max_age') FROM pg_database;
> >   datname  |age| current_setting
> > ---+---+-
> >  postgres  | 100937449 | 2
> >  template1 |  50244438 | 2
> >  template0 | 160207297 | 2
> >  feedi | 150147602 | 2
>
> Look at datfrozenxid without the age(). age(xid) computes how "old" xid
> is compared to the "next" xid to be assigned. Until vacuum comes around
> and performs work, pg_database.datfrozenxid / pg_class.relfrozenxid are
> constant, since they represent the values actually present in the
> table.  But if xids are being consumed, their "age" increases, because
> they're further and further in the past relative to the "newest" xids.
>
>
> > > One big difference between a manual VACUUM and autovacuum is that with
> > > the default settings VACUUM is not throttled, but autovacuum is.
> > >
> > > What are your vacuum_cost_delay, autovacuum_vacuum_cost_delay,
> > > vacuum_cost_limit, autovacuum_vacuum_cost_limit, vacuum_cost_page_hit,
> > > vacuum_cost_page_miss set to?
> >
> > Here are all the vacuum related values for the server:
> >
> > postgres=# select name,setting from pg_settings where name like '%vacuum%';
> > autovacuum = on
> > autovacuum_analyze_scale_factor = 0.1
> > autovacuum_analyze_threshold = 2500
> > autovacuum_freeze_max_age = 2
> > autovacuum_max_workers = 8
> > autovacuum_multixact_freeze_max_age = 4
> > autovacuum_naptime = 15
> > autovacuum_vacuum_cost_delay = 20
> > autovacuum_vacuum_cost_limit = -1
> > autovacuum_vacuum_scale_factor = 0.2
> > autovacuum_vacuum_threshold = 500
> > autovacuum_work_mem = -1
> > log_autovacuum_min_duration = 0
> > vacuum_cleanup_index_scale_factor = 0.1
> > vacuum_cost_delay = 0
> > vacuum_cost_limit = 1000
> > vacuum_cost_page_dirty = 20
> > vacuum_cost_page_hit = 1
> > vacuum_cost_page_miss = 10
> > vacuum_defer_cleanup_age = 0
> > vacuum_freeze_min_age = 5000
> > vacuum_freeze_table_age = 15000
> > vacuum_multixact_freeze_min_age = 500
> > vacuum_multixact_freeze_table_age = 15000
> >
> > I know the database is busy, so the throttling makes sense, but it
> > seems like it would complete eventually.
>
> The cost limit/delay are way too long/small respectively for a busy
> postgres instance.

This does make sense, we will look into adjusting those values.

> > We see blocked autovacs for many hours.
>
> On the same table, or just generally being busy?

We haven't isolated *which* table it is blocked on (assuming it is),
but all autovac's cease running until we manually intervene.

When we get into this state again, is there some other information
(other than what is in pg_stat_statement or pg_stat_activity) that
would be useful for folks here to help understand what is going on?
>
> Greetings,
>
> Andres Freund




Re: Could postgres12 support millions of sequences? (like 10 million)

2020-03-20 Thread Adrian Klaver

On 3/20/20 9:59 AM, Adrian Klaver wrote:

On 3/19/20 10:31 PM, pabloa98 wrote:

I see.

Any suggestion? It should behave like a sequence in the sense that 
concurrent transitions will get different numbers from this 
alternative sequence like solution.


In our case, we will need to do a call nextval('some_seq') (or 
similar) from different processes no more than twice every minute.



It would be nice to have a sequence data type. Correct me if I am 
wrong, but It seems to me that a sequence data type would cost the 
same or less than the current sequence implementation.


The schema would be more clear too. We could have a table like:

CREATE TABLE pair(
group INT NOT NULL,
element INT NOT NULL,
seq SEQUENCE INCREMENT 1
               START 1
               CACHE 1
               MINVALUE 1
               MAXVALUE 
 NOT NULL,
CONSTRAINT PRIMARY KEY (group, element)
);

And then:

INSERT INTO event(group, element, code)
VALUES (
  1,
  1,
  ( SELECT seq.nextval('code_1_1_seq') FROM pair p WHERE 
p.group=1 and p.code=1 )

);

Or perhaps storing all the sequences in the same table as rows will 
have the same behavior.


If code is just something to show the sequence of insertion for group, 
element combinations then maybe something like below:


CREATE TABLE event(
group_id INT NOT NULL, --changed as group is reserved word
element INT NOT NULL,
insert_ts timestamptz NOT NULL DEFAULT clock_timestamp(),
PRIMARY KEY(group_id, element, insert_ts)
);


insert into event(group_id, element) VALUES
(1, 1),
(1, 1),
(1, 1),
(2, 1),
(1, 1),
(1, 3),
(1, 1),
(1, 3),
(2, 1),
(2, 1);


select * from event ;
  group_id | element |   insert_ts
--+-+
     1 |   1 | 03/20/2020 09:51:12.675926 PDT
     1 |   1 | 03/20/2020 09:51:12.675985 PDT
     1 |   1 | 03/20/2020 09:51:12.675991 PDT
     2 |   1 | 03/20/2020 09:51:12.675996 PDT
     1 |   1 | 03/20/2020 09:51:12.676 PDT
     1 |   3 | 03/20/2020 09:51:12.676004 PDT
     1 |   1 | 03/20/2020 09:51:12.676008 PDT
     1 |   3 | 03/20/2020 09:51:12.676012 PDT
     2 |   1 | 03/20/2020 09:51:12.676016 PDT
     2 |   1 | 03/20/2020 09:51:12.67602 PDT
(10 rows)


select group_id, element, row_number() OVER (partition by (group_id, 
element) order by (group_id, element)) AS code from event;

  group_id | element | code
--+-+--
     1 |   1 |    1
     1 |   1 |    2
     1 |   1 |    3
     1 |   1 |    4
     1 |   1 |    5
     1 |   3 |    1
     1 |   3 |    2
     2 |   1 |    1
     2 |   1 |    2
     2 |   1 |    3
(10 rows)


Oops the above actually returned:

select group_id, element, row_number() OVER (partition by (group_id, 
element) order by (group_id, element)) AS code, insert_ts from event;

 group_id | element | code |   insert_ts
--+-+--+
1 |   1 |1 | 03/20/2020 09:51:12.675926 PDT
1 |   1 |2 | 03/20/2020 09:51:12.675985 PDT
1 |   1 |3 | 03/20/2020 09:51:12.675991 PDT
1 |   1 |4 | 03/20/2020 09:51:12.676008 PDT
1 |   1 |5 | 03/20/2020 09:51:12.676 PDT
1 |   3 |1 | 03/20/2020 09:51:12.676004 PDT
1 |   3 |2 | 03/20/2020 09:51:12.676012 PDT
2 |   1 |1 | 03/20/2020 09:51:12.67602 PDT
2 |   1 |2 | 03/20/2020 09:51:12.676016 PDT
2 |   1 |3 | 03/20/2020 09:51:12.675996 PDT
(10 rows)


Needs to be:

select group_id, element, row_number() OVER (partition by (group_id, 
element) order by (group_id, element, insert_ts)) AS code, insert_ts 
from event;

 group_id | element | code |   insert_ts
--+-+--+
1 |   1 |1 | 03/20/2020 09:51:12.675926 PDT
1 |   1 |2 | 03/20/2020 09:51:12.675985 PDT
1 |   1 |3 | 03/20/2020 09:51:12.675991 PDT
1 |   1 |4 | 03/20/2020 09:51:12.676 PDT
1 |   1 |5 | 03/20/2020 09:51:12.676008 PDT
1 |   3 |1 | 03/20/2020 09:51:12.676004 PDT
1 |   3 |2 | 03/20/2020 09:51:12.676012 PDT
2 |   1 |1 | 03/20/2020 09:51:12.675996 PDT
2 |   1 |2 | 03/20/2020 09:51:12.676016 PDT
2 |   1 |3 | 03/20/2020 09:51:12.67602 PDT
(10 rows)





Pablo


On Thu, Mar 19, 2020 at 7:56 PM Tom Lane wrote:


    Michael Lewis <mle...@entrata.com> writes:

 > On Thu, Mar 19, 2020, 5:48 PM David G. Johnston
 > <david.g.johns...@gmail.com> wrote:
 >> However, one other consideration with sequences: do you care that
 >> PostgreSQL will cache/pin (i.e., no release) every single
    sequence you
 >> touch for the 

Re: Could postgres12 support millions of sequences? (like 10 million)

2020-03-20 Thread Adrian Klaver

On 3/19/20 10:31 PM, pabloa98 wrote:

I see.

Any suggestion? It should behave like a sequence in the sense that 
concurrent transitions will get different numbers from this alternative 
sequence like solution.


In our case, we will need to do a call nextval('some_seq') (or similar) 
from different processes no more than twice every minute.



It would be nice to have a sequence data type. Correct me if I am wrong, 
but It seems to me that a sequence data type would cost the same or less 
than the current sequence implementation.


The schema would be more clear too. We could have a table like:

CREATE TABLE pair(
group INT NOT NULL,
element INT NOT NULL,
seq SEQUENCE INCREMENT 1
               START 1
               CACHE 1
               MINVALUE 1
               MAXVALUE 
     NOT NULL,
CONSTRAINT PRIMARY KEY (group, element)
);

And then:

INSERT INTO event(group, element, code)
VALUES (
  1,
  1,
  ( SELECT seq.nextval('code_1_1_seq') FROM pair p WHERE 
p.group=1 and p.code=1 )

);

Or perhaps storing all the sequences in the same table as rows will have 
the same behavior.


If code is just something to show the sequence of insertion for group, 
element combinations then maybe something like below:


CREATE TABLE event(
group_id INT NOT NULL, --changed as group is reserved word
element INT NOT NULL,
insert_ts timestamptz NOT NULL DEFAULT clock_timestamp(),
PRIMARY KEY(group_id, element, insert_ts)
);


insert into event(group_id, element) VALUES
(1, 1),
(1, 1),
(1, 1),
(2, 1),
(1, 1),
(1, 3),
(1, 1),
(1, 3),
(2, 1),
(2, 1);


select * from event ;
 group_id | element |   insert_ts
--+-+
1 |   1 | 03/20/2020 09:51:12.675926 PDT
1 |   1 | 03/20/2020 09:51:12.675985 PDT
1 |   1 | 03/20/2020 09:51:12.675991 PDT
2 |   1 | 03/20/2020 09:51:12.675996 PDT
1 |   1 | 03/20/2020 09:51:12.676 PDT
1 |   3 | 03/20/2020 09:51:12.676004 PDT
1 |   1 | 03/20/2020 09:51:12.676008 PDT
1 |   3 | 03/20/2020 09:51:12.676012 PDT
2 |   1 | 03/20/2020 09:51:12.676016 PDT
2 |   1 | 03/20/2020 09:51:12.67602 PDT
(10 rows)


select group_id, element, row_number() OVER (partition by (group_id, 
element) order by (group_id, element)) AS code from event;

 group_id | element | code
--+-+--
1 |   1 |1
1 |   1 |2
1 |   1 |3
1 |   1 |4
1 |   1 |5
1 |   3 |1
1 |   3 |2
2 |   1 |1
2 |   1 |2
2 |   1 |3
(10 rows)



Pablo


On Thu, Mar 19, 2020 at 7:56 PM Tom Lane wrote:


Michael Lewis <mle...@entrata.com> writes:
 > On Thu, Mar 19, 2020, 5:48 PM David G. Johnston
 > <david.g.johns...@gmail.com>
 > wrote:
 >> However, one other consideration with sequences: do you care that
 >> PostgreSQL will cache/pin (i.e., no release) every single
sequence you
 >> touch for the lifetime of the session? (I do not think DISCARD
matters here
 >> but I'm just guessing)

 > Would you expand on this point or is there someplace specific in the
 > documentation on this?

I think what David is worried about is that a sequence object is a
one-row table in PG's implementation.  Thus

(1) each sequence requires a dozen or two rows in assorted system
catalogs (not sure exactly how many offhand).

(2) each sequence uses up 8KB on disk for its table file.

(3) each sequence you actually access within a particular session
results in creation of relcache and catcache entries in that
session's local memory.  I'm not sure offhand how big those are
either, but a few KB per sequence would be a reasonable guess.

(4) each sequence competes for space in the shared-buffer arena,
since its 8K block has to be swapped into there whenever you try
to access/increment the sequence's value.

This is all mighty inefficient of course, and there's been talk
of trying to reduce the per-sequence overhead; but I don't know
of anyone actively working on that.  As things stand, I think
having millions of sequences would be quite painful performance-
wise, especially if your workload were actively hitting a lot
of them concurrently.  It would work, for some value of "work",
but it wouldn't perform very well.

Also, as multiple people mentioned already, this still wouldn't
guarantee gap-free sequences of ID values.
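
Not from the original message, but a quick way to confirm the 8 kB-per-sequence
on-disk footprint (point 2 above) in a scratch database:

CREATE SEQUENCE s1;
SELECT relkind, pg_relation_size(oid) AS bytes
  FROM pg_class
 WHERE relname = 's1';   -- relkind 'S', 8192 bytes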

                         regards, tom lane




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Passwordcheck configuration

2020-03-20 Thread Ron

On 3/19/20 6:19 PM, Tom Lane wrote:

Dave Hughes  writes:

I have a requirement to set some password complexity for our database such
as length of password, upper case, lower case, special characters,
expiration limit, reuse, etc.

Usually, if you have to do something like that, we recommend setting PG to
use PAM authentication and configuring the restrictions on the PAM side.
The only native capability in that direction is that you can set a
password expiration date.

Note that it's widely believed that this sort of thing makes you LESS
secure, not more.


Correct.

https://pages.nist.gov/800-63-3/sp800-63b.html#memsecret

NIST Special Publication 800-63B
Digital Identity Guidelines

5.1.1.2 Memorized Secret Verifiers

"Verifiers SHALL require subscriber-chosen memorized secrets to be at least 
8 characters in length."
"Verifiers SHOULD NOT impose other composition rules (e.g., requiring 
mixtures of different character types or prohibiting consecutively repeated 
characters) for memorized secrets."
"Verifiers SHOULD NOT require memorized secrets to be changed arbitrarily 
(e.g., periodically)."




   Quite aside from the well-established fact that forced
password changes are bad from a human-factors standpoint, you can't check
any of those other points unless the password is sent to the server as
cleartext.  That creates its own set of vulnerabilities, and I don't
know of anybody who considers it good practice.


I saw there was a module you can use for this called passwordcheck.  Seems
easy to install, but I don't see how you can configure it for you specific
needs?

passwordcheck hasn't got any out-of-the-box configurability.  It's mainly
meant as sample code that people could modify if they have a mind to.

(I seem to recall some recent discussion about deprecating/removing
passwordcheck altogether, but I can't find it right now.)

regards, tom lane




--
Angular momentum makes the world go 'round.




Re: Passwordcheck configuration

2020-03-20 Thread Dave Hughes
Thank you for the information!  This issue originated from a Department of
Defense STIG (Security Technical Implementation Guide).  It's a security
check that applications and databases have to go through.  I'll just leave
this one as a "finding" since there isn't a way to really configure it to
their requirements.

Thanks again for your help.

On Thu, Mar 19, 2020 at 7:19 PM Tom Lane  wrote:

> Dave Hughes  writes:
> > I have a requirement to set some password complexity for our database
> such
> > as length of password, upper case, lower case, special characters,
> > expiration limit, reuse, etc.
>
> Usually, if you have to do something like that, we recommend setting PG to
> use PAM authentication and configuring the restrictions on the PAM side.
> The only native capability in that direction is that you can set a
> password expiration date.
>
> Note that it's widely believed that this sort of thing makes you LESS
> secure, not more.  Quite aside from the well-established fact that forced
> password changes are bad from a human-factors standpoint, you can't check
> any of those other points unless the password is sent to the server as
> cleartext.  That creates its own set of vulnerabilities, and I don't
> know of anybody who considers it good practice.
>
> > I saw there was a module you can use for this called passwordcheck.
> Seems
> > easy to install, but I don't see how you can configure it for you
> specific
> > needs?
>
> passwordcheck hasn't got any out-of-the-box configurability.  It's mainly
> meant as sample code that people could modify if they have a mind to.
>
> (I seem to recall some recent discussion about deprecating/removing
> passwordcheck altogether, but I can't find it right now.)
>
> regards, tom lane
>


Re: How to get RAISE INFO in JDBC

2020-03-20 Thread Thomas Kellerer

Ravi Krishna wrote on 20.03.2020 at 15:48:

How do we get the notification mentioned in RAISE INFO of PL/PGSQL in JDBC?
I can get it in psql, but not in JDBC.

I am talking about Redshift, which has the same procedural language as
postgres.


I can only answer this for Postgres.
As Redshift has its own JDBC driver, I don't know if this applies to it as
well:

All messages from a RAISE are reported as SQL Warnings on the Statement 
instance that you used.

So you can use Statement.getWarnings() to retrieve the RAISE output.
Note that you need a loop that checks SQLWarning.getNextWarning() to find out
if there was more than one RAISE.

Thomas






Re: Duplicate key violation on upsert

2020-03-20 Thread Adrian Klaver

On 3/20/20 2:17 AM, Matt Magoffin wrote:

Hello,

I am experiencing a duplicate key violation in Postgres 9.6 on occasion 
for one particular query, and I’m wondering where I’m going wrong. My 
table looks like this:


                       Table "solardatum.da_datum"
   Column   |           Type           | Collation | Nullable | Default
---+--+---+--+-
  ts        | timestamp with time zone |           | not null |
  node_id   | bigint                   |           | not null |
  source_id | character varying(64)    |           | not null |
  posted    | timestamp with time zone |           | not null |
  jdata_i   | jsonb                    |           |          |
  jdata_a   | jsonb                    |           |          |
  jdata_s   | jsonb                    |           |          |
  jdata_t   | text[]                   |           |          |
Indexes:
     "da_datum_pkey" UNIQUE, btree (node_id, ts, source_id) CLUSTER, 
tablespace "solarindex"
     "da_datum_reverse_pkey" UNIQUE, btree (node_id, ts DESC, 
source_id), tablespace "solarindex"
     "da_datum_x_acc_idx" UNIQUE, btree (node_id, source_id, ts DESC, 
jdata_a) WHERE jdata_a IS NOT NULL, tablespace "solarindex"

Triggers:
     aa_agg_stale_datum BEFORE INSERT OR DELETE OR UPDATE ON 
solardatum.da_datum FOR EACH ROW EXECUTE 
PROCEDURE solardatum.trigger_agg_stale_datum()
     ts_insert_blocker BEFORE INSERT ON solardatum.da_datum FOR EACH ROW 
EXECUTE PROCEDURE _timescaledb_internal.insert_blocker()


The error/query looks like:

ERROR: duplicate key value violates unique constraint 
“_hyper_1_1931_chunk_da_datum_x_acc_idx"


What is the above index UNIQUE over?

What is da_datum_x_acc_idx index below indexed over?


   Where: SQL statement "INSERT INTO solardatum.da_datum(ts, node_id, 
source_id, posted, jdata_i, jdata_a, jdata_s, jdata_t)

VALUES (…)
ON CONFLICT (node_id, ts, source_id) DO UPDATE
SET jdata_i = EXCLUDED.jdata_i,
jdata_a = EXCLUDED.jdata_a,
jdata_s = EXCLUDED.jdata_s,
jdata_t = EXCLUDED.jdata_t,
posted = EXCLUDED.posted
RETURNING (xmax = 0)"

I am using the TimescaleDB extension so there are child tables 
inheriting from this main table and that’s why the reported index name 
differs from the definition shown above. I’m not sure if the extension 
is the problem, so I thought I’d start here to see if I’ve configured 
something wrong or my expectations on how the upsert should work are
wrong. My expectation was that basically the insert would never fail 
from a duplicate key violation.


The error always references the da_datum_x_acc_idx index, which is a 
partial index with jdata_a added as a covering column… that is, it’s 
only in the index so I can get some index-only results with that column. 
Is the partial index possibly an issue in this configuration?


Thanks for any insight,
Matt



--
Adrian Klaver
adrian.kla...@aklaver.com




How to get RAISE INFO in JDBC

2020-03-20 Thread Ravi Krishna
How do we get the notification mentioned in RAISE INFO of PL/PGSQL in JDBC?
I can get it in psql, but not in JDBC.

I am talking about Redshift, which has the same procedural language as
postgres.

thanks





Re: Partition by hash formula

2020-03-20 Thread Laurenz Albe
On Fri, 2020-03-20 at 12:50 +0300, Vlad Bokov wrote:
> recently I read
> https://blog.dbi-services.com/hash-partitioning-in-postgresql-11/
> and reproduced the steps from it, which gave the exact same count outcome,
> which made me think
> hashing is a pure function.
> 
> Examining the data, I saw the following distribution:
> dept_1: 1,14,39...; dept_2: 3,11,21...; dept_3: 16,18,25...
> 
> I also found commit 1aba8e651a introducing the feature, but I'm not too
> much into internals.
> I wonder, could somebody write that hashing formula in pseudo-code or
> point to specific place in the code, please?

Look for the int4hashfast, texthashfast and similar functions in
src/backend/utils/cache/catcache.c
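
In rough pseudo-code (a simplification offered here, not quoted from the source):
each partition-key column is hashed by its type's extended hash function with a
fixed seed (HASH_PARTITION_SEED), the per-column hashes are combined, and the row
is routed by remainder:

-- sketch only; the real routing code lives under src/backend/partitioning/
-- h := combined_hash(key columns, HASH_PARTITION_SEED)
-- the row goes to the partition declared FOR VALUES WITH (MODULUS m, REMAINDER r)
-- for which  h % m = r

The internal function satisfies_hash_partition(), which the implied partition
constraints use, should let you check a given value against a (modulus, remainder)
pair if you want to verify the distribution you observed.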

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com





Re: Wal receiver process listens to physical IP

2020-03-20 Thread Laurenz Albe
On Fri, 2020-03-20 at 11:27 +, Mariya Rampurawala wrote:
> I have been working on a project where I need to virtualize the postgresql 
> standby server.
> The wal receiver process listens to the Physical IP and a random port.
> How can I configure it to listen to a VIP that the wal sender process can 
> connect to?
> Also, Is there a way I can configure the port number for this?
> 
> $ /usr/pgsql-12/bin/postgres -V
> postgres (PostgreSQL) 12.2
> 
> postgresql.conf:
> listen_addresses = '10.209.57.17'   This is the VIP
> 
> $ ps -ef | grep post
> postgres  2945 1  0 05:37 ?00:00:00 /usr/pgsql-12/bin/postgres -D 
> /pg_mnt/pg-12/data -h 10.209.57.17
> postgres  2946  2945  0 05:37 ?00:00:00 postgres: logger   
> postgres  2947  2945  0 05:37 ?00:00:00 postgres: startup   
> recovering 00010007
> postgres  2952  2945  0 05:37 ?00:00:00 postgres: checkpointer   
> postgres  2953  2945  0 05:37 ?00:00:00 postgres: background writer   
> postgres  2954  2945  0 05:37 ?00:00:00 postgres: stats collector   
> postgres  2955  2945  0 05:37 ?00:00:04 postgres: walreceiver   
> 
> $ netstat -anp | grep 55656
> tcp0  0 10.209.57.15:55656  10.209.57.16:5432   
> ESTABLISHED 2955/postgres: walr 
> 
> 10.209.57.15 is the physical IP
> 
> $ ps -ef | grep 2955
> postgres  2955  2945  0 05:37 ?00:00:04 postgres: walreceiver 

The standby is *listening* on 10.209.57.17 for connections, but
the WAL receiver is not listening; it actively connects to the primary
server.  Obviously it is using the other network interface.

To change that, you would have to change the network route to the primary
server to lead through the virtual interface.
But I don't see any reason to do that.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com





Re: Could postgres12 support millions of sequences? (like 10 million)

2020-03-20 Thread rob stone
Hello,

On Thu, 2020-03-19 at 14:36 -0700, pabloa98 wrote:
> Hello,
> 
> My schema requires a counter for each combination of 2 values.
> Something like:
> 
> CREATE TABLE counter(
> group INT NOT NULL,
> element INT NOT NULL,
> seq_number INT NOT NULL default 0,
> CONSTRAINT PRIMARY KEY (group, element)
> );
> 
> For each entry in counter, aka for each (group, element) pair, the
> model requires a seq_number.
> 
> If I use a table "counter", I could still have counter collisions
> between 2 transactions. I need true sequence behavior. Is that
> possible by using a table like "counter" table, where the counter
> could be increased out of the transaction so it performs as a
> sequence without having race conditions between concurrent
> transactions?
> 
> The other option is to create sequences for each new pair of (group,
> element) using triggers. There are millions of pairs. So this
> approach will generate millions of sequences.
> 
> How would a PostgreSQL database behave with millions of sequences
> in a schema? Would it degrade its performance? Is there any negative
> impact?
> 
> Regards
> 
> Pablo
> 
> 
> 


Have you tried the following:-

BEGIN;
SELECT (seq_number + 1) FROM counter WHERE group = $1 
AND element = $2 FOR UPDATE; 

If that gives a NOTFOUND exception, then you need to insert into table
counter using a seq_number = 1, or update it with (seq_number + 1).
Then do the insert into whatever table needs the seq_number value.

If no errors detected, then COMMIT else ROLLBACK.
It is all in one transaction block.
We use this approach to assign invoice numbers.

You could pre-seed table counter when a group/element pair is created
for the first time.
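
A condensed sketch of that pattern as a single upsert (not Robert's exact code;
it assumes the primary key on (group_id, element) and renames the column to
group_id because GROUP is a reserved word):

BEGIN;

INSERT INTO counter AS c (group_id, element, seq_number)
     VALUES ($1, $2, 1)
ON CONFLICT (group_id, element)
  DO UPDATE SET seq_number = c.seq_number + 1
RETURNING c.seq_number;   -- use this value for the row that needs it

-- ... insert into whatever table needs seq_number ...

COMMIT;   -- on ROLLBACK the counter increment rolls back too, so no gap,
          -- at the cost of blocking concurrent transactions on the same pair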

HTH,
Robert






Wal receiver process listens to physical IP

2020-03-20 Thread Mariya Rampurawala
Hi,

I have been working on a project where I need to virtualize the postgresql 
standby server.
The wal receiver process listens to the Physical IP and a random port.
How can I configure it to listen to a VIP that the wal sender process can 
connect to?
Also, is there a way I can configure the port number for this?


$ /usr/pgsql-12/bin/postgres -V

postgres (PostgreSQL) 12.2


postgresql.conf:

listen_addresses = '10.209.57.17'   # This is the VIP



$ ps -ef | grep post
postgres  2945 1  0 05:37 ?00:00:00 /usr/pgsql-12/bin/postgres -D 
/pg_mnt/pg-12/data -h 10.209.57.17
postgres  2946  2945  0 05:37 ?00:00:00 postgres: logger
postgres  2947  2945  0 05:37 ?00:00:00 postgres: startup   recovering 
00010007
postgres  2952  2945  0 05:37 ?00:00:00 postgres: checkpointer
postgres  2953  2945  0 05:37 ?00:00:00 postgres: background writer
postgres  2954  2945  0 05:37 ?00:00:00 postgres: stats collector
postgres  2955  2945  0 05:37 ?00:00:04 postgres: walreceiver





$ netstat -anp | grep 55656
tcp0  0 10.209.57.15:55656  10.209.57.16:5432   ESTABLISHED 
2955/postgres: walr





10.209.57.15 is the physical IP


$ ps -ef | grep 2955
postgres  2955  2945  0 05:37 ?00:00:04 postgres: walreceiver


Regards,
Mariya



promise postgres that I will only ROLLBACK

2020-03-20 Thread Gergely Riskó
Hey,

I have a unit testing framework where every test is run in
BEGIN..ROLLBACK, to ensure that data created during testing is never
persisted.

Now, my question is: is there any way to tell postgres that

"Look, I will never ask you for a COMMIT, therefore you don't have to lock
anything for me,
you just have to do a 'snapshot', fake my writes in my view of the world,
then destroy all this on rollback".

This way, I would be able to run my many test cases in parallel, without
worrying about locks/deadlocks between different testcases.

Something like BEGIN READ ONLY TRANSACTION WITH TEMPORARY WRITES...

I of course researched this a little bit, and I know that there is no
out-of-the-box syntax for this, but are there maybe some combinations of
options that I can abuse to provide this?

(I can naturally do this on the filesystem with snapshots and LVM, but I
would like something in postgres, so it doesn't become a sysadmin
nightmare.)

Thank you for your ideas,
Gergely Risko


Partition by hash formula

2020-03-20 Thread Vlad Bokov
Hello,

recently I read
https://blog.dbi-services.com/hash-partitioning-in-postgresql-11/
and reproduced the steps from it, which gave the exact same count outcome,
which made me think
hashing is a pure function.

Examining the data, I saw the following distribution:
dept_1: 1,14,39...; dept_2: 3,11,21...; dept_3: 16,18,25...

I also found commit 1aba8e651a introducing the feature, but I'm not too
much into internals.
I wonder, could somebody write that hashing formula in pseudo-code or
point to specific place in the code, please?

Thanks,
Vlad





Duplicate key violation on upsert

2020-03-20 Thread Matt Magoffin
Hello,

I am experiencing a duplicate key violation in Postgres 9.6 on occasion for one 
particular query, and I’m wondering where I’m going wrong. My table looks like 
this:

  Table "solardatum.da_datum"
  Column   |   Type   | Collation | Nullable | Default 
---+--+---+--+-
 ts| timestamp with time zone |   | not null | 
 node_id   | bigint   |   | not null | 
 source_id | character varying(64)|   | not null | 
 posted| timestamp with time zone |   | not null | 
 jdata_i   | jsonb|   |  | 
 jdata_a   | jsonb|   |  | 
 jdata_s   | jsonb|   |  | 
 jdata_t   | text[]   |   |  | 
Indexes:
"da_datum_pkey" UNIQUE, btree (node_id, ts, source_id) CLUSTER, tablespace 
"solarindex"
"da_datum_reverse_pkey" UNIQUE, btree (node_id, ts DESC, source_id), 
tablespace "solarindex"
"da_datum_x_acc_idx" UNIQUE, btree (node_id, source_id, ts DESC, jdata_a) 
WHERE jdata_a IS NOT NULL, tablespace "solarindex"
Triggers:
aa_agg_stale_datum BEFORE INSERT OR DELETE OR UPDATE ON solardatum.da_datum 
FOR EACH ROW EXECUTE PROCEDURE solardatum.trigger_agg_stale_datum()
ts_insert_blocker BEFORE INSERT ON solardatum.da_datum FOR EACH ROW EXECUTE 
PROCEDURE _timescaledb_internal.insert_blocker()

The error/query looks like:

ERROR: duplicate key value violates unique constraint 
“_hyper_1_1931_chunk_da_datum_x_acc_idx"
  Where: SQL statement "INSERT INTO solardatum.da_datum(ts, node_id, source_id, 
posted, jdata_i, jdata_a, jdata_s, jdata_t)
VALUES (…)
ON CONFLICT (node_id, ts, source_id) DO UPDATE
SET jdata_i = EXCLUDED.jdata_i,
jdata_a = EXCLUDED.jdata_a,
jdata_s = EXCLUDED.jdata_s,
jdata_t = EXCLUDED.jdata_t,
posted = EXCLUDED.posted
RETURNING (xmax = 0)"

I am using the TimescaleDB extension so there are child tables inheriting from 
this main table and that’s why the reported index name differs from the 
definition shown above. I’m not sure if the extension is the problem, so I 
thought I’d start here to see if I’ve configured something wrong or my 
expectations on how the upsert should work is wrong. My expectation was that 
basically the insert would never fail from a duplicate key violation.

The error always references the da_datum_x_acc_idx index, which is a partial 
index with jdata_a added as a covering column… that is, it’s only in the index 
so I can get some index-only results with that column. Is the partial index 
possibly an issue in this configuration?

Thanks for any insight,
Matt