Re: ERROR: posting list tuple with 2 items cannot be split at offset 17

2023-02-09 Thread Paul McGarry
Hi Peter,

Thanks for your input, I'm feeling more comfortable that I correctly
understand the problem now and that it's "just" a collation-related issue.


> I recommend running amcheck on all indexes, or at least all
> possibly-affected text indexes.
>
>
Will amcheck reliably identify all issues that may arise from a
collation change?

I don't know enough about the details of B-tree indexes to know whether the
problems are only "obvious" if they happen to interact with some boundary,
or whether they will always be evident with the basic amcheck, ie

bt_index_check(index regclass, false)

and therefore whether I can just use that to identify bad indexes and recreate
them, or whether I should recreate all btree indexes involving text fields?
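
For reference, here is a rough sketch (my own guess at the catalogs, not
something from the docs) of how the candidate indexes could be listed, ie all
btree indexes that have at least one collatable key column, which are the ones
a collation change can affect:

-- Hedged sketch: btree indexes with at least one collatable (eg text) key column.
SELECT DISTINCT i.indexrelid::regclass AS index_name
FROM pg_index i
JOIN pg_class c     ON c.oid = i.indexrelid
JOIN pg_am am       ON am.oid = c.relam AND am.amname = 'btree'
JOIN pg_attribute a ON a.attrelid = i.indexrelid AND a.attnum > 0
WHERE a.attcollation <> 0;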

We planned to do a dump/restore upgrade to PG14 in a month or so (already
done in dev).
I am wondering whether it will be less work overall to bring that forward
than rebuild these indexes...

Thanks again for your advice.

Paul


ERROR: posting list tuple with 2 items cannot be split at offset 17

2023-02-08 Thread Paul McGarry
I have three databases, two of which are experiencing the issue
below.

The first database was created from a dump in Feb 2022 (a few weeks after
the time period for which I seem to have problematic indexes, maybe).
The second database was then cloned from the first (ie filesystem level
copy) soon after.
Since then all databases have undergone a number of minor version upgrades,
including to 13.9 and an OS update last week for the two problem databases
(the other is still on 13.8).

Now, a process which clears some data more than 13 months old is getting an
error when trying to do that update.

My suspicion is that either:
- there was probably an issue with the index 12 months ago, that problem was
copied when I cloned the database, and it is just becoming apparent now that a
script is accessing 13 month old data; or
- something in our recent upgrade has caused the problem.

The third database is still on 13.8, with some OS updates pending, and is not
experiencing the problem.

The problem emerges as:


UPDATE widget SET description=NULL WHERE (time>='2022-01-07
17:40:05.140287+00'::timestamp - INTERVAL '4 days' AND time<'2022-01-08
17:40:05.780573+00' AND description IS NOT NULL);
ERROR:  posting list tuple with 2 items cannot be split at offset 17


A select on the same data works fine, so presumably it is a problem with
updating the index, not with accessing it or the corresponding table:


db=> select count(*) from widget where (time>='2022-01-07
17:40:05.140287+00'::timestamp - INTERVAL '4 days' AND time<'2022-01-08
17:40:05.780573+00' AND description IS NOT NULL);
 count
--------
 797943



The index used as per explain:

 explain  UPDATE widget SET description=NULL WHERE (time>='2022-01-07
17:40:05.140287+00'::timestamp - INTERVAL '4 days' AND time<'2022-01-08
17:40:05.780573+00' AND description IS NOT NULL);

                                  QUERY PLAN
------------------------------------------------------------------------------
 Update on widget  (cost=0.57..2921626.80 rows=205910 width=1066)
   ->  Index Scan using widget_time_client_idx on widget
 (cost=0.57..2921626.80 rows=205910 width=1066)
 Index Cond: (("time" >= '2022-01-03 17:40:05.140287'::timestamp
without time zone) AND ("time" < '2022-01-08 17:40:05.780573+00'::timestamp
with time zone))
 Filter: (description IS NOT NULL)
(4 rows)


amcheck was attempted on that index, but it doesn't seem to identify any issues:


db=> SELECT bt_index_check('widget_time_client_idx',true);
DEBUG:  verifying consistency of tree structure for index
"widget_time_client_idx"
DEBUG:  verifying level 3 (true root level)
DEBUG:  verifying level 2
DEBUG:  verifying level 1
DEBUG:  verifying level 0 (leaf level)
DEBUG:  verifying that tuples from index "widget_time_client_idx" are
present in "widget"
DEBUG:  finished verifying presence of 639872196 tuples from table "widget"
with bitset 25.94% set
 bt_index_check
----------------

(1 row)

db=> SELECT bt_index_parent_check('widget_time_client_idx',true,true);
DEBUG:  verifying consistency of tree structure for index
"widget_time_client_idx" with cross-level checks
DEBUG:  verifying level 3 (true root level)
DEBUG:  verifying level 2
DEBUG:  verifying level 1
DEBUG:  verifying level 0 (leaf level)

DEBUG:  verifying that tuples from index "widget_time_client_idx" are
present in "widget"
DEBUG:  finished verifying presence of 639874864 tuples from table "widget"
with bitset 25.94% set
 bt_index_parent_check
-----------------------

(1 row)



We recreated that index and deleted the old index, and the update then
worked.
I've done that on one of the databases so far.

Despite that working, it then occurred to me that the error might have been
due to a problem updating a different index on the same table.

I then found on the two problem DBs (but not the 13.8 one):


db=> select bt_index_check('widget_name_idx');
ERROR:  item order invariant violated for index "widget_name_idx"
DETAIL:  Lower index tid=(682201,85) (points to index tid=(682278,4097))
higher index tid=(682201,86) (points to index tid=(716079,1)) page
lsn=580/E554A858.



db=> select bt_index_check('widget_name_idx');
ERROR:  item order invariant violated for index "widget_name_idx"
DETAIL:  Lower index tid=(682201,49) (points to index tid=(682245,1))
higher index tid=(682201,50) (points to index tid=(734398,1)) page
lsn=566/E67C5FF0.


which, being on a text field, seems more likely to be the result of a collation
change that might accompany an OS update.

But if it is the problem, why did the update start working after I
recreated the other index?

I think I should now:
- recreate the widget_name_idx on the problem servers
- run bt_index_check across all other indexes
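
For the second step, a rough sketch (untested here; it assumes the amcheck
extension is installed and that heapallindexed=true is acceptable despite
being slower) of driving bt_index_check over every valid btree index and
collecting failures instead of stopping at the first one:

DO $$
DECLARE
    idx regclass;
BEGIN
    FOR idx IN
        SELECT c.oid
        FROM pg_class c
        JOIN pg_am am   ON am.oid = c.relam
        JOIN pg_index i ON i.indexrelid = c.oid
        WHERE c.relkind = 'i'
          AND am.amname = 'btree'
          AND i.indisvalid
    LOOP
        BEGIN
            PERFORM bt_index_check(idx, true);  -- heapallindexed check included
            RAISE NOTICE 'OK: %', idx;
        EXCEPTION WHEN OTHERS THEN
            RAISE WARNING 'FAILED: % (%)', idx, SQLERRM;
        END;
    END LOOP;
END;
$$;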

Any suggestions on what else I should look into, in particular anything I
should check before upgrading the remaining 13.8 DB to 13.9?

Thanks for any help,

Paul


Vacuuming by non owner/super users?

2021-03-07 Thread Paul McGarry
Is it possible for a non-owner or non-superuser to be given permission to
vacuum tables in the DB?

My initial thought is no, but the documentation says:

"To vacuum a table, one must ordinarily be the table's owner or a
superuser."

Where the "ordinarily" seems to imply there might be some non-ordinary way
of performing a vacuum?
(If it really is just table owner or super user, then the sentence would be
clearer without the "ordinarily").

I am trying to split up DB maintenance, ie I would like to give someone
permission to vacuum the DB/tables without giving them full rights to do other
things on those tables.

Thanks for any advice.

Paul


Re: Efficiently advancing a sequence without risking it going backwards.

2020-07-16 Thread Paul McGarry
On Fri, 10 Jul 2020 at 10:27, Jeremy Schneider wrote:

>
> OP asked for a way to call setval() with a guarantee the sequence will
> never go backwards IIUC. His code can check that the new value he wants to
> set is higher than the current value, but there’s a race condition where a
> second connection could quickly advance the sequence between the check and
> the setval() call and then cause duplicates from the next call which is bad.
>
> The ideal solution is a setval_forward_only() or setval_no_duplicates()
> function that does it atomically or something. If it were possible to
> “lock” the entire sequence to prevent any other sessions from using it at
> all, that would work too. Not locking a value, locking the whole thing.
> Very bad hack solution is renaming the sequence then renaming it back as a
> blunt form of locking... and to be clear I don’t think is a good idea I
> just was saying that technically it might work.  :)
>
> -Jeremy
>

Yes, that first paragraph is a good summary. A "setval_forward_only()" is
the sort of thing I was after.

Maybe something analogous to:
UPDATE the_seq SET last_value = number WHERE last_value < number;
with some sort of global (but short) lock as required.
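
For what it's worth, a sketch of a workaround we could live with in the
meantime (placeholders: the_seq is the sequence, 1234 the target value). It is
not an atomic guarded setval, it just pushes the catch-up nextval() calls
server-side into one statement, and because it only ever calls nextval() it
can never move the sequence backwards even with concurrent users:

SELECT count(nextval('the_seq'))   -- nextval() is evaluated once per generated row
FROM generate_series(
       1,
       GREATEST(0, 1234 - (SELECT last_value FROM the_seq))
     );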

Relating to some of the other replies, there isn't a "requirement" (from an
application perspective) that the sequences always generate ids in
ascending order or that they don't skip numbers etc. To the application
they are just ids, as long as they are unique that is enough. However it is
an application that is used by people, so there is some external value in
having the ids going up in a way that roughly correlates to time as people
tend to expect numbers to do that sort of thing.

For a bit more background, we have our own application and homegrown
loosely coupled multi-primary DB cluster and replication system.
Each backend DB in the cluster has its own node id (0-9) and when our app
asks for a sequence value it calls a custom function which gets a normal
sequence value suffixed with the DB node ID.

So if there were two backend dbs (1 and 2) and both backend dbs had a
sequence with last_value of 1234 then our application would get a
"sequence" value of 12351 or 12352 depending on which db backend served the
request.
The resulting ids are unique across our cluster, but certainly not gapless
nor issued in strict ascending order which is fine from an application
perspective.
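
Roughly, the wrapper does something like this (a hypothetical sketch; the
real function name and details are made up here):

-- Hypothetical sketch: a normal sequence value with the node id (0-9)
-- appended as the last digit, eg nextval 1235 on node 1 becomes 12351.
CREATE FUNCTION cluster_nextval(seq regclass, node_id int) RETURNS bigint AS $$
    SELECT nextval(seq) * 10 + node_id;
$$ LANGUAGE sql;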

But as mentioned, from a human perspective there is some value in keeping
the ids issued by the cluster roughly in time order, so we have a secondary
process which liaises with all the backend nodes and pulls forwards any
sequences that fall behind other nodes. So if DB 1 happened to serve 1000
requests using the sequence while DB2 served none, the process pulls the
sequence in DB2 forward until it catches up, currently by calling nextval
in a loop.

Which all works fine. However sometimes (eg taking a node offline for
maintenance or upgrade) a sequence might get quite a long way out, and
calling nextval() 100k times seems a rather inefficient way to catch up
(but it is better to be inefficient than risk going backwards and causing a
duplicate id).

We have been using essentially this system for our cluster since the Postgres 7
days; periodically we have checked in on Postgres replication
advancements (which have come a long way) but haven't yet found a
compelling reason to switch from what is working.

Paul


Efficiently advancing a sequence without risking it going backwards.

2020-07-06 Thread Paul McGarry
I have two sequences in different dbs which I want to keep roughly in sync
(they don't have to be exactly in sync, I am just keeping them in the same
ballpark).

Currently I have a process which periodically checks the sequences and does:

1) Check values
DB1sequence: 1234
DB2sequence: 1233 (1 behind)
2) while (nextval('DB2sequence')<=1234);

which works fine, but is pretty inefficient if the discrepancy is large (ie
calling nextval a hundred thousand times).

I don't think I can use setval(), because it risks making sequences go
backwards, eg:

1) Check values
DB1sequence: 1234
DB2sequence: 1233 (1 behind)
2) setval('DB2sequence',1234);

but if between (1) and (2) there are 2 nextval('DB2sequence') calls in
another process, (2) would take the sequence back from 1235 to 1234 and I
would end up trying to create a duplicate key ID from the sequence.

So what I really want is something equivalent to the setval, but with
"where DB2sequence <1234" logic so it doesn't overwrite the value if it is
already large.

Is there such a mechanism?

Thanks for any help.

Paul


Re: How to get timezone offset in timestamp with time zone AT TIME ZONE output.

2019-09-24 Thread Paul McGarry
On Wed, Sep 25, 2019 at 5:44 AM Adrian Klaver wrote:

>
> The issue is unclear so I am not sure you can discount this as a
> solution. The OP had:
>
> CREATE TABLE users (
> user_id biginit,
> user_timezone text, -- Eg 'Australia/Sydney','Asia/Hong_Kong'
> );
> CREATE TABLE data (
> id bigint,
> user_id bigint,
> datetime timestamp with time zone,
> );
> INSERT INTO users (1,'Australia/Sydney');
> INSERT INTO users (2,'Asia/Hong_Kong');
> INSERT INTO data (5,1,'2020-04-05 02:00:00');
> INSERT INTO data (6,2,'2020-04-05 02:00:00');
>
> and:
>
> "Therefore whatever renders the offset needs to be capable of doing it
> per row, independently of the server/session time zone."
>
> There is no indication of what the server timezone is set to or where
> the timestamps being assigned to date.datetime are coming from. Do they
> originate as local time(per user) or are they being  generated server side?
>

Sorry if I left something ambiguous.

I should have written:

INSERT INTO data (5,1,'2020-04-05 02:00:00+00');
INSERT INTO data (6,2,'2020-04-05 02:00:00+00');

for clarity rather than leaving the offset ambiguous.

In terms of the general problem, I don't think the specific server timezone
should really matter (though it is UTC in my case, it could feasibly be
anything); what I am trying to do is output the stored time in both:
1) server time zone
2) user time zone
including the relevant UTC offset in both cases.
(1) just comes for free, but it's getting the datetime and offset for (2)
that is the question.

eg:
=
SELECT
  id,
  datetime,
  datetime AT TIME ZONE (SELECT user_timezone FROM users WHERE
data.user_id=users.user_id) AS usertime
FROM data;
=

and getting data something like:

id: 5
datetime: 2020-04-05 02:00:00+00
usertime: 2020-04-05 13:00:00+11

id: 6
datetime: 2020-04-05 02:00:00+00
usertime: 2020-04-05 10:00:00+08

(note: the above was done in my head as an example, I didn't check the real tz
offsets at that point in time for those zones).

Andrew's function seems plausible and I need to find some time to test it,
but I am slightly surprised there isn't a native way to get the output, as
it seems like something that would be fairly useful when dealing with dates.

Perhaps another avenue would be some sort of getOffset function, eg

getOffset('2019-09-25 02:00:00+00','Australia/Sydney')
that would return +10 (or just 10).

Presumably PostgreSQL must have some internal functionality like that
because it can do that math on the datetimes, but it doesn't seem to be
exposed to users.
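
For the record, a sketch of how such a helper could be written today (the
function name is made up, and it simply derives the offset from the difference
between the zone-local and UTC renderings of the same instant):

-- Hypothetical getOffset-style helper: UTC offset of an instant in a zone.
CREATE FUNCTION get_offset(ts timestamptz, zone text) RETURNS interval AS $$
    SELECT (ts AT TIME ZONE zone) - (ts AT TIME ZONE 'UTC');
$$ LANGUAGE sql STABLE;

SELECT get_offset('2019-09-25 02:00:00+00', 'Australia/Sydney');  -- 10:00:00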

Thanks all for your input.

Paul


How to get timezone offset in timestamp with time zone AT TIME ZONE output.

2019-09-23 Thread Paul McGarry
Hi there,

Does anyone have a good way of doing:

=
select '2020-04-04 15:00:00+00'::timestamp with time zone AT TIME ZONE
'Australia/Sydney';
      timezone
---------------------
 2020-04-05 02:00:00

select '2020-04-04 16:00:00+00'::timestamp with time zone AT TIME ZONE
'Australia/Sydney';
      timezone
---------------------
 2020-04-05 02:00:00
=

but with the output including the offset, eg:
2020-04-05 02:00:00+11
2020-04-05 02:00:00+10
respectively, so it is clear which 2am it is (the times above are around a
DST switch)?


I have seen a couple of suggestions involving setting the desired time zone
in the db session environment, but my actual use case will be a bit more
complex, something like,


CREATE TABLE users (
user_id bigint,
user_timezone text -- Eg 'Australia/Sydney','Asia/Hong_Kong'
);
CREATE TABLE data (
id bigint,
user_id bigint,
datetime timestamp with time zone
);
INSERT INTO users VALUES (1,'Australia/Sydney');
INSERT INTO users VALUES (2,'Asia/Hong_Kong');
INSERT INTO data VALUES (5,1,'2020-04-05 02:00:00');
INSERT INTO data VALUES (6,2,'2020-04-05 02:00:00');

and I'll want to run a query like:

select id, datetime,
  datetime AT TIME ZONE (select user_timezone from users where
data.user_id=users.user_id) as usertime from data;


where I want the usertime to be returned in the corresponding user's
timezone, but with the offset. Therefore whatever renders the offset needs
to be capable of doing it per row, independently of the server/session time
zone.

And to_char isn't much help:


select to_char('2020-04-04 15:00:00+00'::timestamp with time zone AT TIME
ZONE 'Australia/Sydney','-MM-DD HH24:MI:SSOF');
        to_char
------------------------
 2020-04-05 02:00:00+00

 because to_char here only sees a timestamp and the timezone info is already
lost, so you end up with something very wrong.

Any ideas?

Thanks for any help.

Paul


Re: Why the index is not used ?

2018-10-08 Thread Paul McGarry
Hi Didier,

Yes, credit cards are a very specific space that probably gets people who are 
familiar with it going a bit. By the time you factor in general security 
practices, specific PCI requirements, your threat model and likely business 
requirements (needing relatively free access to parts of the card number) the 
acceptable solution space narrows considerably.

More generally though I’d recommend reading:

https://paragonie.com/blog/2017/05/building-searchable-encrypted-databases-with-php-and-sql

as (even if you aren’t using PHP) it discusses several strategies and what 
makes them good/bad for different use cases and how to implement them well.

I don’t think I’d consider the main solution discussed there particularly 
applicable to credit card data (mostly because the low entropy of card data 
makes it difficult to handle safely without additional per-row randomness 
added, though as always, consult your QSA) but it is generally interesting.

Paul

Sent from my iPhone

> On 9 Oct 2018, at 01:29, ROS Didier  wrote:
> 
> Hi Paul
>  
>Thank you very much for your feedback which is very 
> informative.
>I understand that concerning the encryption of credit card 
> numbers, it is imperative to respect the PCI DSS document. I am going to 
> study it.
>However, I would like to say that I chose my example badly by 
> using a table storing credit card numbers. In fact, my problem is more 
> generic.
> I want to implement a solution that encrypts “sensitive” data and can 
> retrieve data with good performance (by using an index).
> I find that the solution you propose is very interesting and I am going to 
> test it.
>  
> Best Regards
> Didier ROS
>  
> De : p...@paulmcgarry.com [mailto:p...@paulmcgarry.com] 
> Envoyé : lundi 8 octobre 2018 00:11
> À : ROS Didier 
> Cc : fola...@peoplecall.com; pavel.steh...@gmail.com; 
> pgsql-...@lists.postgresql.org; pgsql-performa...@lists.postgresql.org; 
> pgsql-general@lists.postgresql.org
> Objet : Re: Why the index is not used ?
>  
> Hi Didier,
>  
> I’m sorry to tell you that you are probably doing something (ie 
> handling/storing credit cards) which would mean you have to comply with PCI 
> DSS requirements.
>  
> As such you should probably have a QSA (auditor) who you can run any proposed 
> solution by (so you know they will be comfortable with it when they do their 
> audit).
>  
> I think your current solution would be frowned upon because:
> - cards are effectively stored in plaintext in the index.
> - your encryption/decryption is being done in database, rather than by 
> something with that as its sole role.
>  
> People have already mentioned the former so I won’t go into it further
>  
> But for the second part if someone can do a 
>  
> Select pgp_sym_decrypt(cc)
>  
> then you are one sql injection away from having your card data stolen. You do 
> have encryption, but in practice everything is available unencrypted so in 
> practice the encryption is more of a tick in a box than an actual defence 
> against bad things happening. In a properly segmented system even your DBA 
> should not be able to access decrypted card data.
>  
> You probably should look into doing something like:
>  
> - store the first 6 and last 4 digits of the card unencrypted.
> - store the remaining card digits encrypted
> - have the encryption/decryption done by a seperate service called by your 
> application code outside the db.
>  
> You haven’t gone into what your requirements re search are (or I missed them) 
> but while the above won’t give you a fast exact cc lookup in practice being 
> able to search using the first 6 and last 4 can get you a small enough subset 
> than can then be filtered after decrypting the middle. 
>  
> We are straying a little off PostgreSQL topic here but if you and/or your 
> management aren’t already looking at PCI DSS compliance I’d strongly 
> recommend you do so. It can seem like a pain but it is much better to take 
> that pain up front rather than having to reengineer everything later. There 
> are important security aspects it helps make sure you cover but maybe some 
> business aspects (ie possible partners who won’t be able to deal with you 
> without your compliance sign off documentation).
>  
>  
> The alternative, if storing cc data isn’t a core requirement, is to not store 
> the credit card data at all. That is generally the best solution if it meets 
> your needs, ie if you just want to accept payments then use a third party who 
> is PCI compliant to handle the cc part.
>  
> I hope that helps a little.
>  
> Paul
>  
>  
>  
> 
> Sent from my iPhone
> 
> On 8 Oct 2018, at 05:32, ROS Didier  wrote:
> 
> Hi Francisco
> 
>Thank you for your remark. 
>You're right, but it's the only procedure I found to make search on 
> encrypted fields with good response times (using index) !
> 
>Regarding access to the file system, our servers are in protected network 
> areas. 

Re: Why the index is not used ?

2018-10-07 Thread Paul McGarry
Hi Didier,

I’m sorry to tell you that you are probably doing something (ie 
handling/storing credit cards) which would mean you have to comply with PCI DSS 
requirements.

As such you should probably have a QSA (auditor) who you can run any proposed 
solution by (so you know they will be comfortable with it when they do their 
audit).

I think your current solution would be frowned upon because:
- cards are effectively stored in plaintext in the index.
- your encryption/decryption is being done in database, rather than by 
something with that as its sole role.

People have already mentioned the former so I won’t go into it further

But for the second part if someone can do a 

>> Select pgp_sym_decrypt(cc)

then you are one sql injection away from having your card data stolen. You do 
have encryption, but in practice everything is available unencrypted so in 
practice the encryption is more of a tick in a box than an actual defence 
against bad things happening. In a properly segmented system even your DBA 
should not be able to access decrypted card data.

You probably should look into doing something like:

- store the first 6 and last 4 digits of the card unencrypted.
- store the remaining card digits encrypted
- have the encryption/decryption done by a separate service called by your 
application code outside the db.

You haven’t gone into what your requirements re search are (or I missed them) 
but while the above won’t give you a fast exact cc lookup in practice being 
able to search using the first 6 and last 4 can get you a small enough subset 
that can then be filtered after decrypting the middle. 
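
As a hypothetical sketch of that split-storage layout (the table and column
names here are made up, and the ciphertext would come from the external
encryption service, not from the database itself):

-- First 6 / last 4 stored in clear for searching; middle digits encrypted
-- outside the DB and stored as opaque bytes.
CREATE TABLE card (
    card_id      bigint PRIMARY KEY,
    cc_first6    char(6) NOT NULL,
    cc_last4     char(4) NOT NULL,
    cc_middle_ct bytea   NOT NULL
);
CREATE INDEX card_first6_last4_idx ON card (cc_first6, cc_last4);

-- Candidate rows for a full-PAN lookup; the application then has the external
-- service decrypt cc_middle_ct for this small subset and filters on the result.
SELECT card_id, cc_middle_ct
FROM card
WHERE cc_first6 = '411111' AND cc_last4 = '1111';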

We are straying a little off PostgreSQL topic here but if you and/or your 
management aren’t already looking at PCI DSS compliance I’d strongly recommend 
you do so. It can seem like a pain but it is much better to take that pain up 
front rather than having to reengineer everything later. There are important 
security aspects it helps make sure you cover but maybe some business aspects 
(ie possible partners who won’t be able to deal with you without your 
compliance sign off documentation).


The alternative, if storing cc data isn’t a core requirement, is to not store 
the credit card data at all. That is generally the best solution if it meets 
your needs, ie if you just want to accept payments then use a third party who 
is PCI compliant to handle the cc part.

I hope that helps a little.

Paul




Sent from my iPhone

> On 8 Oct 2018, at 05:32, ROS Didier  wrote:
> 
> Hi Francisco
> 
>Thank you for your remark. 
>You're right, but it's the only procedure I found to make search on 
> encrypted fields with good response times (using index) !
> 
>Regarding access to the file system, our servers are in protected network 
> areas. few people can connect to it.
> 
>it's not the best solution, but we have data encryption needs and good 
> performance needs too. I do not know how to do it except the specified 
> procedure..
>if anyone has any proposals to put this in place, I'm interested.
> 
>Thanks in advance
> 
> Best Regards
> Didier ROS
> 
> -Message d'origine-
> De : fola...@peoplecall.com [mailto:fola...@peoplecall.com] 
> Envoyé : dimanche 7 octobre 2018 17:58
> À : ROS Didier 
> Cc : pavel.steh...@gmail.com; pgsql-...@lists.postgresql.org; 
> pgsql-performa...@lists.postgresql.org; pgsql-general@lists.postgresql.org
> Objet : Re: Why the index is not used ?
> 
> ROS:
> 
>> On Sun, Oct 7, 2018 at 3:13 PM, ROS Didier  wrote:
>> 
>> -INSERT INTO cartedecredit(username,cc)  SELECT 'individu ' || x.id, 
>> pgp_sym_encrypt('test value ' || x.id, 'motdepasse','compress-algo=2, 
>> cipher-algo=aes256') FROM generate_series(1,10) AS x(id);
>> -CREATE INDEX idx_cartedecredit_cc02 ON 
>> cartedecredit(pgp_sym_decrypt(cc, 'motdepasse','compress-algo=2, 
>> cipher-algo=aes256'));
> 
> If my french is not too rusty you are encrypting a credit-card, and then 
> storing an UNENCRYPTED copy in the index. So, getting it from the server is 
> trivial for anyone with filesystem access.
> 
> Francisco Olarte.
> 

Re: Why the index is not used ?

2018-10-06 Thread Paul McGarry
I haven’t looked up what pgp_sym_encrypt() does, but assuming it does encryption 
the way you should for credit card data then it will be using a random salt, 
and the same input value won’t encrypt to the same output value, so

WHERE cc=pgp_sym_encrypt('test value 32', 'motdepasse');

wouldn’t work, because the value generated by the function when you are 
searching isn’t the same value that was generated when you stored it.
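
You can see this with a quick test (assuming the pgcrypto extension is
available):

CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- The same plaintext and passphrase give different ciphertext on every call
-- (random session key/salt), so comparing against a freshly encrypted value
-- can never match what was stored earlier.
SELECT pgp_sym_encrypt('test value 32', 'motdepasse')
     = pgp_sym_encrypt('test value 32', 'motdepasse') AS same_ciphertext;  -- false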



Paul

> On 6 Oct 2018, at 19:57, ROS Didier  wrote:
> 
> WHERE cc=pgp_sym_encrypt('test value 32', 'motdepasse');


pg_stats avg_width and null_frac

2018-06-05 Thread Paul McGarry
Can anyone confirm that the "avg_width" reported in pg_stats is the average
width not including any null rows?

ie if a field had:

avg_width: 6
null_frac: 0.5

Then
- 50% of the rows would be empty for this field
- The other 50% of the rows would have data with an avg_width of 6 bytes?
(according to the sampled data at least)
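
For what it's worth, a rough way to cross-check that reading against the
actual data (table and column names are placeholders, and pg_column_size
measures stored size so it will only approximately match what ANALYZE
computed):

SELECT avg(pg_column_size(myfield)) FILTER (WHERE myfield IS NOT NULL) AS measured_avg_width,
       (count(*) FILTER (WHERE myfield IS NULL))::float8 / count(*)    AS measured_null_frac
FROM mytable;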

Paul