Re: Why doesn't autovacuum/analyze run in due time after calling pg_stat_reset?

2023-08-23 Thread Rihad

On 8/21/23 20:50, Adrian Klaver wrote:

On 8/21/23 09:31, Rihad wrote:

On 8/21/23 20:17, Adrian Klaver wrote:

On 8/21/23 09:09, Rihad wrote:

On 8/21/23 20:00, Adrian Klaver wrote:






Sorry, they are all as per default, commented out in the config.

There are no long-running queries; otherwise the tables wouldn't be 
vacuumed/analyzed in due time after running the first manual analyze, 
which updates n_live_tup to match reltuples.




My only remaining suggestion is to closely monitor the Postgres log 
and see if it provides a clue.


I'm awfully sorry. I read the autovacuum manual carefully, and it isn't 
n_live_tup but reltuples that is taken into account in the calculation.



vacuum threshold = vacuum base threshold + vacuum scale factor * number of 
tuples

where the vacuum base threshold is autovacuum_vacuum_threshold 
<https://www.postgresql.org/docs/current/runtime-config-autovacuum.html#GUC-AUTOVACUUM-VACUUM-THRESHOLD>, 
the vacuum scale factor is autovacuum_vacuum_scale_factor 
<https://www.postgresql.org/docs/current/runtime-config-autovacuum.html#GUC-AUTOVACUUM-VACUUM-SCALE-FACTOR>, 
and the number of tuples is pg_class.reltuples.
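Putting that formula into a query shows the trigger points autovacuum actually 
computes from pg_class.reltuples (a sketch, assuming the default GUC settings 
and joining pg_stat_all_tables only for comparison):

SELECT c.relname,
       c.reltuples::bigint AS reltuples,
       s.n_dead_tup,
       round(current_setting('autovacuum_vacuum_threshold')::int
             + current_setting('autovacuum_vacuum_scale_factor')::float8 * c.reltuples) AS vacuum_trigger,
       s.n_mod_since_analyze,
       round(current_setting('autovacuum_analyze_threshold')::int
             + current_setting('autovacuum_analyze_scale_factor')::float8 * c.reltuples) AS analyze_trigger
FROM pg_class c
JOIN pg_stat_all_tables s ON s.relid = c.oid
WHERE c.relkind = 'r'
ORDER BY s.n_dead_tup DESC
LIMIT 20;

A table is due for vacuum once n_dead_tup exceeds vacuum_trigger, and due for 
analyze once n_mod_since_analyze exceeds analyze_trigger.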



Your first suggestion was to RTFM.


Re: Why doesn't autovacuum/analyze run in due time after calling pg_stat_reset?

2023-08-21 Thread Rihad

On 8/21/23 20:17, Adrian Klaver wrote:

On 8/21/23 09:09, Rihad wrote:

On 8/21/23 20:00, Adrian Klaver wrote:




Thanks for the detailed reply, no tables have custom settings.

I need to make it clear once again that all autovac/analyze work as 
expected when n_live_tup matches reality, i.e. when analyze has been 
run on them since last reset.


A way to fix this is to simply analyze the whole database. Before 
doing that, while n_live_tup starts from basically 0 and grows based 
on DB activity, these usual calculations of 10-20% table size for 
vacuum/analyze don't work. They don't trigger autovac for most 
tables, or do it much much later.




You still have not said or shown whether the other autovacuum settings 
are the default values or not. Assuming they are, then the only other 
explanation I can come up with is that there is a process or processes 
that are creating long running open transactions that prevent 
autovacuum from running on the affected tables.




Sorry, they are all as per default, commented out in the config.

There are no long-running queries; otherwise the tables wouldn't be 
vacuumed/analyzed in due time after running the first manual analyze, which 
updates n_live_tup to match reltuples.






Re: Why doesn't autovacuum/analyze run in due time after calling pg_stat_reset?

2023-08-21 Thread Rihad

On 8/21/23 20:00, Adrian Klaver wrote:

On 8/20/23 22:31, Rihad wrote:

On 8/21/23 00:15, Adrian Klaver wrote:

On 8/20/23 12:10, Rihad wrote:

On 8/20/23 20:22, Adrian Klaver wrote:

On 8/18/23 22:35, Rihad wrote:

On 8/17/23 13:01, rihad wrote:






Hard to say without seeing the actual settings in postgresql.conf 
that match:


https://www.postgresql.org/docs/current/runtime-config-autovacuum.html#GUC-AUTOVACUUM-VACUUM-SCALE-FACTOR 




Most importantly:

autovacuum

and

track_counts

https://www.postgresql.org/docs/current/runtime-config-statistics.html#GUC-TRACK-COUNTS 





They are both on and set to the default. Autovac/analyze continue 
running on some tables after pg_stat_reset, just not on all of them, 
even though they should, judging by the live/dead tuple calculation.



foo=> show track_counts;
 track_counts
--------------
 on
(1 row)

foo=> show autovacuum;
 autovacuum
------------
 on
(1 row)


How about the rest of the settings at?:

https://www.postgresql.org/docs/current/runtime-config-autovacuum.html

Have the storage parameters for the tables been changed per?:

https://www.postgresql.org/docs/current/sql-createtable.html#SQL-CREATETABLE-STORAGE-PARAMETERS 



In psql you can do:

\d+ 

The setting if changed will show up as Options: 

Also are there include directives in use per?:

https://www.postgresql.org/docs/current/config-setting.html#CONFIG-INCLUDES 



You can see by looking at the sourcefile field in pg_settings:

https://www.postgresql.org/docs/current/view-pg-settings.html
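For example, a query along these lines (a sketch; sourcefile is only visible to 
superusers or members of pg_read_all_settings) shows where each relevant 
setting came from:

SELECT name, setting, source, sourcefile, sourceline
FROM pg_settings
WHERE name LIKE 'autovacuum%' OR name = 'track_counts';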


Thanks for the detailed reply, no tables have custom settings.

I need to make it clear once again that all autovac/analyze work as 
expected when n_live_tup matches reality, i.e. when analyze has been run 
on them since last reset.


A way to fix this is to simply analyze the whole database. Before doing 
that, while n_live_tup starts from basically 0 and grows based on DB 
activity, these usual calculations of 10-20% table size for 
vacuum/analyze don't work. They don't trigger autovac for most tables, 
or do it much much later.
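In psql that can be as simple as the following (assuming a role with enough 
privileges; ANALYZE with no arguments processes every table the current user 
is allowed to analyze):

foo=> ANALYZE;   -- or ANALYZE VERBOSE to get a line per table as it progresses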













There are still many tables waiting for their turn, which is long 
overdue.


Although there are some tables having only 60-70 (not 60-70K) 
n_live_tup that have had autovacuum run on them. Weird.





















Re: Why doesn't autovacuum/analyze run in due time after calling pg_stat_reset?

2023-08-21 Thread Rihad

On 8/21/23 00:15, Adrian Klaver wrote:

On 8/20/23 12:10, Rihad wrote:

On 8/20/23 20:22, Adrian Klaver wrote:

On 8/18/23 22:35, Rihad wrote:

On 8/17/23 13:01, rihad wrote:


Hi, all. After calling pg_stat_reset all statistics used by 
autovacuum got zeroed, and started accumulating from scratch. Some 
tables get acted upon properly, some don't.



Self-replying: yup, there seems to be an arbitrary limit of around 100K 
n_live_tup after which autovac/analyze kicks in.


To know rather than guess, read:

https://www.postgresql.org/docs/current/routine-vacuuming.html#AUTOVACUUM 




Sure, I read it before asking.

Taking the first table in the list as an example:


 relname  | n_live_tup | n_dead_tup |   left   | n_mod_since_analyze |  left
----------+------------+------------+----------+---------------------+---------
 fooo     |      32781 |     240663 |  -234057 |              513265 | -509937



n_dead_tup (not the actual value, only what has accumulated since calling 
pg_stat_reset) is much larger than 20% of n_live_tup (32781), and 
n_mod_since_analyze is much larger than 10% of it.


Yet it is kept unvacuumed and unanalyzed for a long time.

autovacuum_(vacuum|analyze)_threshold is 50.

What am I missing?


Hard to say without seeing the actual settings in postgresql.conf that 
match:


https://www.postgresql.org/docs/current/runtime-config-autovacuum.html#GUC-AUTOVACUUM-VACUUM-SCALE-FACTOR 




Most importantly:

autovacuum

and

track_counts

https://www.postgresql.org/docs/current/runtime-config-statistics.html#GUC-TRACK-COUNTS 





They are both on and set to the default. Autovac/analyze continue 
running on some tables after pg_stat_reset, just not on all of them, 
even though they should, judging by the live/dead tuple calculation.



foo=> show track_counts;
 track_counts
--------------
 on
(1 row)

foo=> show autovacuum;
 autovacuum
------------
 on
(1 row)









There are still many tables waiting for their turn, which is long overdue.

Although there are some tables having only 60-70 (not 60-70K) 
n_live_tup that have had autovacuum run on them. Weird.













Re: Why doesn't autovacuum/analyze run in due time after calling pg_stat_reset?

2023-08-21 Thread Rihad

On 8/20/23 20:22, Adrian Klaver wrote:

On 8/18/23 22:35, Rihad wrote:

On 8/17/23 13:01, rihad wrote:


Hi, all. After calling pg_stat_reset all statistics used by 
autovacuum got zeroed, and started accumulating from scratch. Some 
tables get acted upon properly, some don't.



Self-replying: yup, there seems to be an arbitrary limit of around 100K 
n_live_tup after which autovac/analyze kicks in.


To know rather than guess, read:

https://www.postgresql.org/docs/current/routine-vacuuming.html#AUTOVACUUM



Sure, I read it before asking.

Taking the first table in the list as an example:


 relname  | n_live_tup | n_dead_tup |   left   | n_mod_since_analyze |  left
----------+------------+------------+----------+---------------------+---------
 fooo     |      32781 |     240663 |  -234057 |              513265 | -509937



n_dead_tup (not the actual value, only what has accumulated since calling 
pg_stat_reset) is much larger than 20% of n_live_tup (32781), and 
n_mod_since_analyze is much larger than 10% of it.


Yet it is kept unvacuumed and unanalyzed for a long time.

autovacuum_(vacuum|analyze)_threshold is 50.

What am I missing?






There are still many tables waiting for their turn, which is long overdue.

Although there are some tables having only 60-70 (not 60-70K) 
n_live_tup that have had autovacuum run on them. Weird.









Re: Why doesn't autovacuum/analyze run in due time after calling pg_stat_reset?

2023-08-20 Thread Rihad

On 8/17/23 13:01, rihad wrote:


Hi, all. After calling pg_stat_reset all statistics used by autovacuum 
got zeroed, and started accumulating from scratch. Some tables get 
acted upon properly, some don't.



Self-replying: yup, there seems to be an arbitrary limit of around 100K 
n_live_tup after which autovac/analyze kicks in.


There are still many tables waiting for their turn, which is long overdue.

Although there are some tables having only 60-70 (not 60-70K) n_live_tup 
that have had autovacuum run on them. Weird.




foo=> select relname, n_live_tup, n_dead_tup,
        cast(round(n_live_tup*cast(current_setting('autovacuum_vacuum_scale_factor') as float)+current_setting('autovacuum_vacuum_threshold')::int) as int)-n_dead_tup as left,
        n_mod_since_analyze,
        cast(round(n_live_tup*cast(current_setting('autovacuum_analyze_scale_factor') as float)+current_setting('autovacuum_analyze_threshold')::int) as int)-n_mod_since_analyze as left,
        n_ins_since_vacuum, last_autovacuum, last_autoanalyze
      from pg_stat_all_tables
      where n_live_tup>0 and n_dead_tup>=current_setting('autovacuum_vacuum_threshold')::int
      order by 8 nulls first, 4;

    relname    | n_live_tup | n_dead_tup |   left   | n_mod_since_analyze |  left   | n_ins_since_vacuum |        last_autovacuum        |       last_autoanalyze
---------------+------------+------------+----------+---------------------+---------+--------------------+-------------------------------+-------------------------------
 fooo          |      32781 |     240663 |  -234057 |              513265 | -509937 |             270291 |                               |
 fooo          |         40 |      24891 |   -24833 |               49822 |  -49768 |              24931 |                               |
 f             |         46 |      18991 |   -18932 |               19099 |  -19044 |                 46 |                               |
 fo            |          1 |      12687 |   -12637 |               40795 |  -40745 |                  1 |                               |
 fo            |       2393 |          5 |   -10586 |              137599 | -137310 |               2393 |                               |
 fo            |       9465 |      11919 |    -9976 |              352888 | -351892 |               9466 |                               |
 fooo          |         26 |       2558 |    -2503 |                 188 |    -135 |               2584 |                               |
 user_sessions |        118 |       1231 |    -1157 |               19114 |  -19052 |                118 |                               |
 fo            |         32 |        562 |     -506 |                 226 |    -173 |                594 |                               |
 fooo          |         53 |        537 |     -476 |                 644 |    -589 |                 53 |                               |
 fo            |        327 |        524 |     -409 |                 804 |    -721 |                520 |                               |
 f             |         46 |        104 |      -45 |                 457 |    -402 |                183 |                               |
 foo           |         34 |         93 |      -36 |                 158 |    -105 |                 34 |                               |
 f             |         47 |         95 |      -36 |                 364 |    -309 |                 47 |                               |
 fo            |         84 |         91 |      -24 |                 177 |    -119 |                 84 |                               |
 f             |  290504401 |    9540832 | 48560098 |            26663449 | 2387041 |            8319194 | 2023-08-17 08:03:19.159161+00 | 2023-08-17 07:43:54.982375+00
 fo            |      43449 |       3823 |     4917 |                4190 |     205 |                377 | 2023-08-17 08:31:14.5573+00   | 2023-08-17 08:31:15.19454+00
 fo            |       3913 |        715 |      118 |                 200 |     241 |                  0 | 2023-08-17 08:44:04.721272+00 | 2023-08-17 08:47:04.875214+00
 f             |         73 |         63 |        2 |                  31 |      26 |                 35 | 2023-08-17 08:45:05.115824+00 | 2023-08-17 08:47:05.032251+00
 f             |     790249 |     126240 |    31860 |                4149 |   74926 |             119413 | 2023-08-17 08:46:10.625275+00 | 2023-08-17 08:49:04.867481+00
 pg_shdepend   |       1885 |        286 |      141 |                 116 |     122 |                270 | 2023-08-17 08:47:03.638639+00 | 2023-08-17 08:48:42.903096+00
 pg_index      |        993 |         79 |      170 |                  10 |     139 |                 72 | 2023-08-

Why doesn't autovacuum/analyze run in due time after calling pg_stat_reset?

2023-08-17 Thread rihad
Hi, all. After calling pg_stat_reset all statistics used by autovacuum 
got zeroed, and started accumulating from scratch. Some tables get acted 
upon properly, some don't.



foo=> select relname, n_live_tup, n_dead_tup,
        cast(round(n_live_tup*cast(current_setting('autovacuum_vacuum_scale_factor') as float)+current_setting('autovacuum_vacuum_threshold')::int) as int)-n_dead_tup as left,
        n_mod_since_analyze,
        cast(round(n_live_tup*cast(current_setting('autovacuum_analyze_scale_factor') as float)+current_setting('autovacuum_analyze_threshold')::int) as int)-n_mod_since_analyze as left,
        n_ins_since_vacuum, last_autovacuum, last_autoanalyze
      from pg_stat_all_tables
      where n_live_tup>0 and n_dead_tup>=current_setting('autovacuum_vacuum_threshold')::int
      order by 8 nulls first, 4;

    relname    | n_live_tup | n_dead_tup |   left   | n_mod_since_analyze |  left   | n_ins_since_vacuum |        last_autovacuum        |       last_autoanalyze
---------------+------------+------------+----------+---------------------+---------+--------------------+-------------------------------+-------------------------------
 fooo          |      32781 |     240663 |  -234057 |              513265 | -509937 |             270291 |                               |
 fooo          |         40 |      24891 |   -24833 |               49822 |  -49768 |              24931 |                               |
 f             |         46 |      18991 |   -18932 |               19099 |  -19044 |                 46 |                               |
 fo            |          1 |      12687 |   -12637 |               40795 |  -40745 |                  1 |                               |
 fo            |       2393 |          5 |   -10586 |              137599 | -137310 |               2393 |                               |
 fo            |       9465 |      11919 |    -9976 |              352888 | -351892 |               9466 |                               |
 fooo          |         26 |       2558 |    -2503 |                 188 |    -135 |               2584 |                               |
 user_sessions |        118 |       1231 |    -1157 |               19114 |  -19052 |                118 |                               |
 fo            |         32 |        562 |     -506 |                 226 |    -173 |                594 |                               |
 fooo          |         53 |        537 |     -476 |                 644 |    -589 |                 53 |                               |
 fo            |        327 |        524 |     -409 |                 804 |    -721 |                520 |                               |
 f             |         46 |        104 |      -45 |                 457 |    -402 |                183 |                               |
 foo           |         34 |         93 |      -36 |                 158 |    -105 |                 34 |                               |
 f             |         47 |         95 |      -36 |                 364 |    -309 |                 47 |                               |
 fo            |         84 |         91 |      -24 |                 177 |    -119 |                 84 |                               |
 f             |  290504401 |    9540832 | 48560098 |            26663449 | 2387041 |            8319194 | 2023-08-17 08:03:19.159161+00 | 2023-08-17 07:43:54.982375+00
 fo            |      43449 |       3823 |     4917 |                4190 |     205 |                377 | 2023-08-17 08:31:14.5573+00   | 2023-08-17 08:31:15.19454+00
 fo            |       3913 |        715 |      118 |                 200 |     241 |                  0 | 2023-08-17 08:44:04.721272+00 | 2023-08-17 08:47:04.875214+00
 f             |         73 |         63 |        2 |                  31 |      26 |                 35 | 2023-08-17 08:45:05.115824+00 | 2023-08-17 08:47:05.032251+00
 f             |     790249 |     126240 |    31860 |                4149 |   74926 |             119413 | 2023-08-17 08:46:10.625275+00 | 2023-08-17 08:49:04.867481+00
 pg_shdepend   |       1885 |        286 |      141 |                 116 |     122 |                270 | 2023-08-17 08:47:03.638639+00 | 2023-08-17 08:48:42.903096+00
 pg_index      |        993 |         79 |      170 |                  10 |     139 |                 72 | 2023-08-17 08:48:03.67267+00  | 2023-08-17 08:49:03.723851+00
 pg_depend     |       9779 |       1027 |      979 |                 130 |     898 |                923 | 2023-08-17 08:48:03.743417+00 | 2023-08-17 08:49:03.740913+00
 foo           |      43699 |       2352 |     6438 |                3527 |     893 |               1175 | 2023-08-17

Re: pg_upgrade v15 not generating analyze_new_cluster.sh

2023-06-13 Thread rihad

On 6/13/23 12:57, Magnus Hagander wrote:

On Tue, Jun 13, 2023 at 10:35 AM rihad  wrote:

Hi, all. When pg_upgrading from PG 11 to 13, a file analyze_new_cluster.sh was 
generated in the current directory which ran vacuumdb --all --analyze-in-stages

When upgrading from 13 to 15.3 no such file was generated, which made me believe it 
was no longer necessary. Alas, that wasn't the case. The database was extremely slow 
after the upgrade, chewing up 100% CPU time, and it wasn't until I manually ran 
vacuumdb --all --analyze-in-stages that CPU usage dropped. Was there a knob 
that I missed, or is this just a bug in pg_upgrade 15? Thanks.

If you look at the output of pg_upgrade, it still tells you to run
"vacuumdb --all --analyze-in-stages". Since that was the *only*
command that was in the script, the script is no longer generated and
you're expected to run the command directly instead. It does not
remove the need for the job, just instructs you to do it without the
script.

This change was made in PostgreSQL 14, not 15, and is listed in the
release notes there. When you upgrade "past" a version, it's
recommended you read the release notes for the intermediate versions
as well when looking for changes, as those will not be included in the
notes for the newer version.



Oh, sht... thanks. pg_upgrade was one of several commands my script 
ran; it also removed PG 13 and installed PG 15 right after that, which is 
why I missed pg_upgrade's output. To be honest, I wasn't expecting it to 
say anything important because, for instance, it did generate 
update_extensions.sql and delete_old_cluster.sh (which is, btw, also a 
one-liner)). I'd rather that file weren't generated but just mentioned in 
the output, since it has nothing to do with how the new cluster works, 
than have vacuumdb only mentioned, as it's crucial to making PG useful at 
all in most non-test scenarios.


pg_upgrade v15 not generating analyze_new_cluster.sh

2023-06-13 Thread rihad
Hi, all. When pg_upgrading from PG 11 to 13, a file 
analyze_new_cluster.sh was generated in the current directory which ran 
vacuumdb --all --analyze-in-stages


When upgrading from 13 to 15.3 no such file was generated, which made me 
believe it was no longer necessary. Alas, that wasn't the case. The 
database was extremely slow after the upgrade, chewing up 100% CPU time, 
and it wasn't until I manually ran vacuumdb --all --analyze-in-stages 
that CPU usage dropped. Was there a knob that I missed, or is this just 
a bug in pg_upgrade 15? Thanks.



Re: Currently running queries with actual arguments?

2021-09-23 Thread rihad

On 9/23/21 7:58 PM, Garfield Lewis wrote:

The way I normally get this info is by setting the following:

log_statement = 'all'

then the arguments will be printed in the postgres.log file. There could be 
some other way but that is what I know.



Thanks, those queries are logged after their completion, but I 
specifically need the list of unfinished long queries currently being 
executed.
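
For what it's worth, pg_stat_activity lists the statements currently 
executing, though for prepared/extended-protocol queries the text still shows 
$1, $2 placeholders rather than the bound values; a sketch:

SELECT pid, now() - query_start AS runtime, state, query
FROM pg_stat_activity
WHERE state = 'active' AND pid <> pg_backend_pid()
ORDER BY runtime DESC;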






Currently running queries with actual arguments?

2021-09-23 Thread rihad
Hi, is it possible to view the list of currently running queries with 
$1, $2 etc replaced with the actual arguments?






Re: Upgrade procedure

2019-11-05 Thread rihad

On 11/05/2019 10:05 PM, Kevin Brannen wrote:

From: rihad 
Hi, all. Why is it normally suggested to stop the server, upgrade it,
then start it? Wouldn't it be easier & quicker to simply upgrade the
package in-place and restart the service? On OSen that allow
modification of currently running binaries, which is most Unix OS, M$
Windows being a notable exception )


That might be possible on a minor upgrade, but quite probably not on a
major version upgrade. I'm reasonably sure I've read that a major
upgrade *can* change underlying data/structures for tables and other
things. I don't think you want version-X writing to the tables on disk
while version-Y writes a new layout to the same files at the same
time. ??



Why would that matter if the server gets restarted after replacing the binaries? Aren't 
previous version's binaries "hard-wired" into memory while they are running? 
AFAIK on FreeBSD at least no attempt is made to stop the corresponding server or restart 
it when a package is upgraded by pkg(8).

We may be talking past each other here a bit...

After you do an upgrade, of course you have to restart the *PG* server or
you won't be using the new code, will you? :)

The manual or others here are more knowledgeable than I, but I believe that
for a "minor" upgrade, you can just swap out the code and restart PG. For
major upgrades, the PG server is going to have to come down as the underlying
files might be changed/transformed during the upgrade, then you start the
PG server when that's done. Check out the -k option as it can significantly
speed up pg_upgrade. You might find it safer to do a "pg_upgrade -c" before
the real upgrade; something to look at. As always on things like this, test
on a non-production machine first.

For us, we always use pg_upgrade even for minor updates because it feels
safer to me. That being said, we rarely do minor updates and just do majors
because upgrading is just hard enough (lots of testing!) we tend to wait and
then jump further. Upgrading is known to take a maintenance window; we just
plan things that way. Your organization may have different needs.

Yeah, but that way you're almost guaranteed to run an unsupported & 
vulnerable release for quite some time, until the next major one is ready )





Re: Upgrade procedure

2019-10-30 Thread rihad

>From: rihad 

Hi, all. Why is it normally suggested to stop the server, upgrade it, 
then start it? Wouldn't it be easier & quicker to simply upgrade the 
package in-place and restart the service? On OSen that allow 
modification of currently running binaries, which is most Unix OS, M$ 
Windows being a notable exception )


That might be possible on a minor upgrade, but quite probably not on a 
major version upgrade. I'm reasonably sure I've read that a major 
upgrade *can* change underlying data/structures for tables and other 
things. I don't think you want version-X writing to the tables on disk 
while version-Y writes a new layout to the same files at the same 
time. 





Why would that matter if the server gets restarted after replacing the 
binaries? Aren't previous version's binaries "hard-wired" into memory 
while they are running? AFAIK on FreeBSD at least no attempt is made to 
stop the corresponding server or restart it when a package is upgraded 
by pkg(8).





Upgrade procedure

2019-10-30 Thread rihad
Hi, all. Why is it normally suggested to stop the server, upgrade it, 
then start it? Wouldn't it be easier & quicker to simply upgrade the 
package in-place and restart the service? On OSen that allow 
modification of currently running binaries, which is most Unix OS, M$ 
Windows being a notable exception )



Thanks.





Re: Quere keep using temporary files

2019-10-25 Thread rihad

On 10/25/2019 05:49 PM, Tom Lane wrote:

rihad  writes:

Hi, we frequently run many queries involving XML that use a smallish
temporary file, despite having increased work_mem locally in that
transaction to 16GB. FreeBSD's top shows that the memory isn't actually
being used - it remains free. Basically many such queries are run within
a single transaction:
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp92452.1079", size 166518
STATEMENT:  DELETE FROM "foo" WHERE ((col1, col2, col3) in (select col1,
col2, col3 from foo_xml_v2(''))) AND
"foo"."col_id" IN ($1, $2, $3, $4, $5, $6, $7, $8, $9)
It looks like increasing work_mem doesn't help. Surely 16GB is enough to
cover all these small temp files?

You'd need to provide a lot more detail about what that query is doing
for anyone to be able to guess where the temp file usage is coming from.

regards, tom lane
.

I just checked and saw that the function "foo_xml_v2" above returns a 
table. Is this enough to trigger temporary file usage regardless of 
work_mem?






Quere keep using temporary files

2019-10-25 Thread rihad
Hi, we frequently run many queries involving XML that use a smallish 
temporary file, despite having increased work_mem locally in that 
transaction to 16GB. FreeBSD's top shows that the memory isn't actually 
being used - it remains free. Basically many such queries are run within 
a single transaction:


LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp92452.1079", size 166518
STATEMENT:  DELETE FROM "foo" WHERE ((col1, col2, col3) in (select col1, 
col2, col3 from foo_xml_v2(''))) AND 
"foo"."col_id" IN ($1, $2, $3, $4, $5, $6, $7, $8, $9)



It looks like increasing work_mem doesn't help. Surely 16GB is enough to 
cover all these small temp files?
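
For context, the per-transaction work_mem increase mentioned above would have 
been applied roughly like this (a sketch; SET LOCAL reverts automatically at 
COMMIT or ROLLBACK):

BEGIN;
SET LOCAL work_mem = '16GB';
-- ... the XML-heavy statements run here ...
COMMIT;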






Re: Changing work_mem

2019-08-14 Thread rihad

On 08/14/2019 11:42 AM, Laurenz Albe wrote:

rihad wrote:

Sorry, I just decreased work_mem back to 256MB, reloaded, and
instantly started seeing 82mb temp file creation, not 165mb as was
usual with work_mem=512MB.

So it indeed was applied immediately.
Really weird figures )

Increased work_mem to 768MB and started seeing temp file creation log 
entries 331MB in size.

Bizarre ) It looks like the bigger it gets, the bigger the temp files 
that get created.

Why not decrease it to 64mb then...

Temporary files are created whenever the data is estimated to not
fit into "work_mem".  So it is unsurprising that you see bigger
temporary files being created if you increase "work_mem".

Big temporary files will also be created when "work_mem" is small,
but maybe they got lost in the noise of the smaller files.
You should have noticed that fewer files are created when you increase
"work_mem".

Another thing to notice is that the temporary files use another, more
compact format than the data in memory, so you need to increase
"work_mem" to more than X if you want to avoid temporary files
of size X.

Yours,
Laurenz Albe


Thanks. In the end I increased work_mem to 2GB, but temporary files are 
still being created, albeit at a much smaller total size (around 
0.2-0.25TB/day compared to 1TB/day of total disk write activity, as 
witnessed by SMART's "Host_Writes_32MiB" attribute). The size of each 
file is also limited, ranging from a few tens of bytes to no more than 
90KB, so given their very short lifetime hopefully some of them stay 
inside OS buffers and never even land on the SSD.


It's good that the memory is allocated by Postgres on an as-needed basis 
and freed when it is no longer needed. Thankfully those heavy queries 
employing xml are run periodically from cron and aren't part of the 
normal website activity.






Re: Changing work_mem

2019-08-13 Thread rihad

On 08/13/2019 09:04 PM, rihad wrote:

On 08/13/2019 08:44 PM, rihad wrote:

On 08/13/2019 08:22 PM, Luca Ferrari wrote:

On Tue, Aug 13, 2019 at 5:59 PM rihad  wrote:

[dbname] LOG:  temporary file: path
"base/pgsql_tmp/pgsql_tmp93683.257381", size 594


The setting 'work_mem' is within context 'user', which means it will
affect running sessions unless the session itself has already issued a
SET work_mem to xxx.
So this could be a reason why you don't seem to see any change.

Also keep in mind that work_mem works on a per-connection basis, so you are
going to possibly see 512MB x num_connections if all your clients are
doing the same kind of sort concurrently, which probably causes
PostgreSQL to go to disk due to memory being unavailable.

Hope this helps.
Luca
.


Thanks. The box has 15GB mem free (as in FreeBSD )))

And it hasn't moved a notch after the increase.

No code does SET work_mem=... AFAIK.

My apologies to Mr. Peter but I still think that older processes, 
some of them started a couple of weeks ago, use the older setting.
Sorry, I just decreased work_mem back to 256MB, reloaded, and 
instantly started seeing 82mb temp file creation, not 165mb as was 
usual with work_mem=512MB.


So it indeed was applied immediately.
Really weird figures )


Increased work_mem to 768MB and started seeing temp file creation log 
entries 331MB in size.

Bizarre ) It looks like the bigger it gets, the bigger the temp files 
that get created.


Why not decrease it to 64mb then...





Re: Changing work_mem

2019-08-13 Thread rihad

On 08/13/2019 08:44 PM, rihad wrote:

On 08/13/2019 08:22 PM, Luca Ferrari wrote:

On Tue, Aug 13, 2019 at 5:59 PM rihad  wrote:

[dbname] LOG:  temporary file: path
"base/pgsql_tmp/pgsql_tmp93683.257381", size 594


The setting 'work_mem' is within context 'user', which means it will
affect running sessions unless the session itself has already issued a
SET work_mem to xxx.
So this could be a reason why you don't seem to see any change.

Also keep in mind that work_mem works on a per-connection basis, so you are
going to possibly see 512MB x num_connections if all your clients are
doing the same kind of sort concurrently, which probably causes
PostgreSQL to go to disk due to memory being unavailable.

Hope this helps.
Luca
.


Thanks. The box has 15GB mem free (as in FreeBSD )))

And it hasn't moved a notch after the increase.

No code does SET work_mem=... AFAIK.

My apologies to Mr. Peter but I still think that older processes, some 
of them started a couple of weeks ago, use the older setting.
Sorry, I just decreased work_mem back to 256MB, reloaded, and instantly 
started seeing 82mb temp file creation, not 165mb as was usual with 
work_mem=512MB.


So it indeed was applied immediately.
Really weird figures )




Re: Changing work_mem

2019-08-13 Thread rihad

On 08/13/2019 08:22 PM, Luca Ferrari wrote:

On Tue, Aug 13, 2019 at 5:59 PM rihad  wrote:

[dbname] LOG:  temporary file: path
"base/pgsql_tmp/pgsql_tmp93683.257381", size 594


The setting 'work_mem' is within context 'user', which means it will
affect running sessions unless the session itself has already issued a
SET work_mem to xxx.
So this could be a reason why you don't seem to see any change.

Also keep in mind that work_mem works on a per-connection basis, so you are
going to possibly see 512MB x num_connections if all your clients are
doing the same kind of sort concurrently, which probably causes
PostgreSQL to go to disk due to memory being unavailable.

Hope this helps.
Luca
.


Thanks. The box has 15GB mem free (as in FreeBSD )))

And it hasn't moved a notch after the increase.

No code does SET work_mem=... AFAIK.

My apologies to Mr. Peter but I still think that older processes, some 
of them started a couple of weeks ago, use the older setting.


ps -auxww output:

postgres   2705   43.6 27.5 34668984 27486640  -  Rs 14:00     3:15.31 postgres: dbname dbname 192.168.0.4(60614)  (postgres)
postgres   7135   25.6 29.5 34437560 29499336  -  Ss Mon07    19:12.55 postgres: dbname dbname 192.168.0.4(23540)  (postgres)
postgres  99760   14.8 25.9 34425200 25901744  -  Ss 13:10    57:31.86 postgres: dbname dbname 192.168.0.4(29650)  (postgres)
postgres  28308    9.2 32.0 34445752 32050372  -  Ss 5Aug19   83:59.83 postgres: dbname dbname 192.168.0.3(59717)  (postgres)
postgres  21835    6.8 32.7 34451896 32750048  -  Ss Tue18   266:10.50 postgres: dbname dbname 192.168.0.3(60080)  (postgres)
postgres  31957    5.2 31.7 34443704 31703072  -  Ss Mon14    29:21.74 postgres: dbname dbname 192.168.0.3(40905)  (postgres)
postgres   2640    4.0 28.7 34435512 28667216  -  Ss 13:59     4:10.96 postgres: dbname dbname 192.168.0.4(60537)  (postgres)
postgres  16727    4.0 32.9 34439608 32948936  -  Ss 2Aug19  316:14.67 postgres: dbname dbname 192.168.0.3(20897)  (postgres)
postgres  99672    3.6 28.3 34439608 28347760  -  Ss 13:08     7:05.25 postgres: dbname dbname 192.168.0.3(35980)  (postgres)
postgres  48532    3.2 33.1 34451896 33078900  -  Ss 23Jul19 374:10.75 postgres: dbname dbname 192.168.0.3(59891)  (postgres)
postgres   7141    2.8 31.6 34441656 31622616  -  Ss Mon07    38:19.36 postgres: dbname dbname 192.168.0.4(23618)  (postgres)
postgres  14065    2.8 30.6 34431404 30568776  -  Ss Mon10    95:06.20 postgres: dbname dbname 192.168.0.4(65211)  (postgres)






Re: Changing work_mem

2019-08-13 Thread rihad

On 08/13/2019 07:41 PM, Peter Eisentraut wrote:

On 2019-08-13 17:16, rihad wrote:

If I increase it in postgresql.conf and SIGHUP the master server, will
the change be applied to all running backends, or only to the ones
started after the change? Thanks.

It will be applied to all running backends.



Thanks, but this isn't what I'm seeing in the logs.

After I increased work_mem from 256MB to 512MB I still see lines logged 
even for very small files created, as before, as well as much larger 
ones of hundreds of MB.



[dbname] LOG:  temporary file: path 
"base/pgsql_tmp/pgsql_tmp93683.257381", size 594



It seems unlikely that a 594-byte temp file still wouldn't fit into 
512MB of work_mem after the change.


Maybe some other stuff unconditionally creates temp files regardless of 
what's in work_mem?


All these "tempies" are the reason our SSD disks hosting a single 
database are seeing 1TB of writes a day, according to SMART.
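
A rough way to track that volume from inside Postgres is pg_stat_database, 
which counts temporary files and bytes written since the last stats reset 
(a sketch):

SELECT datname, temp_files, pg_size_pretty(temp_bytes) AS temp_written
FROM pg_stat_database
WHERE datname = current_database();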






Changing work_mem

2019-08-13 Thread rihad
If I increase it in postgresql.conf and SIGHUP the master server, will 
the change be applied to all running backends, or only to the ones 
started after the change? Thanks.
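
A quick way to verify from any already-open session after the reload is to 
look at pg_settings (a sketch; pending_restart shows whether the change still 
needs a full restart):

SELECT name, setting, unit, source, pending_restart
FROM pg_settings
WHERE name = 'work_mem';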






Re: "Time of latest checkpoint" stays too old on both master and slave

2019-06-30 Thread rihad

On 06/30/2019 09:45 PM, Andrew Gierth wrote:

"rihad" == rihad   writes:

  rihad> There's a weird problem, even when the time is initially set by
  rihad> openntpd it keeps lagging by one second every few seconds:

  rihad> $ sudo /usr/local/etc/rc.d/openntpd restart

What OS is this?

I've seen this kind of thing with FreeBSD where the kernel timecounter
source has been chosen badly (i.e. choosing TSC when the TSC isn't
actually invariant enough). Forcing TSC not to be used fixes it. The
configuration I've especially noticed it on is when running in a VM with
a single virtual CPU.



Exactly. You're right. It's on FreeBSD 11.2. After some googling earlier 
I changed kern.timecounter.hardware=HPET and solved the problem. The 
default chosen value TSC-low seems to misbehave for this box, although 
it works on others (running the same FreeBSD version).






Re: "Time of latest checkpoint" stays too old on both master and slave

2019-06-30 Thread rihad

On 06/30/2019 10:59 AM, rihad wrote:
Damn. Sorry, and please disregard my post. The master server had the 
wrong time. Not wrong TZ, simply wrong time.


$ date
Sun Jun 30 08:34:52 +04 2019

while it's currently 10:58


There's a weird problem, even when the time is initially set by openntpd 
it keeps lagging by one second every few seconds:



$ sudo /usr/local/etc/rc.d/openntpd restart
Performing sanity check on openntpd configuration:
configuration OK
Stopping openntpd.
Waiting for PIDS: 85893.
Performing sanity check on openntpd configuration:
configuration OK
Starting openntpd.
$ ssh good-server date; date
Sun Jun 30 11:04:17 +04 2019
Sun Jun 30 11:04:17 +04 2019
$ ssh good-server date; date
Sun Jun 30 11:04:25 +04 2019
Sun Jun 30 11:04:24 +04 2019
$ ssh good-server date; date
Sun Jun 30 11:04:32 +04 2019
Sun Jun 30 11:04:31 +04 2019
$ ssh good-server date; date
Sun Jun 30 11:04:39 +04 2019
Sun Jun 30 11:04:37 +04 2019
$ ssh good-server date; date
Sun Jun 30 11:04:48 +04 2019
Sun Jun 30 11:04:45 +04 2019


Really weird. But this isn't a PG problem at all, just a server glitch 
maybe. sorry again.






Re: "Time of latest checkpoint" stays too old on both master and slave

2019-06-30 Thread rihad
Damn. Sorry, and please disregard my post. The master server had the 
wrong time. Not wrong TZ, simply wrong time.


$ date
Sun Jun 30 08:34:52 +04 2019

while it's currently 10:58




"Time of latest checkpoint" stays too old on both master and slave

2019-06-30 Thread rihad
Current time is 10:44. pg_controldata shows on both on master & slave 
server which uses streaming replication:


Time of latest checkpoint:    Sun Jun 30 07:49:18 2019

So it was almost 3 hours ago. There are always some heavy writes and a 
new WAL file in the pg_wal/ directory is created every few minutes. 
checkpoint_timeout is 20min so it should have triggered long ago.


checkpoint_timeout = 20min #5min    # range 30s-1d
max_wal_size = 8GB
min_wal_size = 80MB
checkpoint_completion_target = 0.9


hot_standby is enabled on the slave, hot_standby_feedback is off not to 
bloat the master,


hot_standby_streaming_delay is 30min.

Experiencing this long delay after the upgrade (via dump/restore) from 
PG 9.6 to 11.4.



Thanks for any tips.





Re: Upgrading locale issues

2019-05-03 Thread rihad

On 05/03/2019 05:35 PM, Daniel Verite wrote:

For non-English text, I would recommend C.UTF-8 over "C" because of


BTW, there's no C.UTF-8 inside pg_collation, and running select 
pg_import_system_collations('pg_catalog') doesn't bring it in, at least 
not on FreeBSD 11.2.





Re: Upgrading locale issues

2019-05-03 Thread rihad

On 05/03/2019 05:35 PM, Daniel Verite wrote:

rihad wrote:


Thanks, I'm a bit confused here. AFAIK indexes are used for at least two
things: for speed and for skipping the ORDER BY step (since btree
indexes are already sorted). Will such an "upgrade-immune" C.UTF-8 index
still work correctly for table lookups?

If the lookup is based on a equality test or a pattern match with LIKE
or a regexp, it makes no difference.  But the locale makes a
difference with inequality tests, such as < > or BETWEEN.

Around version 9.1 and in the pre-ICU days, Robert Haas wrote
this post that explained it pretty well, I think:
http://rhaas.blogspot.com/2012/03/perils-of-collation-aware-comparisons.html

Quote:

   If you happen to need the particular sorting behavior that
   collation-aware sorting and comparisons provide, then you may find
   this price worth paying, but I suspect there are a lot of people out
   there who are paying it more or less accidentally and don't really
   care very much about the underlying sorting behavior.  If, for
   example, all of your queries are based on equality, and you don't
   use greater-than or less-than tests, then it doesn't matter what
   collation is in use. You might as well use "C" instead of whatever
   your local default may be, because it's faster.

For non-English text, I would recommend C.UTF-8 over "C" because of
its better handling of Unicode characters. For instance:

=# select upper('été' collate "C"), upper('été' collate "C.UTF-8");
 upper | upper
-------+-------
 éTé   | ÉTÉ

The "price" of linguistic comparisons that Robert mentioned was about
performance, but the troubles we have with the lack of long-term
immutability of these locales are worth being added to that.


And can the existing en_US.UTF-8 fields' definition be altered in
place, without a dump+restore?

Changing the COLLATE clause of a column with ALTER TABLE does
implicitly rebuild an index on this column if there is one,
A dump+restore is not needed, nor an explicit REINDEX.

The dump+restore is needed in another scenario, where you would
decide to change the LC_COLLATE and LC_CTYPE of the database,
instead of doing it only for some columns.
This scenario makes perfect sense if the locale of the database
has been set implicitly and it uses linguistic sorts without
really needing them ("accidentally" as said in the post).



en_US.UTF-8 is the default encoding+locale+collation, it
isn't set explicitly for any of our string columns. I assume there's
some "catch-all" ordering taking place even for the C locale, so there
won't be any bizarre things like b coming before a, or generally for any
language, the second letter of its alphabet coming before the first?

'b' < 'a' is never true for any locale AFAIK, it is 'B' < 'a' that is
true for some locales such as C or C.UTF-8.


Best regards,



Thanks a lot for sharing your insights!
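
For example, the ordering difference described above can be seen directly 
(a sketch; assumes an en_US.UTF-8 collation exists in pg_collation):

SELECT 'B' < 'a' COLLATE "C"           AS c_locale,
       'B' < 'a' COLLATE "en_US.UTF-8" AS linguistic;
-- c_locale compares by byte order, linguistic by language rules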





Re: Upgrading locale issues

2019-05-02 Thread rihad

On 05/02/2019 05:36 PM, Daniel Verite wrote:

rihad wrote:


Thanks for the reply. Do you know what a "decent" ICU collation would be
to bind to a field's schema definition so it would mimic a UTF-8
encoding for a multilingual column? Maybe und-x-icu? We aren't so much
concerned about sortability in most cases; we just want indexes to
better handle future PG/ICU upgrades. But what does und(efined) even
mean with respect to collations?

"undefined" in this context means unspecified language and
unspecified country or region. It implies that no language-specific
nor regional rule will be applied to compare strings.

Using C.UTF-8 as the collation for text fields to index may be the
best trade-off in your case. It should be immune to libc and ICU
upgrades.

With C.UTF-8, a string like 'BC' will sort before 'ab', and punctuation
and accents will also sort differently than with a linguistic-aware
collation.
Thanks, I'm a bit confused here. AFAIK indexes are used for at least two 
things: for speed and for skipping the ORDER BY step (since btree 
indexes are already sorted). Will such an "upgrade-immune" C.UTF-8 index 
still work correctly for table lookups? And can the existing  
en_US.UTF-8 fields' definition be altered in place, without a 
dump+restore? en_US.UTF-8 is the default encoding+locale+collation, it 
isn't set explicitly for any of our string columns. I assume there's 
some "catch-all" ordering taking place even for the C locale, so there 
won't be any bizarre things like b coming before a, or generally for any 
language, the second letter of its alphabet coming before the first?




If your applications care about that, it can be fixed by simply
adding COLLATE "default" to the ORDER BY clause of the queries that
are meant to present data to users.
COLLATE "default" means the collation of the database, which
presumably would be something like "language_REGION.UTF-8" in your
case. If you never specified it explicitly, it came from initdb which
itself got it from the environment of the server.


Best regards,







Re: Upgrading locale issues

2019-05-02 Thread rihad

On 05/02/2019 12:26 AM, Peter Geoghegan wrote:

On Mon, Apr 29, 2019 at 7:45 AM rihad  wrote:

Hi. Today we ran pg_ctl promote on a slave server (10.7) and started
using it as a master. The OS also got upgraded FreeBSD 10.4 -> FreeBSD
11.2. And you guessed it, most varchar indexes got corrupted because the
system locale changed in subtle ways. So I created the extension amcheck
and reindexed all bad indexes one by one. Is there any way to prevent
such things in the future? Will switching to ICU fix all such issues?

Not necessarily, but it will detect the incompatibility more or less
automatically, making it far more likely that the problem will be
caught before it does any harm. ICU versions collations, giving
Postgres a way to reason about their compatibility over time. The libc
collations are not versioned, though (at least not in any standard way
that Postgres can take advantage of).


The problem with it is that ICU collations are absent in pg_collation,
initdb should be run to create them, but pg_basebackup only runs on an
empty base directory, so I couldn't run initdb + pg_basebackup to
prepare the replica server. I believe I can run the create collation
command manually, but what would it look like for en-x-icu?

It is safe to call pg_import_system_collations() directly, which is
all that initdb does. This is documented, so you wouldn't be relying
on a hack.

Thanks for the reply. Do you know what a "decent" ICU collation would be 
to bind to a field's schema definition so it would mimic a UTF-8 
encoding for a multilingual column? Maybe und-x-icu? We aren't so much 
concerned about sortability in most cases; we just want indexes to 
better handle future PG/ICU upgrades. But what does und(efined) even 
mean with respect to collations? With UTF-8 at least some default 
collation is specified, like en_US.UTF-8. Will results be in a 
completely undefined order as a result of ORDER BY "icu_und_column"?






Upgrading locale issues

2019-04-29 Thread rihad
Hi. Today we ran pg_ctl promote on a slave server (10.7) and started 
using it as a master. The OS also got upgraded FreeBSD 10.4 -> FreeBSD 
11.2. And you guessed it, most varchar indexes got corrupted because the 
system locale changed in subtle ways. So I created the extension amcheck 
and reindexed all bad indexes one by one. Is there any way to prevent 
such things in the future? Will switching to ICU fix all such issues? 
The problem with it is that ICU collations are absent in pg_collation, 
initdb should be run to create them, but pg_basebackup only runs on an 
empty base directory, so I couldn't run initdb + pg_basebackup to 
prepare the replica server. I believe I can run the create collation 
command manually, but what would it look like for en-x-icu?



CREATE COLLATION "en-ix-icu" (provider = icu, locale = 'en-x-icu');


is that it? But what about version etc?


  collname  | collnamespace | collowner | collprovider | collencoding | collcollate | collctype | collversion
------------+---------------+-----------+--------------+--------------+-------------+-----------+-------------
 en-x-icu   |            11 |        10 | i            |           -1 | en          | en        | 153.88



Thanks.
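
For reference, the amcheck pass mentioned above can be run with something 
along these lines (a sketch; bt_index_check only inspects btree indexes and 
generally needs superuser privileges):

CREATE EXTENSION IF NOT EXISTS amcheck;
SELECT c.relname, bt_index_check(c.oid)
FROM pg_class c
JOIN pg_am am ON am.oid = c.relam
WHERE c.relkind = 'i' AND am.amname = 'btree';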




Does "ON UPDATE" for foreign keys require index?

2019-04-25 Thread rihad
Hi. Say I have column A.b_id which references B.id (which is a primary 
key) and as such it is declared as a foreign key constraint. A.b_id has 
no index because it doesn't need one. What happens when table B's rows 
are modified (but never deleted)? Will PG still have to scan A to find 
A.b_id to do nothing with it? )) B.id itself is never modified; it's 
just a normal serial value typically used for ids.



The docs are a bit ambiguous:

Since a DELETE of a row from the referenced table or an UPDATE of 
a referenced column will require a scan of the referencing table for 
rows matching the old value, it is often a good idea to index the 
referencing columns too. Because this is not always needed, and there 
are many choices available on how to index, declaration of a foreign 
key constraint does not automatically create an index on the 
referencing columns.





https://www.postgresql.org/docs/9.6/ddl-constraints.html
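
If such scans ever did become a concern, indexing the referencing column is 
straightforward (a sketch using the hypothetical names from the question):

CREATE INDEX CONCURRENTLY a_b_id_idx ON a (b_id);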



Multicolumn index for single-column queries?

2019-04-18 Thread rihad

Hi. Say there are 2 indexes:

"foo_index" btree (foo_id)

"multi_index" btree (foo_id, approved, expires_at)


foo_id is an integer. Some queries involve all three columns in their WHERE 
clauses, some involve only foo_id.
Would it be ok from a general performance standpoint to remove foo_index and 
rely only on multi_index? I know that PG would have to do less work updating 
just one index compared to updating them both, but wouldn't searches on 
foo_id alone become slower?

Thanks.
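
For what it's worth, the trade-off can be checked empirically, since a btree 
on (foo_id, approved, expires_at) can still serve lookups on its leading 
column alone (a sketch using the names from the question):

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM foo WHERE foo_id = 42;
-- compare the plan and timing before and after dropping foo_index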





Re: When do vacuumed pages/tuples become available for reuse?

2019-04-12 Thread rihad

On 04/12/2019 08:39 AM, Michael Lewis wrote:


Way too many indexes. I'm going to have a hard time convincing our
programmers to get rid of any of them )


You can create (concurrently) an identical index with a new name, then 
drop old version concurrently and repeat for each. It doesn't help you 
figure out the root cause and how to prevent it from happening again, 
but gets you to a fresh start at least.


Thanks for the tip. I believe that, other than temporarily increasing load, 
creating a new index, dropping the old one, and renaming the new one to the 
old name are transparent and (almost) lock-less operations? What would 
happen to user queries during DROP INDEX? Would they block on it, or, being 
unable to read it without blocking, would they pick the new one based on the 
same column(s)? And, likewise, is ALTER INDEX ... RENAME ... an instant 
operation, I hope?
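
The rebuild described above might look roughly like this (a sketch with 
hypothetical index and column names; DROP INDEX CONCURRENTLY cannot be used 
on an index backing a constraint):

CREATE INDEX CONCURRENTLY foo_bar_idx_new ON foo (bar);
DROP INDEX CONCURRENTLY foo_bar_idx;
ALTER INDEX foo_bar_idx_new RENAME TO foo_bar_idx;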




Re: When do vacuumed pages/tuples become available for reuse?

2019-04-11 Thread rihad

On 04/11/2019 10:13 PM, Jeff Janes wrote:
On Thu, Apr 11, 2019 at 12:18 PM rihad  wrote:


On 04/11/2019 08:09 PM, Jeff Janes wrote:

On Thu, Apr 11, 2019 at 11:44 AM rihad  wrote:


Since we dump production DB daily into staging
environment, the difference in size (as reported by psql's
\l+) is 11GB in a freshly restored DB as opposed to 70GB in
production.


Yeah, that seems like a problem.  Do you have long lived
transactions/snapshots that are preventing vacuuming from
removing dead tuples? You can run a manual "vacuum verbose" and
see how many dead but nonremovable tuples there were, or
set log_autovacuum_min_duration to some non-negative value less
than the autovac takes, and do the same.


vacuum frees tuples just fine. It's just that by the time each run
finishes, many more accumulate due to table update activity, ad
nauseam. So this unused space constantly grows. Here's a sample
autovacuum run:

2019-04-11 19:39:44.450841500 [] LOG:  automatic vacuum of table
"foo.public.bar": index scans: 1
2019-04-11 19:39:44.450843500   pages: 0 removed, 472095 remain, 4
skipped due to pins, 39075 skipped frozen
2019-04-11 19:39:44.450844500   tuples: 19150 removed, 2725811
remain, 465 are dead but not yet removable
2019-04-11 19:39:44.450845500   buffer usage: 62407557 hits,
6984769 misses, 116409 dirtied
2019-04-11 19:39:44.450846500   avg read rate: 16.263 MB/s, avg
write rate: 0.271 MB/s
2019-04-11 19:39:44.450847500   system usage: CPU 59.05s/115.26u
sec elapsed 3355.28 sec


This data doesn't seem to support either one of our theories.  "Dead 
but not yet removable" is low.  But "removed" also seems pretty low.  
Is 19,150 really the number of updates you think occur over the course 
of an hour which causes the problem you are seeing?  Updates that 
happened during one vacuum should be cleanly caught by the next one, 
so you should only see a steady state of bloat, not unbounded increase.


But your buffer usage being 132 time the number of pages in the table 
suggests it is your indexes, not your table, which are bloated.
How many indexes do you have, and of what type?  Index pages can only 
get reused when they become completely empty, or when a new indexed 
value fits into (or near) the key-space that that page already 
covers.  So if the key space for new tuples is constantly migrating 
around and your pages never become absolutely empty, you can get 
unbounded bloat in the indexes.


Way too many indexes. I'm going to have a hard time convincing our 
programmers to get rid of any of them )




Can you compare the sizes object by object between the live and the 
stage, taking care not to include index (or toast) size into the size 
of their parent table?


You're right, it's mostly indexes that are bloated.

Staging:
# select pg_size_pretty(pg_relation_size('foo'));
 pg_size_pretty
----------------
 2924 MB
(1 row)

# select pg_size_pretty(pg_indexes_size('foo'));
 pg_size_pretty
----------------
 1958 MB
(1 row)

Prod:

# select pg_size_pretty(pg_relation_size('foo'));
 pg_size_pretty
----------------
 3688 MB
(1 row)

# select pg_size_pretty(pg_indexes_size('foo'));
 pg_size_pretty
----------------
 60 GB
(1 row)







Also, what does pg_freespace
(https://www.postgresql.org/docs/current/pgfreespacemap.html)
show about the available of space in the table? How
about pgstattuple
(https://www.postgresql.org/docs/current/pgstattuple.html)

Thanks, I'll try those. But as I said freshly restored DB is only
11GB in size, not 70 (only public schema is used).


Yeah, but we need to know **why** that extra 59GB  is not being 
reused, not simply the fact that it isn't being reused.  If it isn't 
listed as free in the freespace map, then PostgreSQL might not know 
how to find it in order to reuse it, for example. But now that I think 
it is the indexes, not the table, that is bloated I would chase that 
part down first.  No point checking the freespace of the table proper 
if the problem is with the indexes.

Cheers,

Jeff
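
A sketch of checking an individual btree for bloat with the pgstattuple 
extension (hypothetical index name; a low avg_leaf_density suggests a bloated 
index):

CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('foo_some_idx');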





Re: When do vacuumed pages/tuples become available for reuse?

2019-04-11 Thread rihad

On 04/11/2019 08:09 PM, Jeff Janes wrote:
On Thu, Apr 11, 2019 at 11:44 AM rihad  wrote:


On 04/11/2019 07:40 PM, Jeff Janes wrote:


The disk usage doesn't reach a steady state after one or two
autovacs?  Or it does, but you are just unhappy about the ratio
between the steady state size and the theoretical fully packed size?

Cheers,

Jeff



Since we dump production DB daily into staging
environment, the difference in size (as reported by psql's \l+) is
11GB in a freshly restored DB as opposed to 70GB in production.


Yeah, that seems like a problem.  Do you have long lived 
transactions/snapshots that are preventing vacuuming from removing 
dead tuples?  You can run a manual "vacuum verbose" and see how many 
dead but nonremovable tuples there were, or 
set log_autovacuum_min_duration to some non-negative value less than 
the autovac takes, and do the same.


vacuum frees tuples just fine. It's just that by the time each run 
finishes, many more accumulate due to table update activity, ad nauseam. 
So this unused space constantly grows. Here's a sample autovacuum run:


2019-04-11 19:39:44.450841500 [] LOG:  automatic vacuum of table 
"foo.public.bar": index scans: 1
2019-04-11 19:39:44.450843500   pages: 0 removed, 472095 remain, 4 
skipped due to pins, 39075 skipped frozen
2019-04-11 19:39:44.450844500   tuples: 19150 removed, 2725811 remain, 
465 are dead but not yet removable
2019-04-11 19:39:44.450845500   buffer usage: 62407557 hits, 6984769 
misses, 116409 dirtied
2019-04-11 19:39:44.450846500   avg read rate: 16.263 MB/s, avg write 
rate: 0.271 MB/s
2019-04-11 19:39:44.450847500   system usage: CPU 59.05s/115.26u sec 
elapsed 3355.28 sec



(Indeed, those dumps you take daily might be the source of those 
long-lived snapshots.  How long does a dump take?)


The dumps are taken daily from the slave server as part of stock 
FreeBSD postgres port activity.

I don't think it impacts the master server.



Also, what does pg_freespace 
(https://www.postgresql.org/docs/current/pgfreespacemap.html) show 
about the available of space in the table?  How about pgstattuple 
(https://www.postgresql.org/docs/current/pgstattuple.html)




Thanks, I'll try those. But as I said freshly restored DB is only 11GB 
in size, not 70 (only public schema is used).



Cheers,

Jeff





Re: When do vacuumed pages/tuples become available for reuse?

2019-04-11 Thread rihad

On 04/11/2019 07:40 PM, Jeff Janes wrote:
On Thu, Apr 11, 2019 at 10:28 AM rihad  wrote:



Yup, it's just that n_dead_tuples grows by several hundred
thousand (the
table sees much much more updates than inserts) and disk usage grows
constantly between several hour long vacuum runs. Running vacuum full
isn't an option.


The disk usage doesn't reach a steady state after one or two 
autovacs?  Or it does, but you are just unhappy about the ratio 
between the steady state size and the theoretical fully packed size?


Cheers,

Jeff



Since we dump production DB daily into staging environment, the 
difference in size (as reported by psql's \l+) is 11GB in a freshly 
restored DB as opposed to 70GB in production.




Re: When do vacuumed pages/tuples become available for reuse?

2019-04-11 Thread rihad

On 04/11/2019 07:04 PM, Alvaro Herrera wrote:

On 2019-Apr-11, rihad wrote:


On 04/11/2019 06:41 PM, Alvaro Herrera wrote:


Perhaps it'd be better to vacuum this table much more often.


Each run took 5-6 hours, now it takes 2-3 hours after I've tweaked some
cost-based vacuum knobs.

But how often does it run?

One after another. Enough n_dead_tuples accumulate in between runs to 
easily trigger that.


autovacuum_vacuum_scale_factor = 0.01

autovacuum_vacuum_threshold = 50

which means autovac runs when 1% of the table size + 50 rows have been 
updated or deleted.


But we can't make each autovacuum run run faster )

Currently I lowered this from 20ms:

autovacuum_vacuum_cost_delay = 10ms

And increased this from 200:

autovacuum_vacuum_cost_limit = 400

to make it finish in 1 hour rather than 3 hours.

Anything more than that and we risk impacting the performance of user 
queries.
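
For reference, those knobs can be changed cluster-wide without a restart 
(a sketch):

ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '10ms';
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 400;
SELECT pg_reload_conf();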






Re: When do vacuumed pages/tuples become available for reuse?

2019-04-11 Thread rihad

On 04/11/2019 06:41 PM, Alvaro Herrera wrote:

On 2019-Apr-11, rihad wrote:


On 04/11/2019 06:20 PM, Tom Lane wrote:

rihad  writes:

Thanks! Our autovacuum_work_mem = 1GB, so this probably means any space
would be available for reuse only at the end of the vacuum?

It's six bytes per dead tuple, last I checked ... you do the math.


Are there
any downsides in decreasing it to, say, 64MB? I see only pluses )

Well, usually people prefer to minimize the number of passes over
the indexes.

Yup, it's just that n_dead_tuples grows by several hundred thousand (the
table sees much much more updates than inserts) and disk usage grows
constantly between several hour long vacuum runs. Running vacuum full isn't
an option.

Perhaps it'd be better to vacuum this table much more often.

Each run took 5-6 hours, now it takes 2-3 hours after I've tweaked some 
cost-based vacuum knobs.






Re: When do vacuumed pages/tuples become available for reuse?

2019-04-11 Thread rihad

On 04/11/2019 06:20 PM, Tom Lane wrote:

rihad  writes:

Thanks! Our autovacuum_work_mem = 1GB, so this probably means any space
would be available for reuse only at the end of the vacuum?

It's six bytes per dead tuple, last I checked ... you do the math.


Are there
any downsides in decreasing it to, say, 64MB? I see only pluses )

Well, usually people prefer to minimize the number of passes over
the indexes.

regards, tom lane
.

Yup, it's just that n_dead_tuples grows by several hundred thousand (the 
table sees far more updates than inserts) and disk usage grows 
constantly between the several-hour-long vacuum runs. Running vacuum full 
isn't an option.






Re: When do vacuumed pages/tuples become available for reuse?

2019-04-11 Thread rihad

On 04/11/2019 06:09 PM, Alvaro Herrera wrote:

On 2019-Apr-11, rihad wrote:


Thanks! Our autovacuum_work_mem = 1GB, so this probably means any space
would be available for reuse only at the end of the vacuum? Are there any
downsides in decreasing it to, say, 64MB? I see only pluses )

Yes, each vacuum will take longer and will use much more I/O.


I see, thanks.





Re: When do vacuumed pages/tuples become available for reuse?

2019-04-11 Thread rihad

On 04/11/2019 05:48 PM, Tom Lane wrote:

rihad  writes:

If an autovacuum job on a huge table runs for 5-6 hours, do its freed
pages/tuples become available for reuse immediately when they are marked
as free, or only at the end of the multi-hour vacuum operation?

They'll be freed in batches, where the size of a batch depends on the
autovacuum_work_mem or maintenance_work_mem setting.  The basic
work cycle is

* scan table to find dead tuples, save their TIDs in working memory;
   continue until end of table or working memory full
* scan indexes to find index entries matching those TIDs, remove 'em
* go back to table and remove the previously-found tuples
* if not end of table, repeat

So a larger work-mem setting means fewer passes over the indexes,
but a longer time until space is reclaimed.

regards, tom lane
.

Thanks! Our autovacuum_work_mem = 1GB, so this probably means any space 
would be available for reuse only at the end of the vacuum? Are there 
any downsides in decreasing it to, say, 64MB? I see only pluses )
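
For context, a rough sketch of the arithmetic behind Tom's six-bytes-per-dead-tuple 
figure, using the 1GB and 64MB values mentioned here (pg_size_bytes() exists 
since 9.6):

select pg_size_bytes('1GB')  / 6 as dead_tids_per_index_pass_at_1gb,   -- ~179 million
       pg_size_bytes('64MB') / 6 as dead_tids_per_index_pass_at_64mb;  -- ~11 million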






When do vacuumed pages/tuples become available for reuse?

2019-04-11 Thread rihad
If an autovacuum job on a huge table runs for 5-6 hours, do its freed 
pages/tuples become available for reuse immediately when they are marked 
as free, or only at the end of the multi-hour vacuum operation?






SIGTERM/SIGINT master/slave behavior

2019-04-06 Thread rihad
Hi. When the master server receives a smart shutdown request (SIGTERM), does it 
exit after making sure it has sent all received writes to the slave 
server(s), or does it exit leaving the slave in an inconsistent state? What 
about during a fast shutdown (SIGINT)? I know that it asks current 
requests to terminate immediately in that case, but what about the 
replication connection? Is it safe to send SIGINT when intending to do a 
manual failover?



https://www.postgresql.org/docs/10/server-shutdown.html


Thanks.





Re: Recommendation to run vacuum FULL in parallel

2019-04-03 Thread rihad

And future updates can reuse it, too (an update is very similar to an
insert+delete).



Hm, then it's strange that our DB takes 6 times as much space as a 
freshly restored one (only the public schema is considered).



Not if autovacuum has a chance to run between updates.


Ours is run regularly, although we had to tune it so it doesn't interfere 
with normal database activity, so each run on the table takes several 
hours. We did that by setting autovacuum_vacuum_scale_factor = 0.05, down 
from the default 0.2.
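
A hedged sketch: the same factor can also be set per table instead of 
globally, so only the heavily updated table gets the more aggressive 
setting ('big_hot_table' is a placeholder name):

alter table big_hot_table set (autovacuum_vacuum_scale_factor = 0.05);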





Re: Recommendation to run vacuum FULL in parallel

2019-04-03 Thread rihad

On 04/03/2019 06:40 PM, Michael Lewis wrote:

"Sometimes a table's usage pattern involves much more updates than
inserts, which gradually uses more and more unused space that is never
used again by postgres, and plain autovacuuming doesn't return it to the
OS."

Can you expound on that? I thought that was exactly what autovacuum 
did for old versions of rows whether dead because of delete or update, 
so I am surprised by this statement. I thought vacuum full was only 
ever needed if storage space is an issue and the table is not expected 
to quickly re-expand to its current size on disk from new churn of tuples.



From what I understand from the docs, updates keep older versions of 
rows intact because other transactions might still use them (this is the 
essence of MVCC), and autovacuuming (plain VACUUM) marks that space as 
available when it runs, so future inserts can reuse it. When the 
number of updates is much greater than the number of inserts, that unused 
zombie space gradually creeps up.
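
One hedged way to put numbers on that creeping zombie space is the contrib 
pgstattuple extension ('mytable' is a placeholder name; note the function 
reads the whole table):

create extension if not exists pgstattuple;
select table_len, dead_tuple_percent, free_percent
from pgstattuple('mytable');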






Re: Recommendation to run vacuum FULL in parallel

2019-04-03 Thread rihad
Does *every* table have *so much* free space that it's impractical to 
just

let the files just get refilled by normal usage?


Ideally VACUUM FULL should not require a giant lock on the table.

Sometimes a table's usage pattern involves far more updates than 
inserts, which gradually uses more and more unused space that is never 
used again by postgres, and plain autovacuuming doesn't return it to the 
OS. So the DB size (as witnessed by psql's \l+) is 5-6x the space it 
actually needs. And using vacuum full is prohibitive because of the 
exclusive lock it takes on the table, preventing both writes and reads. 
Since rewriting a table is a completely internal operation from the clients' 
POV, hopefully one day we will see a concurrent version of vacuum full.






Re: Gigantic load average spikes

2019-04-01 Thread rihad

On 04/01/2019 08:30 PM, Michel Pelletier wrote:



On Sun, Mar 31, 2019 at 10:49 PM David Rowley <david.row...@2ndquadrant.com> wrote:



Perhaps a bunch of processes waiting on the access exclusive lock on
the materialized view being released?

log_lock_waits might help you if the MV takes more than a second to
refresh, otherwise, you might need to have a look at ungranted locks
in pg_locks and see if the number of locks spikes during the refresh.


I think David's got the right idea here.  Like he said, investigate 
pg_locks, if it is the refresh materialized view, you can avoid the 
problem by doing 'REFRESH MATERIALIZED VIEW CONCURRENTLY'.  You will 
need at least one unique index on the table.


It is actually refreshed concurrently.
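
For anyone following along, a minimal sketch of the pg_locks check David and 
Michel suggest, listing sessions waiting on ungranted locks and what they are 
running (only the built-in catalog views are assumed):

select l.locktype, l.mode, l.relation::regclass, a.pid, a.query
from pg_locks l
join pg_stat_activity a on a.pid = l.pid
where not l.granted;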




-- 
 David Rowley http://www.2ndQuadrant.com/

 PostgreSQL Development, 24x7 Support, Training & Services






Re: Gigantic load average spikes

2019-04-01 Thread rihad

On 04/01/2019 06:17 PM, Adrian Klaver wrote:

On 3/31/19 10:08 PM, rihad wrote:
What exactly do you mean by "running processes"? I don't think I've 
ever

seen a Unix with only 1 to 3 running processes in total, so you are
probably referring to processes in a certain state. Runnable (R)?
Uninterruptible sleep (D)? Both? Something else?


Just that, 250-300 running processes in top shown for a second. 
250-300 is the number of postgres worker processes used, but normally 
only 1-3 of them are running according to top. At times of load 
FreeBSD (?) 


So what are the process titles?


postgres





Re: Gigantic load average spikes

2019-03-31 Thread rihad

What exactly do you mean by "running processes"? I don't think I've ever
seen a Unix with only 1 to 3 running processes in total, so you are
probably referring to processes in a certain state. Runnable (R)?
Uninterruptible sleep (D)? Both? Something else?


Just that, 250-300 running processes in top shown for a second. 250-300 
is the number of postgres worker processes used, but normally only 1-3 
of them are running according to top. At times of load FreeBSD (?) 
schedules all of them to run. This doesn't really bring the machine to its 
knees, it just inflates the load average.






Gigantic load average spikes

2019-03-31 Thread rihad
Postgres 10.3 on FreeBSD 10.3 is almost idle: disk I/O at about 5-10%, the 
number of running processes about 1-3, CPU about 90% idle. Then we run an 
I/O-heavy job like "refresh materialized view"; CPU & disk I/O are still not 
maxed out, but all of a sudden the number of running processes increases to 
about 250-300 for a second, which inflates the load averages that we 
periodically poll to see if everything's alright. Why is that? Disk I/O 
is far from being maxed out, and the actual number of running processes is 
really small.






CREATE COLLATION to match pg_collation data

2019-01-11 Thread rihad
Hi, since pg_import_system_collations(), which would solve all this, 
doesn't exist in 9.6.x, is there a way for me to properly map it to a CREATE 
COLLATION call? Specifically, I need to have this collation (from 10.x) in 
9.6.x so it works on FreeBSD:



  collname  | collnamespace | collowner | collprovider | collencoding | collcollate | collctype | collversion
------------+---------------+-----------+--------------+--------------+-------------+-----------+-------------
 az-x-icu   |            11 |        10 | i            |           -1 | az          | az        | 153.88.34




CREATE COLLATION name (
    [ LOCALE = locale, ]
    [ LC_COLLATE = lc_collate, ]
    [ LC_CTYPE = lc_ctype ]
)
CREATE COLLATION name FROM existing_collation

https://www.postgresql.org/docs/9.6/sql-createcollation.html

Can I just not use "create collation", and simply insert the new row as is? 
Thanks for any tips.
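
A hedged sketch only: since 9.6 has no ICU provider, the closest mapping is a 
libc-backed collation with the same name; whether an az locale (guessed here 
as 'az_AZ.UTF-8') actually exists on the FreeBSD host is an assumption, and 
the result will not sort exactly like the ICU collation:

create collation "az-x-icu" (lc_collate = 'az_AZ.UTF-8', lc_ctype = 'az_AZ.UTF-8');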




Re: Adding new collations after pg_upgrade?

2019-01-06 Thread rihad

On 01/06/2019 08:05 PM, rihad wrote:

On 01/06/2019 07:57 PM, rihad wrote:

On 01/06/2019 07:51 PM, Tom Lane wrote:

rihad  writes:

Hi. on a freshly installed 10.6 I can see all ICU collations added in
pg_collation schema. Is there a way to have them for an existing
database cluster?

There's a function called something like pg_import_system_collations.
See documentation.


Thanks a lot!

postgres=# select pg_import_system_collations('pg_catalog');
 pg_import_system_collations
------------------------------
 798
(1 row)

postgres=# select pg_import_system_collations('pg_catalog');
 pg_import_system_collations
------------------------------
   0
(1 row)


Oops, I meant schema pg_collation )


Phew, sorry for my confusion. Schema pg_catalog. Table pg_collation. So 
I did it right )







    regards, tom lane
.










Re: Adding new collations after pg_upgrade?

2019-01-06 Thread rihad

On 01/06/2019 07:57 PM, rihad wrote:

On 01/06/2019 07:51 PM, Tom Lane wrote:

rihad  writes:

Hi. on a freshly installed 10.6 I can see all ICU collations added in
pg_collation schema. Is there a way to have them for an existing
database cluster?

There's a function called something like pg_import_system_collations.
See documentation.


Thanks a lot!

postgres=# select pg_import_system_collations('pg_catalog');
 pg_import_system_collations
------------------------------
 798
(1 row)

postgres=# select pg_import_system_collations('pg_catalog');
 pg_import_system_collations
------------------------------
   0
(1 row)


Oops, I meant schema pg_collation )





    regards, tom lane
.








Re: Adding new collations after pg_upgrade?

2019-01-06 Thread rihad

On 01/06/2019 07:51 PM, Tom Lane wrote:

rihad  writes:

Hi. on a freshly installed 10.6 I can see all ICU collations added in
pg_collation schema. Is there a way to have them for an existing
database cluster?

There's a function called something like pg_import_system_collations.
See documentation.


Thanks a lot!

postgres=# select pg_import_system_collations('pg_catalog');
 pg_import_system_collations
------------------------------
 798
(1 row)

postgres=# select pg_import_system_collations('pg_catalog');
 pg_import_system_collations
------------------------------
   0
(1 row)




regards, tom lane
.






Adding new collations after pg_upgrade?

2019-01-06 Thread rihad
Hi. on a freshly installed 10.6 I can see all ICU collations added in 
pg_collation schema. Is there a way to have them for an existing 
database cluster?





Re: ERROR: found xmin 54230249 from before relfrozenxid 61349053

2018-08-02 Thread rihad

On 08/02/2018 05:34 PM, rihad wrote:

On 08/02/2018 04:38 PM, rihad wrote:
Hi all. We see such errors on one of our own (non-system) tables on 
Postgres 10.3. They stop both automatic vacuum & automatic analyze 
not only on that table, but on all tables & databases (manual 
vacuuming works though). Luckily it's a small (but important) table - 
less than 1000 rows. Any quick way of fixing it with no downtime? 
Thanks.




BTW, it's a materialized view, not a table. "refresh materialized view 
concurrently" is being run on it regularly, but apparently doesn't fix 
the problem.




Replying to myself: this probably relates to this issue:

https://www.postgresql.org/docs/9.4/static/release-9-4-17.html

 * Repair pg_upgrade's failure to preserve relfrozenxid for
   materialized views (Tom Lane, Andres Freund)

   This oversight could lead to data corruption in materialized views
   after an upgrade, manifesting as "could not access status of
   transaction" or "found xmin from before relfrozenxid" errors. The
   problem would be more likely to occur in seldom-refreshed
   materialized views, or ones that were maintained only with REFRESH
   MATERIALIZED VIEW CONCURRENTLY.

   If such corruption is observed, it can be repaired by refreshing the
   materialized view (without CONCURRENTLY).


Fixed on 2018-03-01, so the bug existed not only in the 9.4 branch but in 
all major versions too.
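
For completeness, a sketch of the repair the release note describes; 
'my_matview' stands in for the affected view, which isn't named in the 
thread, and note that the plain refresh holds an exclusive lock for its 
duration:

refresh materialized view my_matview;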




Re: ERROR: found xmin 54230249 from before relfrozenxid 61349053

2018-08-02 Thread rihad

On 08/02/2018 04:38 PM, rihad wrote:
Hi all. We see such errors on one of our own (non-system) tables on 
Postgres 10.3. They stop both automatic vacuum & automatic analyze not 
only on that table, but on all tables & databases (manual vacuuming 
works though). Luckily it's a small (but important) table - less than 
1000 rows. Any quick way of fixing it with no downtime? Thanks.




BTW, it's a materialized view, not a table. "refresh materialized view 
concurrently" is being run on it regularly, but apparently doesn't fix 
the problem.






ERROR: found xmin 54230249 from before relfrozenxid 61349053

2018-08-02 Thread rihad
Hi all. We see such errors on one of our own (non-system) tables on 
Postgres 10.3. They stop both automatic vacuum & automatic analyze not 
only on that table, but on all tables & databases (manual vacuuming 
works though). Luckily it's a small (but important) table - less than 
1000 rows. Any quick way of fixing it with no downtime? Thanks.





Re: pg_upgrading to 10.1 corrupts (varchar,varchar) UNIQUE indices

2017-12-22 Thread rihad

On 12/22/2017 07:43 PM, Peter Geoghegan wrote:
I have no idea. Maybe something to do with not actually using the same 
collation provider?


The freebsd package contains modifications to make icu work, or at 
least did. Possibly, you just stopped using the same collation 
provider, without even realizing it, because they switched over to a 
new provider without accounting for pg_upgrade. If so, then that's a 
bug in the package. This is a total speculation, but makes a certain 
amount of sense to me.


Exactly!

9.6.6 port options include ICU:
OPTIONS_FILE_UNSET+=DEBUG
OPTIONS_FILE_UNSET+=DTRACE
OPTIONS_FILE_UNSET+=GSSAPI
OPTIONS_FILE_SET+=ICU
OPTIONS_FILE_SET+=INTDATE
OPTIONS_FILE_UNSET+=LDAP
OPTIONS_FILE_UNSET+=NLS
OPTIONS_FILE_UNSET+=OPTIMIZED_CFLAGS
OPTIONS_FILE_UNSET+=PAM
OPTIONS_FILE_UNSET+=SSL
OPTIONS_FILE_SET+=TZDATA
OPTIONS_FILE_SET+=XML
OPTIONS_FILE_UNSET+=MIT_KRB5
OPTIONS_FILE_UNSET+=HEIMDAL_KRB5

10.1 options don't:
OPTIONS_FILE_UNSET+=DEBUG
OPTIONS_FILE_UNSET+=DTRACE
OPTIONS_FILE_UNSET+=GSSAPI
OPTIONS_FILE_SET+=INTDATE
OPTIONS_FILE_UNSET+=LDAP
OPTIONS_FILE_UNSET+=NLS
OPTIONS_FILE_UNSET+=OPTIMIZED_CFLAGS
OPTIONS_FILE_UNSET+=PAM
OPTIONS_FILE_UNSET+=SSL
OPTIONS_FILE_SET+=TZDATA
OPTIONS_FILE_SET+=XML

Thank you and others who have responded so much for your tips!



Re: pg_upgrading to 10.1 corrupts (varchar,varchar) UNIQUE indices

2017-12-22 Thread rihad
I wonder why the errors don't show up under 9.6.6 and only started 
appearing after the pg_upgrade to 10.1?

Both servers naturally used the same OS environment.



Re: pg_upgrading to 10.1 corrupts (varchar,varchar) UNIQUE indices

2017-12-22 Thread rihad

On 12/22/2017 07:01 PM, rihad wrote:

[snip]
foo=# select bt_index_check('index_users_on_email_and_type');
ERROR:  item order invariant violated for index 
"index_users_on_email_and_type"
DETAIL:  Lower index tid=(3,25) (points to index tid=(26,1)) higher 
index tid=(3,26) (points to index tid=(27,1)) page lsn=0/0.




Interestingly, another (varchar,varchar) index on a completely different 
table doesn't have the problem because it probably doesn't contain any 
non-ascii symbols.



 gateway_reference | character varying(255) |           |          |
 gateway_name      | character varying      |           |          |

foo=# select bt_index_check('index_transactions_on_gateway_name_and_gateway_reference');
 bt_index_check
----------------
 
(1 row)
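
A quick, hedged way to test that hunch, assuming a UTF-8 database (where any 
non-ASCII character occupies more than one byte) and taking the table/column 
names from the index name above:

select count(*)
from transactions
where octet_length(gateway_name) <> char_length(gateway_name)
   or octet_length(gateway_reference) <> char_length(gateway_reference);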





Re: pg_upgrading to 10.1 corrupts (varchar,varchar) UNIQUE indices

2017-12-22 Thread rihad

On 12/22/2017 05:32 PM, Peter Geoghegan wrote:

On Fri, Dec 22, 2017 at 11:56 AM, rihad <ri...@mail.ru> wrote:

I forgot to mention the real problem: the mentioned unique constraint didn't
work and allowed duplicate rows to get inserted into the table until the
duplicates were manually removed and the index was rebuilt.

You should run amcheck functions on both environments, against all
indexes, to see where the inconsistency arose, and to isolate any
other inconsistencies that may have been missed. While amcheck is
available from contrib in Postgres 10, you can get a version that will
work on other versions through OS packages for most major Linux
distributions. See:

https://github.com/petergeoghegan/amcheck

Note also that only this external version has the "heapallindexed" check.

Hm, interesting. It doesn't look like FreeBSD ports include the amcheck 
extension, but I could still use the versions bundled with postgres 
10.1-contrib.


Also, the version included doesn't allow a second boolean argument.

I first ran it on a reindexed index, which didn't show any problems, as 
expected.


Then I ran it on an unfixed broken index.

foo=# create extension amcheck;
CREATE EXTENSION

foo=# select bt_index_check('index_translations_on_locale_and_key');
 bt_index_check
----------------
 
(1 row)

foo=# select bt_index_check('index_users_on_email_and_type');
ERROR:  item order invariant violated for index 
"index_users_on_email_and_type"
DETAIL:  Lower index tid=(3,25) (points to index tid=(26,1)) higher 
index tid=(3,26) (points to index tid=(27,1)) page lsn=0/0.
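
For completeness, a sketch of running the bundled bt_index_check() over every 
btree index in the database, as Peter suggests (the heapallindexed argument 
of the external amcheck isn't available in the contrib version shipped with 
10.x):

select c.relname as index_name, bt_index_check(c.oid::regclass)
from pg_index i
join pg_class c on c.oid = i.indexrelid
join pg_am am on am.oid = c.relam
where am.amname = 'btree';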





Re: pg_upgrading to 10.1 corrupts (varchar,varchar) UNIQUE indices

2017-12-22 Thread rihad

On 12/22/2017 05:09 PM, Laurenz Albe wrote:

rihad wrote:

Hi there, this is a reproducible error. We recently pg_upgraded our
production database to 10.1 from 9.6.6. The upgrade runs fine with the
suggestion to analyze all data.

[...]


Trying to find data using the specified indices fails to find matching rows:

[...]


reindexing the table fixes the issue:

Did you also upgrade the operating system or at least libc?

It looks like it.

$ ll /lib/libc.so.7
-r--r--r--  1 root  wheel  1647104 Jun 19  2017 /lib/libc.so.7

This never was a major or minor OS upgrade, just whatever the FreeBSD patch 
releases included.




Then the problem could come from a modified collation.
Compare 
https://www.postgresql.org/message-id/CB4D1C6BAA80CF146CB0D4F2%40eje.credativ.lan


de (German) was just an example; the inability to find matching rows 
affected other languages like fr or even en, too.

It does look like a collation issue, though.



pg_upgrading to 10.1 corrupts (varchar,varchar) UNIQUE indices

2017-12-22 Thread rihad
I forgot to mention the real problem: the mentioned unique constraint 
didn't work and allowed duplicate rows to get inserted into the table 
until the duplicates were manually removed and the index was rebuilt.