Re: Performance impact of updating target columns with unchanged values ON CONFLICT

2018-11-23 Thread Abi Noda
I take that question back – someone helped me on StackExchange and
addressed it:

*> It appears that Postgres is smart enough to identify cases where indexed
columns are not changed , and perform HOT updates; thus , there is no
difference between having or not having key columns in update statement
from performance point of view. The only thing that matters it whether
actual value changed. Surely, this behaviour is limited to B-Tree indexes. *

https://dba.stackexchange.com/questions/223231/performance-impact-of-updating-target-columns-with-same-values-on-conflict

On Fri, Nov 23, 2018 at 7:44 PM Abi Noda  wrote:

> Thanks Justin. Do you know if Postgres treats an UPDATE that sets the
> indexed columns set to the same previous values as a change? Or does it
> only count it as "changed" if the values are different. This is ambiguous
> to me.
>
> *> HOT solves this problem for a restricted but useful special case where
> a tuple is repeatedly updated in ways that do not change its indexed
> columns.*
>
> *> With HOT, a new tuple placed on the same page and with all indexed
> columns the same as its parent row version does not get new index entries.*
>
> *> [HOT] will create a new physical heap tuple when inserting, and not a
> new index tuple, if and only if the update did not affect indexed columns.*
>
>
>
> On Thu, Nov 22, 2018 at 2:40 PM Justin Pryzby 
> wrote:
>
>> On Thu, Nov 22, 2018 at 01:31:10PM -0800, Abi Noda wrote:
>> > In other words, is Postgres smart enough to not actually write to disk
>> any
>> > columns that haven’t changed value or update indexes based on those
>> columns?
>>
>> You're asking about what's referred to as Heap only tuples:
>>
>> https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/heap/README.HOT;hb=HEAD
>> https://wiki.postgresql.org/wiki/Index-only_scans#Interaction_with_HOT
>>
>> Note, if you're doing alot of updates, you should consider setting a
>> lower the
>> table fillfactor, since HOT is only possible if the new tuple (row
>> version) is
>> on the same page as the old tuple.
>>
>> |With HOT, a new tuple placed on the same page and with all indexed
>> columns the
>> |same as its parent row version does not get new index entries."
>>
>> And check pg_stat_user_tables to verify that's working as intended.
>>
>> Justin
>>
>


Re: Performance impact of updating target columns with unchanged values ON CONFLICT

2018-11-23 Thread Abi Noda
Thanks Justin. Do you know if Postgres treats an UPDATE that sets the
indexed columns set to the same previous values as a change? Or does it
only count it as "changed" if the values are different. This is ambiguous
to me.

*> HOT solves this problem for a restricted but useful special case where a
tuple is repeatedly updated in ways that do not change its indexed columns.*

*> With HOT, a new tuple placed on the same page and with all indexed
columns the same as its parent row version does not get new index entries.*

*> [HOT] will create a new physical heap tuple when inserting, and not a
new index tuple, if and only if the update did not affect indexed columns.*



On Thu, Nov 22, 2018 at 2:40 PM Justin Pryzby  wrote:

> On Thu, Nov 22, 2018 at 01:31:10PM -0800, Abi Noda wrote:
> > In other words, is Postgres smart enough to not actually write to disk
> any
> > columns that haven’t changed value or update indexes based on those
> columns?
>
> You're asking about what's referred to as Heap only tuples:
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/heap/README.HOT;hb=HEAD
> https://wiki.postgresql.org/wiki/Index-only_scans#Interaction_with_HOT
>
> Note, if you're doing alot of updates, you should consider setting a lower
> the
> table fillfactor, since HOT is only possible if the new tuple (row
> version) is
> on the same page as the old tuple.
>
> |With HOT, a new tuple placed on the same page and with all indexed
> columns the
> |same as its parent row version does not get new index entries."
>
> And check pg_stat_user_tables to verify that's working as intended.
>
> Justin
>


Re: dsa_allocate() faliure

2018-11-23 Thread Justin Pryzby
On Fri, Nov 23, 2018 at 03:31:41PM +0100, Jakub Glapa wrote:
> Hi Justin, I've upgrade to 10.6 but the error still shows up:
> 
> If I set it to max_parallel_workers=0 I also get and my connection is being
> closed (but the server is alive):
> 
> psql db@host as user => set max_parallel_workers=0;

Can you show the plan (explain without analyze) for the nonparallel case?

Also, it looks like the server crashed in that case (even if it restarted
itself quickly).  Can you confirm ?

For example: dmesg |tail might show "postmaster[8582]: segfault [...]" or
similar.  And other clients would've been disconnected.  (For example, you'd
get an error in another, previously-connected session the next time you run:
SELECT 1).

In any case, could you try to find a minimal way to reproduce the problem ?  I
mean, is the dataset and query small and something you can publish, or can you
reproduce with data generated from (for example) generate_series() ?

Thanks,
Justin



Re: dsa_allocate() faliure

2018-11-23 Thread Jakub Glapa
Hi Justin, I've upgrade to 10.6 but the error still shows up:

psql db@host as user => select version();
   version

─
 PostgreSQL 10.6 (Ubuntu 10.6-1.pgdg16.04+1) on x86_64-pc-linux-gnu,
compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609, 64-bit
(1 row)

Time: 110.512 ms

psql db@host as user  => select  from fa where client_id in
() and datetime >= '2018/01/01' and ((dims ? 'p' and dimensions ?
'mcp') or (datasource in ('FA', 'GA'))) and not datasource = 'M' GROUP BY
datasource, dims ->'ct', dimensions ->'mct', dims -> 'p', dims -> 'sp';
ERROR:  XX000: dsa_allocate could not find 7 free pages
CONTEXT:  parallel worker
LOCATION:  dsa_allocate_extended, dsa.c:729
Time: 131400.831 ms (02:11.401)

the above is execute with max_parallel_workers=8
If I set it to max_parallel_workers=0 I also get and my connection is being
closed (but the server is alive):

psql db@host as user => set max_parallel_workers=0;
SET
Time: 89.542 ms
psql db@host as user => SELECT ;
FATAL:  XX000: dsa_allocate could not find 7 free pages
LOCATION:  dsa_allocate_extended, dsa.c:729
SSL connection has been closed unexpectedly
The connection to the server was lost. Attempting reset: Succeeded.
Time: 200390.466 ms (03:20.390)



--
regards,
Jakub Glapa


On Thu, Nov 22, 2018 at 5:10 PM Justin Pryzby  wrote:

> On Wed, Nov 21, 2018 at 03:26:42PM +0100, Jakub Glapa wrote:
> > Looks like my email didn't match the right thread:
> >
> https://www.postgresql.org/message-id/flat/CAMAYy4%2Bw3NTBM5JLWFi8twhWK4%3Dk_5L4nV5%2BbYDSPu8r4b97Zg%40mail.gmail.com
> > Any chance to get some feedback on this?
>
> In the related thread, it looks like Thomas backpatched a fix to v10, and
> so I
> guess this should be resolved in 10.6, which was released couple weeks ago.
>
> https://www.postgresql.org/message-id/CAEepm%3D0QxoUSkFqYbvmxi2eNvvU6BkqH6fTOu4oOzc1MRAT4Dw%40mail.gmail.com
>
> Could you upgrade and check ?
>
> 38763d67784c6563d08dbea5c9f913fa174779b8 in master
>
> |commit ba20d392584cdecc2808fe936448d127f43f2c07
> |Author: Thomas Munro 
> |Date:   Thu Sep 20 15:52:39 2018 +1200
> |
> |Fix segment_bins corruption in dsa.c.
>
> Justin
>