Re: Catching up with performance & PostgreSQL 15

2022-12-03 Thread Jeff Janes
On Tue, Nov 29, 2022 at 4:07 PM David Rowley  wrote:

> On Wed, 30 Nov 2022 at 03:31, Tom Lane  wrote:
> >
> > Alvaro Herrera  writes:
> > > IMO it was a mistake to turn JIT on in the default config, so that's
> one
> > > thing you'll likely want to change.
> >
> > I wouldn't necessarily go quite that far, but I do think that the
> > default cost thresholds for invoking it are enormously too low,
> > or else there are serious bugs in the cost-estimation algorithms
> > for deciding when to use it.  A nearby example[1] of a sub-1-sec
> > partitioned query that took 30sec after JIT was enabled makes me
> > wonder if we're accounting correctly for per-partition JIT costs.
>
> I'm very grateful for JIT. However, I do agree that the costs need to work.
>
> The problem is that the threshold to turn JIT on does not consider how
> many expressions need to be compiled. It's quite different to JIT
> compile a simple one-node plan with a total cost of 10 than to JIT
> compile a plan that  costs the same but queries 1000 partitions. I
> think we should be compiling expressions based on the cost of the
> individial node rather than the total cost of the plan.


I think a big win for JIT would be to be able to do it just once per cached
plan, not once per execution.  And then have it turned on only for prepared
statements.  Of course that means JIT couldn't do parameter folding, but I
don't know if it does that anyway.  Also, very expensive plans are
generally dominated by IO cost estimates, and I think it doesn't make sense
to drive JIT decisions based predominantly on the expected cost of the IO.
If the planner separated IO cost estimate totals from CPU cost estimate
totals, it might open up better choices.

Cheers,

Jeff


Re: Catching up with performance & PostgreSQL 15

2022-12-01 Thread Andrew Dunstan


On 2022-11-30 We 11:36, Tom Lane wrote:
> Andres Freund  writes:
>> On November 30, 2022 3:47:32 AM PST, Andrew Dunstan  
>> wrote:
>>> I think Alvaro's point is that it would have been better to work out
>>> these wrinkles before turning on JIT by default. Based on anecdotal
>>> reports from the field I'm inclined to agree.
>> The problem is that back when it was introduced these problems didn't exist 
>> to a significant degree. JIT was developed when partitioning was very 
>> minimal- and the problems we're seeing are almost exclusively with queries 
>> with many partitions. The problems really only started much more recently. 
>> It also wasn't enabled in the first release..
> Well, wherever you want to pin the blame, it seems clear that we
> have a problem now.  And I don't think flipping back to off-by-default
> is the answer -- surely there is some population of users who will
> not be happy with that.  We really need to prioritize fixing the
> cost-estimation problems, and/or tweaking the default thresholds.
>
>   


+1


FTR I am not trying to pin blame anywhere. I think the work that's been
done on JIT is more than impressive.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Catching up with performance & PostgreSQL 15

2022-11-30 Thread Tom Lane
Andres Freund  writes:
> On November 30, 2022 3:47:32 AM PST, Andrew Dunstan  
> wrote:
>> I think Alvaro's point is that it would have been better to work out
>> these wrinkles before turning on JIT by default. Based on anecdotal
>> reports from the field I'm inclined to agree.

> The problem is that back when it was introduced these problems didn't exist 
> to a significant degree. JIT was developed when partitioning was very 
> minimal- and the problems we're seeing are almost exclusively with queries 
> with many partitions. The problems really only started much more recently. It 
> also wasn't enabled in the first release..

Well, wherever you want to pin the blame, it seems clear that we
have a problem now.  And I don't think flipping back to off-by-default
is the answer -- surely there is some population of users who will
not be happy with that.  We really need to prioritize fixing the
cost-estimation problems, and/or tweaking the default thresholds.

regards, tom lane




Re: Catching up with performance & PostgreSQL 15

2022-11-30 Thread Andres Freund
Hi, 

On November 30, 2022 3:47:32 AM PST, Andrew Dunstan  wrote:
>
>On 2022-11-29 Tu 16:06, David Rowley wrote:
>> On Wed, 30 Nov 2022 at 03:31, Tom Lane  wrote:
>>> Alvaro Herrera  writes:
 IMO it was a mistake to turn JIT on in the default config, so that's one
 thing you'll likely want to change.
>>> I wouldn't necessarily go quite that far, but I do think that the
>>> default cost thresholds for invoking it are enormously too low,
>>> or else there are serious bugs in the cost-estimation algorithms
>>> for deciding when to use it.  A nearby example[1] of a sub-1-sec
>>> partitioned query that took 30sec after JIT was enabled makes me
>>> wonder if we're accounting correctly for per-partition JIT costs.
>> I'm very grateful for JIT. However, I do agree that the costs need to work.
>>
>> The problem is that the threshold to turn JIT on does not consider how
>> many expressions need to be compiled. It's quite different to JIT
>> compile a simple one-node plan with a total cost of 10 than to JIT
>> compile a plan that  costs the same but queries 1000 partitions. I
>> think we should be compiling expressions based on the cost of the
>> individial node rather than the total cost of the plan. We need to
>> make some changes so we can more easily determine the number of times
>> a given node will be executed before we can determine how worthwhile
>> JITting an expression in a node will be.
>>
>
>I think Alvaro's point is that it would have been better to work out
>these wrinkles before turning on JIT by default. Based on anecdotal
>reports from the field I'm inclined to agree.

The problem is that back when it was introduced these problems didn't exist to 
a significant degree. JIT was developed when partitioning was very minimal- and 
the problems we're seeing are almost exclusively with queries with many 
partitions. The problems really only started much more recently. It also wasn't 
enabled in the first release..

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: Catching up with performance & PostgreSQL 15

2022-11-30 Thread Andrew Dunstan


On 2022-11-29 Tu 16:06, David Rowley wrote:
> On Wed, 30 Nov 2022 at 03:31, Tom Lane  wrote:
>> Alvaro Herrera  writes:
>>> IMO it was a mistake to turn JIT on in the default config, so that's one
>>> thing you'll likely want to change.
>> I wouldn't necessarily go quite that far, but I do think that the
>> default cost thresholds for invoking it are enormously too low,
>> or else there are serious bugs in the cost-estimation algorithms
>> for deciding when to use it.  A nearby example[1] of a sub-1-sec
>> partitioned query that took 30sec after JIT was enabled makes me
>> wonder if we're accounting correctly for per-partition JIT costs.
> I'm very grateful for JIT. However, I do agree that the costs need to work.
>
> The problem is that the threshold to turn JIT on does not consider how
> many expressions need to be compiled. It's quite different to JIT
> compile a simple one-node plan with a total cost of 10 than to JIT
> compile a plan that  costs the same but queries 1000 partitions. I
> think we should be compiling expressions based on the cost of the
> individial node rather than the total cost of the plan. We need to
> make some changes so we can more easily determine the number of times
> a given node will be executed before we can determine how worthwhile
> JITting an expression in a node will be.
>

I think Alvaro's point is that it would have been better to work out
these wrinkles before turning on JIT by default. Based on anecdotal
reports from the field I'm inclined to agree.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Catching up with performance & PostgreSQL 15

2022-11-29 Thread Mladen Gogala
On Tue, 2022-11-29 at 19:09 +0100, Alvaro Herrera wrote:
> On 2022-Nov-29, Mladen Gogala wrote:
> 
> > Hmmm, I think I will run pgbench with and without JIT on and see
> > the
> > difference.
> 
> I doubt you'll notice anything, because the pgbench queries will be
> far
> below the JIT cost, so nothing will get JIT compiled at all.  Or are
> you
> planning on using a custom set of queries?
> 

Nope. I am planning to set jit_above_cost parameter to 5. That should
take care of the pgbench problem. Other than that, you're right: JIT
should not be used for OLTP. However, pure OLTP or DW databases are a
rarity these days. Reporting is a crucial function and almost every
OLTP database that I've seen also has reporting function, which means
that there are complex queries to be executed.
-- 
Mladen Gogala
Database Consultant
https://dbwhisperer.wordpress.com



Re: Catching up with performance & PostgreSQL 15

2022-11-29 Thread David Rowley
On Wed, 30 Nov 2022 at 03:31, Tom Lane  wrote:
>
> Alvaro Herrera  writes:
> > IMO it was a mistake to turn JIT on in the default config, so that's one
> > thing you'll likely want to change.
>
> I wouldn't necessarily go quite that far, but I do think that the
> default cost thresholds for invoking it are enormously too low,
> or else there are serious bugs in the cost-estimation algorithms
> for deciding when to use it.  A nearby example[1] of a sub-1-sec
> partitioned query that took 30sec after JIT was enabled makes me
> wonder if we're accounting correctly for per-partition JIT costs.

I'm very grateful for JIT. However, I do agree that the costs need to work.

The problem is that the threshold to turn JIT on does not consider how
many expressions need to be compiled. It's quite different to JIT
compile a simple one-node plan with a total cost of 10 than to JIT
compile a plan that  costs the same but queries 1000 partitions. I
think we should be compiling expressions based on the cost of the
individial node rather than the total cost of the plan. We need to
make some changes so we can more easily determine the number of times
a given node will be executed before we can determine how worthwhile
JITting an expression in a node will be.

David

> [1] 
> https://www.postgresql.org/message-id/B6025887-D73F-4B5B-9925-4DA4B675F7E5%40elevated-dev.com




Re: Catching up with performance & PostgreSQL 15

2022-11-29 Thread Alvaro Herrera
On 2022-Nov-29, Mladen Gogala wrote:

> Hmmm, I think I will run pgbench with and without JIT on and see the
> difference.

I doubt you'll notice anything, because the pgbench queries will be far
below the JIT cost, so nothing will get JIT compiled at all.  Or are you
planning on using a custom set of queries?

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: Catching up with performance & PostgreSQL 15

2022-11-29 Thread Tom Lane
Alvaro Herrera  writes:
> IMO it was a mistake to turn JIT on in the default config, so that's one
> thing you'll likely want to change.

I wouldn't necessarily go quite that far, but I do think that the
default cost thresholds for invoking it are enormously too low,
or else there are serious bugs in the cost-estimation algorithms
for deciding when to use it.  A nearby example[1] of a sub-1-sec
partitioned query that took 30sec after JIT was enabled makes me
wonder if we're accounting correctly for per-partition JIT costs.

regards, tom lane

[1] 
https://www.postgresql.org/message-id/B6025887-D73F-4B5B-9925-4DA4B675F7E5%40elevated-dev.com




Re: Catching up with performance & PostgreSQL 15

2022-11-29 Thread Mladen Gogala

On 11/29/22 03:36, Alvaro Herrera wrote:

On 2022-Nov-28, Mladen Gogala wrote:


As for JIT, I've recently asked that question myself. I was told that
PostgreSQL with LLVM enabled performs approximately 25% better than without
it.

Hmm, actually, normally you're better off turning JIT off, because it's
very common to diagnose cases of queries that become much, much slower
because of it.  Some queries do become faster, but it's not a wide
margin, and it's not a lot.  There are rare cases where JIT is
beneficial, but those tend to be queries that take upwards of several
seconds already.

IMO it was a mistake to turn JIT on in the default config, so that's one
thing you'll likely want to change.

Hmmm, I think I will run pgbench with and without JIT on and see the 
difference.


--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com


Re: Catching up with performance & PostgreSQL 15

2022-11-29 Thread Mladen Gogala

On 11/29/22 03:31, Alvaro Herrera wrote:

On 2022-Nov-28, Mladen Gogala wrote:


You'll probably be glad to learn that we have hints now.

What hints are you talking about?  As I understand, we still don't have
Oracle-style query hints.


https://github.com/ossc-db/pg_hint_plan

--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com


Re: Catching up with performance & PostgreSQL 15

2022-11-29 Thread Alvaro Herrera
On 2022-Nov-28, Mladen Gogala wrote:

> As for JIT, I've recently asked that question myself. I was told that
> PostgreSQL with LLVM enabled performs approximately 25% better than without
> it.

Hmm, actually, normally you're better off turning JIT off, because it's
very common to diagnose cases of queries that become much, much slower
because of it.  Some queries do become faster, but it's not a wide
margin, and it's not a lot.  There are rare cases where JIT is
beneficial, but those tend to be queries that take upwards of several
seconds already.

IMO it was a mistake to turn JIT on in the default config, so that's one
thing you'll likely want to change.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Thou shalt not follow the NULL pointer, for chaos and madness await
thee at its end." (2nd Commandment for C programmers)




Re: Catching up with performance & PostgreSQL 15

2022-11-29 Thread Alvaro Herrera
On 2022-Nov-28, Mladen Gogala wrote:

> You'll probably be glad to learn that we have hints now.

What hints are you talking about?  As I understand, we still don't have
Oracle-style query hints.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/




Re: Catching up with performance & PostgreSQL 15

2022-11-28 Thread Josh Berkus

On 11/28/22 19:34, Justin Pryzby wrote:

In addition to setting work_mem, you can also (since v13) set
hash_mem_multiplier.


Is there any guidance on setting this?  Or is it still "use the default 
unless you can play around with it"?



default_toast_compression = lz4 # v14
recovery_init_sync_method = syncfs # v14
check_client_connection_interval = ... # v14
wal_compression = {lz4,zstd} # v15


If anyone has links to blogs or other things that discuss the 
performance implications of the above settings that would be wonderful!


--
Josh Berkus





Re: Catching up with performance & PostgreSQL 15

2022-11-28 Thread Mladen Gogala

On 11/28/22 21:59, Josh Berkus wrote:

Hey, folks:

I haven't configured a PostgreSQL server since version 11 (before 
that, I did quite a few).


What's changed in terms of performance configuration since then? Have 
the fundamentals of shared_buffers/work_mem/max_connections changed at 
all?  Which new settings are must-tunes?


I've heard about new parallel stuff an JIT, but neither is that 
applicable to my use-case.


Well, well! Long time no see! You'll probably be glad to learn that we 
have hints now. Thank you for the following page you created:


https://laptrinhx.com/why-postgresql-doesn-t-have-query-hints-2912445911/

I've used it several times, with great success. It's priceless.

Now, to answer your question: no, fundamentals of shared buffers, work 
memory and connections haven't changed. Parallelism works fine, it's 
reliable and easy to enable. All you need is to set 
max_parallel_workers_per_gather to an integer > 0 and PgSQL 15 will 
automatically use parallel plan if the planner decides that it's the 
best path. However, to warn you in advance, parallel query is not a 
panacea. On OLTP databases, I usually disable it on purpose. Parallel 
query will speed up sequential scans, but if your application is OLTP, 
sequential scan is a sign of trouble. Parallelism is a data warehouse 
only feature. And even then, you don't want it ti be run by multiple 
users at the same time. Namely, the number of your CPU resources is 
finite and having multiple users launch multiple processes is the best 
way to run out of the CPU power fast. Normally, you would package an 
output of the parallel query into a materialized view and let the users 
query the view.


As for JIT, I've recently asked that question myself. I was told that 
PostgreSQL with LLVM enabled performs approximately 25% better than 
without it. I haven't measured it so I can't  either confirm or deny the 
number.  I can tell you that there is a noticeable throughput 
improvement with PL/PGSQL intensive applications. There was also an 
increase in CPU consumption. I wasn't doing benchmarks, I was looking 
for a generic settings to install via Ansible so I don't have the 
numbers, only the feeling. One way of quantifying the difference would 
be to run pgbench with and without JIT.


PS:

I am still an Oracle DBA, just as you wrote in the paper.

--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com


Re: Catching up with performance & PostgreSQL 15

2022-11-28 Thread Justin Pryzby
On Mon, Nov 28, 2022 at 06:59:41PM -0800, Josh Berkus wrote:
> Hey, folks:
> 
> I haven't configured a PostgreSQL server since version 11 (before that, I
> did quite a few).
> 
> What's changed in terms of performance configuration since then?  Have the
> fundamentals of shared_buffers/work_mem/max_connections changed at all?
> Which new settings are must-tunes?
> 
> I've heard about new parallel stuff an JIT, but neither is that applicable
> to my use-case.

shared buffers is the same, but btree indexes are frequently (IME) 3x
smaller (!) since deduplication was added in v13, so s_b might not need
to be as large.

In addition to setting work_mem, you can also (since v13) set
hash_mem_multiplier.

default_toast_compression = lz4 # v14
recovery_init_sync_method = syncfs # v14
check_client_connection_interval = ... # v14
wal_compression = {lz4,zstd} # v15

Peeking at my notes, there's also: partitioning, parallel query, brin
indexes, extended statistics, reindex concurrently, ...

... but I don't think anything is radically changed :)

-- 
Justin