Re: Improving the heapgetpage function improves performance in common scenarios

2023-09-06 Thread Quan Zongliang




On 2023/9/6 17:07, John Naylor wrote:


On Wed, Sep 6, 2023 at 2:50 PM Quan Zongliang > wrote:


 > If not optimized(--enable-debug CFLAGS='-O0'), there is a clear
 > difference. When the compiler does the optimization, the performance is
 > similar. I think the compiler does a good enough optimization with
 > "pg_attribute_always_inline" and the last two constant parameters when
 > calling heapgetpage_collect.

So as we might expect, more specialization (Andres' patch) has no 
apparent downsides in this workload. (While I'm not sure of the point of 
testing at -O0, I think we can conclude that less-bright compilers will 
show some improvement with either patch.)


If you agree, do you want to withdraw your patch from the commit fest?


Ok.

--
John Naylor
EDB: http://www.enterprisedb.com 






Re: Improving the heapgetpage function improves performance in common scenarios

2023-09-06 Thread John Naylor
On Wed, Sep 6, 2023 at 2:50 PM Quan Zongliang 
wrote:

> If not optimized(--enable-debug CFLAGS='-O0'), there is a clear
> difference. When the compiler does the optimization, the performance is
> similar. I think the compiler does a good enough optimization with
> "pg_attribute_always_inline" and the last two constant parameters when
> calling heapgetpage_collect.

So as we might expect, more specialization (Andres' patch) has no apparent
downsides in this workload. (While I'm not sure of the point of testing at
-O0, I think we can conclude that less-bright compilers will show some
improvement with either patch.)

If you agree, do you want to withdraw your patch from the commit fest?

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Improving the heapgetpage function improves performance in common scenarios

2023-09-06 Thread Quan Zongliang




On 2023/9/6 15:50, Quan Zongliang wrote:



On 2023/9/5 18:46, John Naylor wrote:


On Tue, Sep 5, 2023 at 4:27 PM Quan Zongliang > wrote:


 > Here's how I test it
 >     EXPLAIN ANALYZE SELECT * FROM orders;

Note that EXPLAIN ANALYZE has quite a bit of overhead, so it's not 
good for these kinds of tests.


 > I'll also try Andres Freund's test method next.

Commit f691f5b80a85 from today removes another source of overhead in 
this function, so I suggest testing against that, if you wish to test 
again.



Test with the latest code of the master branch, see the attached results.

If not optimized(--enable-debug CFLAGS='-O0'), there is a clear 
difference. When the compiler does the optimization, the performance is 
similar. I think the compiler does a good enough optimization with 
"pg_attribute_always_inline" and the last two constant parameters when 
calling heapgetpage_collect.


Add a note. The first execution time of an attachment is not calculated 
in the average.





--
John Naylor
EDB: http://www.enterprisedb.com 






Re: Improving the heapgetpage function improves performance in common scenarios

2023-09-06 Thread Quan Zongliang



On 2023/9/5 18:46, John Naylor wrote:


On Tue, Sep 5, 2023 at 4:27 PM Quan Zongliang > wrote:


 > Here's how I test it
 >     EXPLAIN ANALYZE SELECT * FROM orders;

Note that EXPLAIN ANALYZE has quite a bit of overhead, so it's not good 
for these kinds of tests.


 > I'll also try Andres Freund's test method next.

Commit f691f5b80a85 from today removes another source of overhead in 
this function, so I suggest testing against that, if you wish to test again.



Test with the latest code of the master branch, see the attached results.

If not optimized(--enable-debug CFLAGS='-O0'), there is a clear 
difference. When the compiler does the optimization, the performance is 
similar. I think the compiler does a good enough optimization with 
"pg_attribute_always_inline" and the last two constant parameters when 
calling heapgetpage_collect.




--
John Naylor
EDB: http://www.enterprisedb.com 

Re: Improving the heapgetpage function improves performance in common scenarios

2023-09-05 Thread John Naylor
On Tue, Sep 5, 2023 at 4:27 PM Quan Zongliang 
wrote:

> Here's how I test it
> EXPLAIN ANALYZE SELECT * FROM orders;

Note that EXPLAIN ANALYZE has quite a bit of overhead, so it's not good for
these kinds of tests.

> I'll also try Andres Freund's test method next.

Commit f691f5b80a85 from today removes another source of overhead in this
function, so I suggest testing against that, if you wish to test again.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Improving the heapgetpage function improves performance in common scenarios

2023-09-05 Thread Quan Zongliang




On 2023/9/5 16:15, John Naylor wrote:


On Thu, Aug 24, 2023 at 5:55 PM Quan Zongliang > wrote:


 > In the function heapgetpage. If a table is not updated very frequently.
 > Many actions in tuple loops are superfluous. For all_visible pages,
 > loctup does not need to be assigned, nor does the "valid" variable.
 > CheckForSerializableConflictOutNeeded from
 > HeapCheckForSerializableConflictOut function, it only need to inspect at

Thanks for submitting! A few weeks before this, there was another 
proposal, which specializes code for all paths, not just one. That patch 
also does so without duplicating the loop:


https://www.postgresql.org/message-id/20230716015656.xjvemfbp5fysj...@awork3.anarazel.de
 



Nice patch. I'm sorry I didn't notice it before.


 > the beginning of the cycle only once. Using vtune you can clearly see
 > the result (attached heapgetpage.jpg).
 >
 > So by splitting the loop logic into two parts, the vtune results show
 > significant improvement (attached heapgetpage-allvis.jpg).

For future reference, it's not clear at all from the screenshots what 
the improvement will be for the user. In the above thread, the author 
shares testing methodology as well as timing measurements. This is 
useful for reproducibilty, as well as convincing others that the change 
is important.



Here's how I test it
   EXPLAIN ANALYZE SELECT * FROM orders;
Maybe the test wasn't good enough. Although the modified optimal result 
looks good. Because it fluctuates a lot. It's hard to compare. The 
results of vtune are therefore used.


My patch is mainly to eliminate:
1, Assignment of "loctup" struct variable (in vtune you can see that 
these 4 lines have a significant overhead: 0.4 1.0 0.2 0.4).

2. Assignment of the "valid" variable.(overhead 0.6)
3. HeapCheckForSerializableConflictOut function call.(overhead 0.6)

Although these are not the same overhead from test to test. But all are 
too obvious to ignore. The screenshots are mainly to show the three 
improvements mentioned above.


I'll also try Andres Freund's test method next.


--
John Naylor
EDB: http://www.enterprisedb.com 






Re: Improving the heapgetpage function improves performance in common scenarios

2023-09-05 Thread John Naylor
On Thu, Aug 24, 2023 at 5:55 PM Quan Zongliang 
wrote:

> In the function heapgetpage. If a table is not updated very frequently.
> Many actions in tuple loops are superfluous. For all_visible pages,
> loctup does not need to be assigned, nor does the "valid" variable.
> CheckForSerializableConflictOutNeeded from
> HeapCheckForSerializableConflictOut function, it only need to inspect at

Thanks for submitting! A few weeks before this, there was another proposal,
which specializes code for all paths, not just one. That patch also does so
without duplicating the loop:

https://www.postgresql.org/message-id/20230716015656.xjvemfbp5fysj...@awork3.anarazel.de

> the beginning of the cycle only once. Using vtune you can clearly see
> the result (attached heapgetpage.jpg).
>
> So by splitting the loop logic into two parts, the vtune results show
> significant improvement (attached heapgetpage-allvis.jpg).

For future reference, it's not clear at all from the screenshots what the
improvement will be for the user. In the above thread, the author shares
testing methodology as well as timing measurements. This is useful for
reproducibilty, as well as convincing others that the change is important.

--
John Naylor
EDB: http://www.enterprisedb.com