On 7/30/24 09:13, Sutou Kouhei wrote:
> Hi,
> 
> In <26541788-8853-4d93-86cd-5f701b13a...@enterprisedb.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Mon, 29 Jul 2024 14:17:08 +0200,
>   Tomas Vondra <tomas.von...@enterprisedb.com> wrote:
> 
>> I wrote a simple script to automate the benchmark - it just runs these
>> tests with different parameters (number of columns and number of
>> imported/exported rows). See the run.sh attachment, along with two CSV
>> results from current master and with all patches applied.
> 
> Thanks. I also used the script with some modifications:
> 
> 1. Create a test database automatically
> 2. Enable blackhole_am automatically
> 3. Create create_table_cols() automatically
> 
> I attach it. I also attach results of master and patched. My
> results are from my desktop. So it's probably noisy.
> 
>> - For COPY FROM there is no difference - the results are within 1% of
>> master, and there's no systemic difference.
>>
>> - For COPY TO it's a different story, though. There's a pretty clear
>> regression, by ~5%. It's a bit interesting the correlation with the
>> number of columns is not stronger ...
> 
> My results showed different trend:
> 
> - COPY FROM: Patched is about 15-20% slower than master
> - COPY TO: Patched is a bit faster than master
> 
> Here are some my numbers:
> 
> type  n_cols  n_rows  diff    master          patched
> ----------------------------------------------------------
> TO    5       1       100.56% 218.376000      219.609000
> FROM  5       1       113.33% 168.493000      190.954000
> ...
> TO    5       5       100.60% 1037.773000     1044.045000
> FROM  5       5       116.46% 767.966000      894.377000
> ...
> TO    5       10      100.15% 2092.245000     2095.472000
> FROM  5       10      115.91% 1508.160000     1748.130000
> TO    10      1       98.62%  353.087000      348.214000
> FROM  10      1       118.65% 260.551000      309.133000
> ...
> TO    10      5       96.89%  1724.061000     1670.427000
> FROM  10      5       119.92% 1224.098000     1467.941000
> ...
> TO    10      10      98.70%  3444.291000     3399.538000
> FROM  10      10      118.79% 2462.314000     2924.866000
> TO    15      1       97.71%  492.082000      480.802000
> FROM  15      1       115.59% 347.820000      402.033000
> ...
> TO    15      5       98.32%  2402.419000     2362.140000
> FROM  15      5       115.48% 1657.594000     1914.245000
> ...
> TO    15      10      96.91%  4830.319000     4681.145000
> FROM  15      10      115.09% 3304.798000     3803.542000
> TO    20      1       96.05%  629.828000      604.939000
> FROM  20      1       118.50% 438.673000      519.839000
> ...
> TO    20      5       97.15%  3084.210000     2996.331000
> FROM  20      5       115.35% 2110.909000     2435.032000
> ...
> TO    25      1       98.29%  764.779000      751.684000
> FROM  25      1       115.13% 519.686000      598.301000
> ...
> TO    25      5       94.08%  3843.996000     3616.614000
> FROM  25      5       115.62% 2554.008000     2952.928000
> ...
> TO    25      10      97.41%  7504.865000     7310.549000
> FROM  25      10      117.25% 4994.463000     5856.029000
> TO    30      1       94.39%  906.324000      855.503000
> FROM  30      1       119.60% 604.110000      722.491000
> ...
> TO    30      5       96.50%  4419.907000     4265.417000
> FROM  30      5       116.97% 2932.883000     3430.556000
> ...
> TO    30      10      94.39%  8974.878000     8470.991000
> FROM  30      10      117.84% 5800.793000     6835.900000
> ----
> 
> See the attached diff.txt for full numbers.
> I also attach scripts to generate the diff.txt. Here is the
> command line I used:
> 
> ----
> ruby diff.rb <(ruby aggregate.rb master.result) <(ruby aggregate.rb 
> patched.result) | tee diff.txt
> ----
> 
> My environment:
> 
> * Debian GNU/Linux sid
> * gcc (Debian 13.3.0-2) 13.3.0
> * AMD Ryzen 9 3900X 12-Core Processor
> 
> I'll look into this.
> 
> If someone is interested in this proposal, could you share
> your numbers?
> 

I'm on Fedora 40 with gcc 14.1, on Intel i7-9750H. But it's running on
Qubes OS, so it's really in a VM which makes it noisier. I'll try to do
more benchmarks on a regular hw, but that may take a couple days.

I decided to do the benchmark for individual parts of the patch series.
The attached PDF shows results for master (label 0000) and the 0001-0005
patches, along with relative performance difference between the patches.
The color scale is the same as before - red = bad, green = good.

There are pretty clear differences between the patches and "direction"
of the COPY. I'm sure it does depend on the hardware - I tried running
this on rpi5 (with 32-bits), and it looks very different. There might be
a similar behavior difference between Intel and Ryzen, but my point is
that when looking for regressions, looking at these "per patch" charts
can be very useful (as it reduces the scope of changes that might have
caused the regression).

>> It's interesting the main change in the flamegraphs is CopyToStateFlush
>> pops up on the left side. Because, what is that about? That is a thing
>> introduced in the 0005 patch, so maybe the regression is not strictly
>> about the existing formats moving to the new API, but due to something
>> else in a later version of the patch?
> 
> Ah, making static CopySendEndOfRow() a to non-static function
> (CopyToStateFlush()) may be the reason of this. Could you
> try the attached v19 patch? It changes the 0005 patch:
> 

Perhaps, that's possible.

> * It reverts the static change
> * It adds a new non-static function that just exports
>   CopySendEndOfRow()
> 

I'll try to benchmark this later, when the other machine is available.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment: copy-benchmark-per-patch.pdf
Description: Adobe PDF document

Reply via email to