On Fri, 5 Apr 2024 at 03:28, Melih Mutlu <m.melihmu...@gmail.com> wrote:
>
> Jelte Fennema-Nio <postg...@jeltef.nl>, 4 Nis 2024 Per, 16:34 tarihinde şunu 
> yazdı:
>>
>> On Thu, 4 Apr 2024 at 13:08, Melih Mutlu <m.melihmu...@gmail.com> wrote:
>> > I changed internal_flush() to an inline function, results look better this 
>> > way.
>>
>> It seems you also change internal_flush_buffer to be inline (but only
>> in the actual function definition, not declaration at the top). I
>> don't think inlining internal_flush_buffer should be necessary to
>> avoid the perf regressions, i.e. internal_flush is adding extra
>> indirection compared to master and is only a single line, so that one
>> makes sense to inline.
>
> Right. It was a mistake, forgot to remove that. Fixed it in v5.

I don't see any issues with v5, so based on the performance numbers
shown on this thread for the latest patch, it would make sense to push
it.  The problem is, I just can't recreate the performance numbers.

I've tried both on my AMD 3990x machine and an Apple M2 with a script
similar to the test.sh from above.  I mostly just stripped out the
buffer size stuff and adjusted the timing code to something that would
work with mac.

The script runs each copy 30 times and takes the average time,
reported here in seconds.

With AMD 3990x:

master
Run 100 100 5000000: 1.032264113 sec
Run 1024 10240 200000: 1.016229105 sec
Run 1024 1048576 2000: 1.242267116 sec
Run 1048576 1048576 1000: 1.245425089 sec

v5
Run 100 100 5000000: 1.068543053 sec
Run 1024 10240 200000: 1.026298571 sec
Run 1024 1048576 2000: 1.231169669 sec
Run 1048576 1048576 1000: 1.236355567 sec

With the M2 mini:

master
Run 100 100 5000000: 1.167851249 sec
Run 1024 10240 200000: 1.962466987 sec
Run 1024 1048576 2000: 2.052836275 sec
Run 1048576 1048576 1000: 2.057908066 sec

v5
Run 100 100 5000000: 1.149636571 sec
Run 1024 10240 200000: 2.158487741 sec
Run 1024 1048576 2000: 2.046627068 sec
Run 1048576 1048576 1000: 2.039329068 sec

>From looking at the perf reports, the top function is:

  57.62%  postgres   [.] CopyAttributeOutText

I messed around with trying to speed up the string escaping in that
function with the attached hacky patch and got the following on the
AMD 3990x machine:

CopyAttributeOutText_speedup.patch.txt
Run 100 100 5000000: 0.821673910
Run 1024 10240 200000: 0.546632147
Run 1024 1048576 2000: 0.848492694
Run 1048576 1048576 1000: 0.840870293

I don't think we could actually do this unless we modified the output
function API to have it somehow output the number of bytes. The patch
may look beyond the NUL byte with pg_lfind8, which I don't think is
safe.

Does anyone else want to try the attached script on the v5 patch to
see if their numbers are better?

David
#!/bin/bash

dbname=postgres
port=5432

test_cases=(
"100 100 5000000"               # only 100 bytes        
"1024 10240 200000"    # 1Kb and 10Kb 
"1024 1048576 2000"             # 1Kb and 1Mb 
"1048576 1048576 1000"  # all 1Mb
)

insert_rows(){
        psql -d $dbname -p $port -c "
        DO \$\$
        DECLARE
            counter INT;
        BEGIN
            FOR counter IN 1..$3 LOOP
                IF counter % 2 = 1 THEN
                    INSERT INTO test_table VALUES (repeat('a', $1)::text);
                ELSE
                    INSERT INTO test_table VALUES (repeat('b', $2)::text);
                END IF;
            END LOOP;
        END \$\$;
        " > /dev/null
}


psql -d $dbname -p $port -c "CREATE EXTENSION IF NOT EXISTS pg_prewarm;" > 
/dev/null

for case in "${test_cases[@]}" 
do
        psql -d $dbname -p $port -c "DROP TABLE IF EXISTS test_table;" > 
/dev/null
        psql -d $dbname -p $port -c "CREATE TABLE test_table(data text not 
null);" > /dev/null
        insert_rows $case

        psql -d $dbname -p $port -c "select pg_prewarm('test_table');" > 
/dev/null

        echo -n "Run $case: "

        elapsed_time=0
        for a in {1..30}
        do
                start_time=$(perl -MTime::HiRes=time -e 'printf "%.9f\n", time')
                psql -d $dbname -p $port -c "COPY test_table TO STDOUT;" > 
/dev/null
                end_time=$(perl -MTime::HiRes=time -e 'printf "%.9f\n", time')
                elapsed_time=$(perl -e "printf('%.9f', ($end_time - 
$start_time) + $elapsed_time)")
        done
        
        avg_elapsed_time_in_ms=$(perl -e "printf('%.9f', ($elapsed_time / 30))")
        echo $avg_elapsed_time_in_ms
done

diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index ae8b2e36d7..fcb200503f 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -29,6 +29,7 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
+#include "port/pg_lfind.h"
 #include "storage/fd.h"
 #include "tcop/tcopprot.h"
 #include "utils/lsyscache.h"
@@ -1068,9 +1069,18 @@ CopyAttributeOutText(CopyToState cstate, const char 
*string)
        else
        {
                start = ptr;
-               while ((c = *ptr) != '\0')
+
+               while (true)
                {
-                       if ((unsigned char) c < (unsigned char) 0x20)
+                       while (!pg_lfind8('\\', (uint8 *) ptr, sizeof(Vector8)) 
&&
+                                  !pg_lfind8(delimc, (uint8 *) ptr, 
sizeof(Vector8)) &&
+                                  !pg_lfind8_le(31, (uint8 *) ptr, 
sizeof(Vector8)))
+                               ptr += sizeof(Vector8);
+
+                       if ((c = *ptr) == '\0')
+                               break;
+
+                       if ((unsigned char) c <= (unsigned char) 31)
                        {
                                /*
                                 * \r and \n must be escaped, the others are 
traditional. We

Reply via email to