"Druckenmueller, Marc" <marc.druckenmuel...@philips.com> writes:
> I am investigating possible throughput with PostgreSQL 14.4 on an ARM i.MX6 
> Quad CPU (NXP sabre board).
> Testing with a simple python script (running on the same CPU), I get ~1000 
> request/s.

That does seem pretty awful for modern hardware, but it's hard to
tease apart the various potential causes.  How beefy is that CPU
really?  Maybe the overhead is all down to client/server network round
trips?  Maybe psycopg is doing something unnecessarily inefficient?

For comparison, on my development workstation I get

[ create the procedure manually in db test ]
$ cat bench.sql
call dummy_call(1,2,3,array[1,2,3]::float8[]);
$ pgbench -f bench.sql -n -T 10 test
pgbench (16beta1)
transaction type: bench.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 10 s
number of transactions actually processed: 353891
number of failed transactions: 0 (0.000%)
latency average = 0.028 ms
initial connection time = 7.686 ms
tps = 35416.189844 (without initial connection time)

and it'd be more if I weren't using an assertions-enabled
debug build.  It would be interesting to see what you get
from exactly that test case on your ARM board.

BTW, one thing I see that's definitely an avoidable inefficiency in
your test is that you're forcing the array parameter to real[]
(i.e. float4) when the procedure takes double precision[]
(i.e. float8).  That forces an extra run-time conversion.  Swapping
between float4 and float8 in my pgbench test doesn't move the needle
a lot, but it's noticeable.
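
As a rough sketch of the client-side fix, assuming the test script goes
through a DB-API 2.0 driver such as psycopg2: the helper name call_dummy
and the way the connection is passed in are hypothetical, not taken from
your script.  The only point is that the cast on the array parameter
matches the procedure's declared type:

```python
# Hypothetical helper; works with any DB-API 2.0 connection (e.g. psycopg2).
# The cast matches the procedure's double precision[] parameter, so the
# server does not have to convert from real[] (float4) on every call.
CALL_SQL = "call dummy_call(%s, %s, %s, %s::float8[])"


def call_dummy(conn, a, b, c, values):
    """Run one CALL; `values` is any iterable of Python floats."""
    cur = conn.cursor()
    try:
        cur.execute(CALL_SQL, (a, b, c, list(values)))
    finally:
        cur.close()
```

Whether this is measurable on the ARM board is something only a test
there can show.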

Another thing to think about is that psycopg might be defaulting
to a TCP rather than Unix-socket connection, and that might add
overhead depending on what kernel you're using.  Although, rather
than try to micro-optimize that, you probably ought to be thinking
of how to remove network round trips altogether.  I can get upwards
of 300K calls/second if I push the loop to the server side:

test=# \timing
Timing is on.
test=# do $$
declare x int := 1; a float8[] := array[1,2,3];
begin
for i in 1..1000000 loop
  call dummy_call (x,x,x,a);
end loop;
end $$;
DO
Time: 3256.023 ms (00:03.256)
test=# select 1000000/3.256023;
      ?column?       
---------------------
 307123.137643683721
(1 row)
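
To put the two measurements side by side, here is a small arithmetic
sketch using only the figures reported above.  Reading the difference as
round-trip plus per-statement overhead is an interpretation, not
something either test isolates (for one thing, the DO loop runs in a
single transaction, while pgbench runs each call in its own):

```python
# Figures copied from the two runs above.
pgbench_tps = 35416.189844      # one client/server round trip per call
loop_calls = 1_000_000
loop_seconds = 3.256023         # server-side DO loop

loop_rate = loop_calls / loop_seconds        # calls per second
pgbench_us = 1e6 / pgbench_tps               # microseconds per call
loop_us = 1e6 * loop_seconds / loop_calls    # microseconds per call

print(f"pgbench: {pgbench_us:.1f} us/call")
print(f"DO loop: {loop_us:.2f} us/call ({loop_rate:.0f} calls/s)")
print(f"difference: {pgbench_us - loop_us:.1f} us/call")
print(f"speedup: {loop_rate / pgbench_tps:.1f}x")
```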

Again, it would be interesting to compare exactly that
test case on your ARM board.
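
On the TCP versus Unix-socket question, a minimal sketch of how the two
connection strings differ.  libpq treats a host value that starts with
"/" as a Unix-domain socket directory.  The function name make_dsn and
the /tmp directory are assumptions for illustration; the real directory
depends on the server's unix_socket_directories setting.

```python
def make_dsn(dbname, socket_dir=None, tcp_host="127.0.0.1", port=5432):
    """Build a libpq connection string (hypothetical helper).

    If socket_dir is given it is used as the host, which makes libpq
    connect over a Unix-domain socket in that directory instead of TCP.
    """
    if socket_dir is not None and not socket_dir.startswith("/"):
        raise ValueError("socket_dir must be an absolute path")
    host = socket_dir if socket_dir is not None else tcp_host
    return f"host={host} port={port} dbname={dbname}"


# /tmp is only an assumed socket directory; check the server's setting.
unix_dsn = make_dsn("test", socket_dir="/tmp")
tcp_dsn = make_dsn("test")
```

Either string can be passed to the driver's connect call (for example
psycopg2.connect(dsn)), so the same timing loop can be run both ways.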

                        regards, tom lane

