Hi all,
I have a general question about network traffic between PostgreSQL’s client and server: what determines the network bandwidth usage or data transferring rate between a client and a server when network bandwidth is enough? For example, I ran queries on two tables, lineitem and partsupp in TPCH benchmark (with scaling factor 5). Lineitem table is 4630 MB and has 30000000 rows. Partsupp table is 693 MB and has 4000000 rows. Their definitions are shown below: Table "public.lineitem" Column | Type | Modifiers -----------------+-----------------------+----------- l_orderkey | integer | not null l_partkey | integer | not null l_suppkey | integer | not null l_linenumber | integer | not null l_quantity | numeric(15,2) | not null l_extendedprice | numeric(15,2) | not null l_discount | numeric(15,2) | not null l_tax | numeric(15,2) | not null l_returnflag | character(1) | not null l_linestatus | character(1) | not null l_shipdate | date | not null l_commitdate | date | not null l_receiptdate | date | not null l_shipinstruct | character(25) | not null l_shipmode | character(10) | not null l_comment | character varying(44) | not null Table "public.partsupp" Column | Type | Modifiers ---------------+------------------------+----------- ps_partkey | integer | not null ps_suppkey | integer | not null ps_availqty | integer | not null ps_supplycost | numeric(15,2) | not null ps_comment | character varying(199) | not null For different queries, I observe different data transferring rate between a client and a server (client and server are on different physical machines) using tcpdump as shown below: Query 1: select * from lineitem; Seq Scan on lineitem (cost=0.00..892562.86 rows=29998686 width=125) Average network usage: *42.5MB/s* Query 2: select * from partsupp; Seq Scan on partsupp (cost=0.00..128685.81 rows=4001181 width=146) Average network usage: *95.9MB/s* Query 3: select * from lineitem, partsupp where l_partkey=ps_partkey; Hash Join (cost=178700.57..17307550.15 rows=116194700 width=271) Hash Cond: (lineitem.l_partkey = partsupp.ps_partkey) -> Seq Scan on lineitem (cost=0.00..892562.86 rows=29998686 width=125) -> Hash (cost=128685.81..128685.81 rows=4001181 width=146) -> Seq Scan on partsupp (cost=0.00..128685.81 rows=4001181 width=146) Average network usage: *53.1MB/s* In all the experiments, the lineitem and partsupp tables reside in memory because there is no io activities observed from iotop. Since there is enough network bandwidth (1Gb/s or 128MB/s) between client and server, I would like to know what determines the data transferring rate or the network bandwidth usage between a client and a server when network bandwidth is enough. For example, given that the size of each tuple of lineitem table is 88% of that of partsupp, why the average network usage for sequential scan of lineitem table is only 50% that of partsupp table? And why the average network usage of their join is higher than that of sequential scan of lineitem but lower than that of sequential scan of partsupp table? Thanks! Kelphet Xiong