Hi,

I'm researchingon pglogical,performing tests to see the impact on the network 
traffic, in comparisonwith streaming replication.

I configured one provider and one subscriber both running on a VM, both on 
Postgres 9.6.3 and latest pglogical 2.2.0. 

Forcomparison, Ialso have one master and one hot standby, running on the same 
VM and on the same Postgres.

I performed 2 different kind of tests. One using pgbench and another manually 
generating data. But there is something I do not understand when it comes to 
draw conclusions.Therefore I would appreciate yourhelp.


=Test 1=

I create a schema/table called:

my_replicated_schema.my_first_replicated_table

I ask to provider to add it to the replication set using:

SELECT pglogical.replication_set_add_all_tables('default', 
ARRAY['my_replicated_schema']);

Then I generate 50 million +1 records on the table:

insert into my_replicated_schema.my_first_replicated_table values 
(generate_series(0,50000000));'


At this stage I can check how much traffic was generated to pass the records to 
the other node.

Logical replication costs:

         RX bytes:10271325 (9.7 MiB)  TX bytes:2465344586 (2.2 GiB)



While streaming replication:

         RX bytes:23058328 (21.9 MiB)  TX bytes:7502525286 (6.9 GiB)




Conclusion:

Pglogical is more convenient.

Nice.



= Test 2=

Same configuration used to test pglogical during test 1.Accidentally, Idid not 
change the replication set. So the replication set was still using: 
'my_replicated_schema.my_first_replicated_table'

Pgbench instead writes to 'public'schema.


I theninitialize pgbench tables on the provider, using:

pgbench -i 

which results in:

 \dt public.
             List of relations
 Schema |       Name       | Type  | Owner
--------+------------------+-------+-------
 public | pgbench_accounts | table | postgres
 public | pgbench_branches | table | postgres
 public | pgbench_history  | table | postgres
 public | pgbench_tellers  | table | postgres
(4 rows)


I then run pgbench on the provider:


pgbench -c 3 -P 10  -r -T 200 -R 1000


And with big surprise, Ican thensee that an amount of traffic was generated:

          RX bytes:35500329 (33.8 MiB)  TX bytes:66376459 (63.3 MiB)

This is very strange to me. Running similar tests, where on the provider I 
manually push data on a table which is not in the replication set, no traffic 
was generated.

There must be an explanation for what is happening, and I think is more related 
to pgbench than pglogical, but i cannot find it. Do you have pointers?



= Test 3 =

Provider is replicating the schema public, and subscriber receiving it.

I then run pgbenchsame way as above:

pgbench -c 3 -P 10  -r -T 200 -R 1000


But I get results which are contradicting test 1.

Pglogical replication uses:


         RX bytes:69783187 (66.5 MiB)  TX bytes:371664060 (354.4 MiB)


While streaming replication:


          RX bytes:86286353 (82.2 MiB)  TX bytes:281693815 (268.6 MiB)


Here, one would say that streaming replication is cheaper..


Also I cannot explain why is that. Do you?



Side notes:

- All tests ran over multiple times, producing identical results

- I also ran a test similar to 'Test1' but updating results (instead of running 
'INSERT'), which as expected gave same outcome as 'Test 1'

- At every run Idestroy 'pgbench' database and recreate it, in order to start 
as clean as possible.

- As cross check, Im also checking that traffic wise,the numbers that appear on 
the provider are specular on the subscriber

- Here Ireport the exact commands I ran in order to reproduce the test beds for 
pglogical:


Test 1 and 2:

Provider called 'dbfabio':

PGDATA=pgbench
createdb pgbench
pglog_db=pgbench
psql $pglog_db  -c "CREATE extension pglogical;"
psql $pglog_db  -c "CREATE schema my_replicated_schema"
psql $pglog_db  -c "CREATE table my_replicated_schema.my_first_replicated_table 
(i int primary key)"
psql $pglog_db  -c "SELECT pglogical.create_node(node_name := 
'provider.$pglog_db', dsn := 'host=dbfabio port=5432 dbname=$pglog_db');"
psql $pglog_db  -c "SELECT pglogical.replication_set_add_all_tables('default', 
ARRAY['my_replicated_schema']);"


Subscriber called 'dbfabio2':

PGDATA=pgbench
createdb pgbench
pglog_db=pgbench
psql $pglog_db -c "CREATE extension pglogical;"
psql $pglog_db -c "SELECT pglogical.create_node(node_name := 
'subscriber.$pglog_db', dsn := 'host=dbfabio2 port=5432 dbname=$pglog_db');"
psql $pglog_db -c "SELECT pglogical.create_subscription(subscription_name := 
'subscription_to_dbfabio_$pglog_db',  synchronize_structure := true, 
provider_dsn := 'host=dbfabio port=5432 dbname=$pglog_db');"



Test 3:

Provider:

PGDATA=pgbench
createdb pgbench
pglog_db=pgbench
pgbench -i 

# Now a small hack, since pglogical only accepts tables who have a primary key. 
pgbench_historical does not have it, out of the box: (maybe here there is some 
room for an improvement to propose for pgbench code? what do you think?)

psql $pglog_db -c "ALTER TABLE pgbench_history ADD COLUMN id SERIAL PRIMARY 
KEY;"
psql $pglog_db -c "CREATEextension pglogical;"
psql $pglog_db -c "SELECT pglogical.create_node(node_name := 
'provider.$pglog_db', dsn := 'host=dbfabio port=5432 dbname=$pglog_db');"
psql $pglog_db -c "SELECT pglogical.replication_set_add_all_tables('default', 
ARRAY['public']);"


Subscriber:

Same as test 1



Streaming replication setup looks like:

Master:

wal_level = hot_standby

Standby:

hot_standby = on

+ recovery.conf



any help is appreciated.


regards,

fabio pardi


Reply via email to