On 11/8/21 19:24, Han Zhou wrote: > > > On Fri, Nov 5, 2021 at 2:55 PM Ilya Maximets <[email protected] > <mailto:[email protected]>> wrote: >> >> On 10/23/21 08:40, Han Zhou wrote: >> > >> > >> > On Thu, Oct 14, 2021 at 7:58 AM Ilya Maximets <[email protected] >> > <mailto:[email protected]> <mailto:[email protected] >> > <mailto:[email protected]>>> wrote: >> >> >> >> If user frequently changes a lot of rows in a database, transaction >> >> history could grow way larger than the database itself. This wastes >> >> a lot of memory and also makes monitor_cond_since slower than >> >> usual monotor_cond if the transaction id is old enough, because >> >> re-construction of the changes from a history is slower than just >> >> creation of initial database snapshot. This is also the case if >> >> user deleted a lot of data, so transaction history still holds all of >> >> it while the database itself doesn't. >> >> >> >> In case of current lb-per-service model in ovn-kubernetes, each >> >> load-balancer is added to every logical switch/router. Such a >> >> transaction touches more than a half of a OVN_Northbound database. >> >> And each of these transactions is added to the transaction history. >> >> Since transaction history depth is 100, in worst case scenario, >> >> it will hold 100 copies of a database increasing memory consumption >> >> dramatically. In tests with 3000 LBs and 120 LSs, memory goes up >> >> to 3 GB, while holding at 30 MB if transaction history disabled in >> >> the code. >> >> >> >> Fixing that by keeping count of the number of ovsdb_atom's in the >> >> database and not allowing the total number of atoms in transaction >> >> history to grow larger than this value. Counting atoms is fairly >> >> cheap because we don't need to iterate over them, so it doesn't have >> >> significant performance impact. It would be ideal to measure the >> >> size of individual atoms, but that will hit the performance. >> >> Counting cells instead of atoms is not sufficient, because OVN >> >> users are adding hundreds or thousands of atoms to a single cell, >> >> so they are largely different in size. >> >> >> >> Signed-off-by: Ilya Maximets <[email protected] >> >> <mailto:[email protected]> <mailto:[email protected] >> >> <mailto:[email protected]>>> >> >> --- >> >> ovsdb/ovsdb.c | 5 ++++ >> >> ovsdb/ovsdb.h | 3 +++ >> >> ovsdb/transaction.c | 56 +++++++++++++++++++++++++++++++++++++++++++-- >> >> 3 files changed, 62 insertions(+), 2 deletions(-) >> >> <snip> >> >> > >> > Thanks Ilya. This looks good to me. Although the atoms number is not equal >> > to the size of the data, it should be sufficient for the purpose of >> > controlling history size. >> > It would be good to have a test case to ensure this behavior, and also >> > ensure the atoms count is as expected, just to avoid messing up the >> > counters by mistake in the future. (if that happens it is not easily >> > noticed because there is no functional difference but possible performance >> > impact of fast-resync) >> > >> > Acked-by: Han Zhou <[email protected] <mailto:[email protected]> >> > <mailto:[email protected] <mailto:[email protected]>>> >> >> Thanks for review! I prepared a test below for the number of atoms. >> If it looks good to you, I can fold it in before applying the patch. >> >> Han, Dumitru, What do you think? >> >> diff --git a/tests/ovsdb-server.at <http://ovsdb-server.at> >> b/tests/ovsdb-server.at <http://ovsdb-server.at> >> index ac243d6a7..2b742f78b 100644 >> --- a/tests/ovsdb-server.at <http://ovsdb-server.at> >> +++ b/tests/ovsdb-server.at <http://ovsdb-server.at> >> @@ -1228,6 +1228,69 @@ AT_CHECK([test $logged_updates -lt >> $logged_nonblock_updates]) >> AT_CHECK_UNQUOTED([ovs-vsctl get open_vswitch . system_version], [0], >> [xyzzy$counter >> ]) >> +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >> +AT_CLEANUP >> + >> +AT_SETUP([ovsdb-server transaction history size]) >> +on_exit 'kill `cat *.pid`' >> + >> +dnl Start an ovsdb-server with the clustered vswitchd schema. >> +AT_CHECK([ovsdb-tool create-cluster db dnl >> + $abs_top_srcdir/vswitchd/vswitch.ovsschema unix:s1.raft], >> + [0], [ignore], [ignore]) >> +AT_CHECK([ovsdb-server --detach --no-chdir --pidfile dnl >> + --log-file --remote=punix:db.sock db], >> + [0], [ignore], [ignore]) >> +AT_CHECK([ovs-vsctl --no-wait init]) >> + >> +dnl Create a bridge with N ports per transaction. Increase N every 4 >> +dnl iterations. And then remove the bridges. By increasing the size of >> +dnl transactions, ensuring that they take up a significant percentage of >> +dnl the total database size, so the transaction history will not be able >> +dnl to hold all of them. >> +dnl >> +dnl The test verifies that the number of atoms in the transaction history >> +dnl is always less than the number of atoms in the database. >> +get_n_atoms () { >> + n=$(ovs-appctl -t ovsdb-server memory/show dnl >> + | tr ' ' '\n' | grep atoms | grep "^$1:" | cut -d ':' -f 2) >> + if test X"$n" == "X"; then >> + n=0 >> + fi >> + echo $n >> +} >> + >> +check_atoms () { >> + n_db_atoms=$(get_n_atoms atoms) >> + n_txn_history_atoms=$(get_n_atoms txn-history-atoms) >> + echo "n_db_atoms: $n_db_atoms" >> + echo "n_txn_history_atoms: $n_txn_history_atoms" >> + AT_CHECK([test $n_txn_history_atoms -le $n_db_atoms]) >> +} >> + >> +add_ports () { >> + for j in $(seq 1 $2); do >> + printf " -- add-port br$1 p$1-%d" $j >> + done >> +} >> + >> +initial_db_atoms=$(get_n_atoms atoms) >> + >> +for i in $(seq 1 100); do >> + cmd=$(add_ports $i $(($i / 4 + 1))) >> + AT_CHECK([ovs-vsctl --no-wait add-br br$i $cmd]) >> + check_atoms >> +done >> + >> +for i in $(seq 1 100); do >> + AT_CHECK([ovs-vsctl --no-wait del-br br$i]) >> + check_atoms >> +done >> + > > Thanks Ilya for the test case. I noticed that it seems the case when > txn_history is bigger than the data is not covered by the above loops. When i > increases from 1 to 100, the history is smaller or equal to the whole data. > Then when the first deletion happens, because there are max 100 entries in > the history, so the oldest history is removed, which is the same size as the > new transaction, because it is deleting the same br that was added in the > oldest transaction, and the txn history size is still smaller or equal to the > whole data items, and so on ... And it seems that simply changing the loop > count to 99 or less would have the scenario covered.
Hmm. Deletions are happening in the same order, i.e. small bridges first. And here is how the last addition and the first deletion looks like when I ran the test on my machine (I added an extra call for memory/show to print actual history size): # make -j8 check TESTSUITEFLAGS='-k history -v' <...> ./ovsdb-server.at:1281: ovs-vsctl --no-wait add-br br$i $cmd atoms:26572 cells:89364 monitors:0 raft-log:102 txn-history:49 txn-history-atoms:26479 n_db_atoms: 26572 n_txn_history_atoms: 26479 ./ovsdb-server.at:1268: test $n_txn_history_atoms -le $n_db_atoms ./ovsdb-server.at:1287: ovs-vsctl --no-wait del-br br$i atoms:26527 cells:89218 monitors:0 raft-log:103 txn-history:49 txn-history-atoms:26341 n_db_atoms: 26527 n_txn_history_atoms: 26341 <...> So, the history doesn't grow larger than 49 entries in that case. Later, due to deletion of small bridges, history grows up to 55 entries and gradually goes down after that with removal of larger bridges. Is it different in you setup? Or did I misunderstand? > >> +dnl After removing all the bridges, the number of atoms in the database >> +dnl should return to its initial value. >> +AT_CHECK([test $(get_n_atoms atoms) -eq $initial_db_atoms]) >> + >> OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >> AT_CLEANUP >> >> --- _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
