On 11/8/21 20:44, Han Zhou wrote: > > > On Mon, Nov 8, 2021 at 11:33 AM Ilya Maximets <[email protected] > <mailto:[email protected]>> wrote: >> >> On 11/8/21 20:14, Han Zhou wrote: >> > >> > >> > On Mon, Nov 8, 2021 at 10:49 AM Ilya Maximets <[email protected] >> > <mailto:[email protected]> <mailto:[email protected] >> > <mailto:[email protected]>>> wrote: >> >> >> >> On 11/8/21 19:24, Han Zhou wrote: >> >> > >> >> > >> >> > On Fri, Nov 5, 2021 at 2:55 PM Ilya Maximets <[email protected] >> >> > <mailto:[email protected]> <mailto:[email protected] >> >> > <mailto:[email protected]>> <mailto:[email protected] >> >> > <mailto:[email protected]> <mailto:[email protected] >> >> > <mailto:[email protected]>>>> wrote: >> >> >> >> >> >> On 10/23/21 08:40, Han Zhou wrote: >> >> >> > >> >> >> > >> >> >> > On Thu, Oct 14, 2021 at 7:58 AM Ilya Maximets <[email protected] >> >> >> > <mailto:[email protected]> <mailto:[email protected] >> >> >> > <mailto:[email protected]>> <mailto:[email protected] >> >> >> > <mailto:[email protected]> <mailto:[email protected] >> >> >> > <mailto:[email protected]>>> <mailto:[email protected] >> >> >> > <mailto:[email protected]> <mailto:[email protected] >> >> >> > <mailto:[email protected]>> <mailto:[email protected] >> >> >> > <mailto:[email protected]> <mailto:[email protected] >> >> >> > <mailto:[email protected]>>>>> wrote: >> >> >> >> >> >> >> >> If user frequently changes a lot of rows in a database, transaction >> >> >> >> history could grow way larger than the database itself. This wastes >> >> >> >> a lot of memory and also makes monitor_cond_since slower than >> >> >> >> usual monotor_cond if the transaction id is old enough, because >> >> >> >> re-construction of the changes from a history is slower than just >> >> >> >> creation of initial database snapshot. This is also the case if >> >> >> >> user deleted a lot of data, so transaction history still holds all >> >> >> >> of >> >> >> >> it while the database itself doesn't. >> >> >> >> >> >> >> >> In case of current lb-per-service model in ovn-kubernetes, each >> >> >> >> load-balancer is added to every logical switch/router. Such a >> >> >> >> transaction touches more than a half of a OVN_Northbound database. >> >> >> >> And each of these transactions is added to the transaction history. >> >> >> >> Since transaction history depth is 100, in worst case scenario, >> >> >> >> it will hold 100 copies of a database increasing memory consumption >> >> >> >> dramatically. In tests with 3000 LBs and 120 LSs, memory goes up >> >> >> >> to 3 GB, while holding at 30 MB if transaction history disabled in >> >> >> >> the code. >> >> >> >> >> >> >> >> Fixing that by keeping count of the number of ovsdb_atom's in the >> >> >> >> database and not allowing the total number of atoms in transaction >> >> >> >> history to grow larger than this value. Counting atoms is fairly >> >> >> >> cheap because we don't need to iterate over them, so it doesn't have >> >> >> >> significant performance impact. It would be ideal to measure the >> >> >> >> size of individual atoms, but that will hit the performance. >> >> >> >> Counting cells instead of atoms is not sufficient, because OVN >> >> >> >> users are adding hundreds or thousands of atoms to a single cell, >> >> >> >> so they are largely different in size. >> >> >> >> >> >> >> >> Signed-off-by: Ilya Maximets <[email protected] >> >> >> >> <mailto:[email protected]> <mailto:[email protected] >> >> >> >> <mailto:[email protected]>> <mailto:[email protected] >> >> >> >> <mailto:[email protected]> <mailto:[email protected] >> >> >> >> <mailto:[email protected]>>> <mailto:[email protected] >> >> >> >> <mailto:[email protected]> <mailto:[email protected] >> >> >> >> <mailto:[email protected]>> <mailto:[email protected] >> >> >> >> <mailto:[email protected]> <mailto:[email protected] >> >> >> >> <mailto:[email protected]>>>>> >> >> >> >> --- >> >> >> >> ovsdb/ovsdb.c | 5 ++++ >> >> >> >> ovsdb/ovsdb.h | 3 +++ >> >> >> >> ovsdb/transaction.c | 56 >> >> >> >> +++++++++++++++++++++++++++++++++++++++++++-- >> >> >> >> 3 files changed, 62 insertions(+), 2 deletions(-) >> >> >> >> >> >> <snip> >> >> >> >> >> >> > >> >> >> > Thanks Ilya. This looks good to me. Although the atoms number is not >> >> >> > equal to the size of the data, it should be sufficient for the >> >> >> > purpose of controlling history size. >> >> >> > It would be good to have a test case to ensure this behavior, and >> >> >> > also ensure the atoms count is as expected, just to avoid messing up >> >> >> > the counters by mistake in the future. (if that happens it is not >> >> >> > easily noticed because there is no functional difference but >> >> >> > possible performance impact of fast-resync) >> >> >> > >> >> >> > Acked-by: Han Zhou <[email protected] <mailto:[email protected]> >> >> >> > <mailto:[email protected] <mailto:[email protected]>> <mailto:[email protected] >> >> >> > <mailto:[email protected]> <mailto:[email protected] >> >> >> > <mailto:[email protected]>>> <mailto:[email protected] >> >> >> > <mailto:[email protected]> <mailto:[email protected] <mailto:[email protected]>> >> >> >> > <mailto:[email protected] <mailto:[email protected]> <mailto:[email protected] >> >> >> > <mailto:[email protected]>>>>> >> >> >> >> >> >> Thanks for review! I prepared a test below for the number of atoms. >> >> >> If it looks good to you, I can fold it in before applying the patch. >> >> >> >> >> >> Han, Dumitru, What do you think? >> >> >> >> >> >> diff --git a/tests/ovsdb-server.at <http://ovsdb-server.at> >> >> >> <http://ovsdb-server.at <http://ovsdb-server.at>> >> >> >> <http://ovsdb-server.at <http://ovsdb-server.at> >> >> >> <http://ovsdb-server.at <http://ovsdb-server.at>>> >> >> >> b/tests/ovsdb-server.at <http://ovsdb-server.at> >> >> >> <http://ovsdb-server.at <http://ovsdb-server.at>> >> >> >> <http://ovsdb-server.at <http://ovsdb-server.at> >> >> >> <http://ovsdb-server.at <http://ovsdb-server.at>>> >> >> >> index ac243d6a7..2b742f78b 100644 >> >> >> --- a/tests/ovsdb-server.at <http://ovsdb-server.at> >> >> >> <http://ovsdb-server.at <http://ovsdb-server.at>> >> >> >> <http://ovsdb-server.at <http://ovsdb-server.at> >> >> >> <http://ovsdb-server.at <http://ovsdb-server.at>>> >> >> >> +++ b/tests/ovsdb-server.at <http://ovsdb-server.at> >> >> >> <http://ovsdb-server.at <http://ovsdb-server.at>> >> >> >> <http://ovsdb-server.at <http://ovsdb-server.at> >> >> >> <http://ovsdb-server.at <http://ovsdb-server.at>>> >> >> >> @@ -1228,6 +1228,69 @@ AT_CHECK([test $logged_updates -lt >> >> >> $logged_nonblock_updates]) >> >> >> AT_CHECK_UNQUOTED([ovs-vsctl get open_vswitch . system_version], [0], >> >> >> [xyzzy$counter >> >> >> ]) >> >> >> +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >> >> >> +AT_CLEANUP >> >> >> + >> >> >> +AT_SETUP([ovsdb-server transaction history size]) >> >> >> +on_exit 'kill `cat *.pid`' >> >> >> + >> >> >> +dnl Start an ovsdb-server with the clustered vswitchd schema. >> >> >> +AT_CHECK([ovsdb-tool create-cluster db dnl >> >> >> + $abs_top_srcdir/vswitchd/vswitch.ovsschema unix:s1.raft], >> >> >> + [0], [ignore], [ignore]) >> >> >> +AT_CHECK([ovsdb-server --detach --no-chdir --pidfile dnl >> >> >> + --log-file --remote=punix:db.sock db], >> >> >> + [0], [ignore], [ignore]) >> >> >> +AT_CHECK([ovs-vsctl --no-wait init]) >> >> >> + >> >> >> +dnl Create a bridge with N ports per transaction. Increase N every 4 >> >> >> +dnl iterations. And then remove the bridges. By increasing the size >> >> >> of >> >> >> +dnl transactions, ensuring that they take up a significant percentage >> >> >> of >> >> >> +dnl the total database size, so the transaction history will not be >> >> >> able >> >> >> +dnl to hold all of them. >> >> >> +dnl >> >> >> +dnl The test verifies that the number of atoms in the transaction >> >> >> history >> >> >> +dnl is always less than the number of atoms in the database. >> >> >> +get_n_atoms () { >> >> >> + n=$(ovs-appctl -t ovsdb-server memory/show dnl >> >> >> + | tr ' ' '\n' | grep atoms | grep "^$1:" | cut -d ':' -f >> >> >> 2) >> >> >> + if test X"$n" == "X"; then >> >> >> + n=0 >> >> >> + fi >> >> >> + echo $n >> >> >> +} >> >> >> + >> >> >> +check_atoms () { >> >> >> + n_db_atoms=$(get_n_atoms atoms) >> >> >> + n_txn_history_atoms=$(get_n_atoms txn-history-atoms) >> >> >> + echo "n_db_atoms: $n_db_atoms" >> >> >> + echo "n_txn_history_atoms: $n_txn_history_atoms" >> >> >> + AT_CHECK([test $n_txn_history_atoms -le $n_db_atoms]) >> >> >> +} >> >> >> + >> >> >> +add_ports () { >> >> >> + for j in $(seq 1 $2); do >> >> >> + printf " -- add-port br$1 p$1-%d" $j >> >> >> + done >> >> >> +} >> >> >> + >> >> >> +initial_db_atoms=$(get_n_atoms atoms) >> >> >> + >> >> >> +for i in $(seq 1 100); do >> >> >> + cmd=$(add_ports $i $(($i / 4 + 1))) >> >> >> + AT_CHECK([ovs-vsctl --no-wait add-br br$i $cmd]) >> >> >> + check_atoms >> >> >> +done >> >> >> + >> >> >> +for i in $(seq 1 100); do >> >> >> + AT_CHECK([ovs-vsctl --no-wait del-br br$i]) >> >> >> + check_atoms >> >> >> +done >> >> >> + >> >> > >> >> > Thanks Ilya for the test case. I noticed that it seems the case when >> >> > txn_history is bigger than the data is not covered by the above loops. >> >> > When i increases from 1 to 100, the history is smaller or equal to the >> >> > whole data. Then when the first deletion happens, because there are max >> >> > 100 entries in the history, so the oldest history is removed, which is >> >> > the same size as the new transaction, because it is deleting the same >> >> > br that was added in the oldest transaction, and the txn history size >> >> > is still smaller or equal to the whole data items, and so on ... And it >> >> > seems that simply changing the loop count to 99 or less would have the >> >> > scenario covered. >> >> >> >> Hmm. Deletions are happening in the same order, i.e. small bridges >> >> first. And here is how the last addition and the first deletion looks >> >> like when I ran the test on my machine (I added an extra call for >> >> memory/show to print actual history size): >> >> >> >> # make -j8 check TESTSUITEFLAGS='-k history -v' >> >> <...> >> >> ./ovsdb-server.at:1281 <http://ovsdb-server.at:1281> >> >> <http://ovsdb-server.at:1281 <http://ovsdb-server.at:1281>>: ovs-vsctl >> >> --no-wait add-br br$i $cmd >> >> atoms:26572 cells:89364 monitors:0 raft-log:102 txn-history:49 >> >> txn-history-atoms:26479 >> >> n_db_atoms: 26572 >> >> n_txn_history_atoms: 26479 >> >> ./ovsdb-server.at:1268 <http://ovsdb-server.at:1268> >> >> <http://ovsdb-server.at:1268 <http://ovsdb-server.at:1268>>: test >> >> $n_txn_history_atoms -le $n_db_atoms >> >> ./ovsdb-server.at:1287 <http://ovsdb-server.at:1287> >> >> <http://ovsdb-server.at:1287 <http://ovsdb-server.at:1287>>: ovs-vsctl >> >> --no-wait del-br br$i >> >> atoms:26527 cells:89218 monitors:0 raft-log:103 txn-history:49 >> >> txn-history-atoms:26341 >> >> n_db_atoms: 26527 >> >> n_txn_history_atoms: 26341 >> >> <...> >> >> >> >> So, the history doesn't grow larger than 49 entries in that case. >> >> Later, due to deletion of small bridges, history grows up to 55 entries >> >> and gradually goes down after that with removal of larger bridges. >> >> >> >> Is it different in you setup? Or did I misunderstand? >> > >> > Ok, I ran it myself and it turned out I was wrong. I was expecting that >> > the atoms in the transaction history that adds a br and its ports should >> > be the same as the total atoms increased in the DB by this transaction, >> > but it turned out that the transaction history increases more than what's >> > increased in the DB. So my assumption "When i increases from 1 to 100, the >> > history is smaller or equal to the whole data" was wrong. Did you happen >> > to know why the txn has more than what's added to the DB? >> >> Yep. Transaction basically holds 2x of some of the >> database data, because it holds both old and new versions >> of rows. Addition of a bridge here is not purely addition, >> because ovs-vsctl will also add this new bridge to the >> 'bridges' column of the row in 'Open_vSwitch' table. >> And it will also change the value of the 'next_cfg' column >> in that row. These are mutations that require having >> both old and new versions of a row in the transaction. >> >> So, each transaction will hold old and new versions of the >> single row from 'Open_vSwitch' table along with all the new >> data (bridge itself and ports). Hence it will be a bit >> larger than the added new data. > > OK. Sorry that I forgot that the bridges column will be updated. > Thanks for the explanation!
No problem. I had to look this up myself, as it's not obvious. Thanks for review! Applied. Best regards, Ilya Maximets. > > Han > >> >> > >> >> >> >> > >> >> >> +dnl After removing all the bridges, the number of atoms in the >> >> >> database >> >> >> +dnl should return to its initial value. >> >> >> +AT_CHECK([test $(get_n_atoms atoms) -eq $initial_db_atoms]) >> >> >> + >> >> >> OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >> >> >> AT_CLEANUP >> >> >> >> >> >> --- >> >> >> _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
