On 11/8/21 20:44, Han Zhou wrote:
> 
> 
> On Mon, Nov 8, 2021 at 11:33 AM Ilya Maximets <[email protected] 
> <mailto:[email protected]>> wrote:
>>
>> On 11/8/21 20:14, Han Zhou wrote:
>> >
>> >
>> > On Mon, Nov 8, 2021 at 10:49 AM Ilya Maximets <[email protected] 
>> > <mailto:[email protected]> <mailto:[email protected] 
>> > <mailto:[email protected]>>> wrote:
>> >>
>> >> On 11/8/21 19:24, Han Zhou wrote:
>> >> >
>> >> >
>> >> > On Fri, Nov 5, 2021 at 2:55 PM Ilya Maximets <[email protected] 
>> >> > <mailto:[email protected]> <mailto:[email protected] 
>> >> > <mailto:[email protected]>> <mailto:[email protected] 
>> >> > <mailto:[email protected]> <mailto:[email protected] 
>> >> > <mailto:[email protected]>>>> wrote:
>> >> >>
>> >> >> On 10/23/21 08:40, Han Zhou wrote:
>> >> >> >
>> >> >> >
>> >> >> > On Thu, Oct 14, 2021 at 7:58 AM Ilya Maximets <[email protected] 
>> >> >> > <mailto:[email protected]> <mailto:[email protected] 
>> >> >> > <mailto:[email protected]>> <mailto:[email protected] 
>> >> >> > <mailto:[email protected]> <mailto:[email protected] 
>> >> >> > <mailto:[email protected]>>> <mailto:[email protected] 
>> >> >> > <mailto:[email protected]> <mailto:[email protected] 
>> >> >> > <mailto:[email protected]>> <mailto:[email protected] 
>> >> >> > <mailto:[email protected]> <mailto:[email protected] 
>> >> >> > <mailto:[email protected]>>>>> wrote:
>> >> >> >>
>> >> >> >> If user frequently changes a lot of rows in a database, transaction
>> >> >> >> history could grow way larger than the database itself.  This wastes
>> >> >> >> a lot of memory and also makes monitor_cond_since slower than
>> >> >> >> usual monotor_cond if the transaction id is old enough, because
>> >> >> >> re-construction of the changes from a history is slower than just
>> >> >> >> creation of initial database snapshot.  This is also the case if
>> >> >> >> user deleted a lot of data, so transaction history still holds all 
>> >> >> >> of
>> >> >> >> it while the database itself doesn't.
>> >> >> >>
>> >> >> >> In case of current lb-per-service model in ovn-kubernetes, each
>> >> >> >> load-balancer is added to every logical switch/router.  Such a
>> >> >> >> transaction touches more than a half of a OVN_Northbound database.
>> >> >> >> And each of these transactions is added to the transaction history.
>> >> >> >> Since transaction history depth is 100, in worst case scenario,
>> >> >> >> it will hold 100 copies of a database increasing memory consumption
>> >> >> >> dramatically.  In tests with 3000 LBs and 120 LSs, memory goes up
>> >> >> >> to 3 GB, while holding at 30 MB if transaction history disabled in
>> >> >> >> the code.
>> >> >> >>
>> >> >> >> Fixing that by keeping count of the number of ovsdb_atom's in the
>> >> >> >> database and not allowing the total number of atoms in transaction
>> >> >> >> history to grow larger than this value.  Counting atoms is fairly
>> >> >> >> cheap because we don't need to iterate over them, so it doesn't have
>> >> >> >> significant performance impact.  It would be ideal to measure the
>> >> >> >> size of individual atoms, but that will hit the performance.
>> >> >> >> Counting cells instead of atoms is not sufficient, because OVN
>> >> >> >> users are adding hundreds or thousands of atoms to a single cell,
>> >> >> >> so they are largely different in size.
>> >> >> >>
>> >> >> >> Signed-off-by: Ilya Maximets <[email protected] 
>> >> >> >> <mailto:[email protected]> <mailto:[email protected] 
>> >> >> >> <mailto:[email protected]>> <mailto:[email protected] 
>> >> >> >> <mailto:[email protected]> <mailto:[email protected] 
>> >> >> >> <mailto:[email protected]>>> <mailto:[email protected] 
>> >> >> >> <mailto:[email protected]> <mailto:[email protected] 
>> >> >> >> <mailto:[email protected]>> <mailto:[email protected] 
>> >> >> >> <mailto:[email protected]> <mailto:[email protected] 
>> >> >> >> <mailto:[email protected]>>>>>
>> >> >> >> ---
>> >> >> >>  ovsdb/ovsdb.c       |  5 ++++
>> >> >> >>  ovsdb/ovsdb.h       |  3 +++
>> >> >> >>  ovsdb/transaction.c | 56 
>> >> >> >> +++++++++++++++++++++++++++++++++++++++++++--
>> >> >> >>  3 files changed, 62 insertions(+), 2 deletions(-)
>> >> >>
>> >> >> <snip>
>> >> >>
>> >> >> >
>> >> >> > Thanks Ilya. This looks good to me. Although the atoms number is not 
>> >> >> > equal to the size of the data, it should be sufficient for the 
>> >> >> > purpose of controlling history size.
>> >> >> > It would be good to have a test case to ensure this behavior, and 
>> >> >> > also ensure the atoms count is as expected, just to avoid messing up 
>> >> >> > the counters by mistake in the future. (if that happens it is not 
>> >> >> > easily noticed because there is no functional difference but 
>> >> >> > possible performance impact of fast-resync)
>> >> >> >
>> >> >> > Acked-by: Han Zhou <[email protected] <mailto:[email protected]> 
>> >> >> > <mailto:[email protected] <mailto:[email protected]>> <mailto:[email protected] 
>> >> >> > <mailto:[email protected]> <mailto:[email protected] 
>> >> >> > <mailto:[email protected]>>> <mailto:[email protected] 
>> >> >> > <mailto:[email protected]> <mailto:[email protected] <mailto:[email protected]>> 
>> >> >> > <mailto:[email protected] <mailto:[email protected]> <mailto:[email protected] 
>> >> >> > <mailto:[email protected]>>>>>
>> >> >>
>> >> >> Thanks for review!  I prepared a test below for the number of atoms.
>> >> >> If it looks good to you, I can fold it in before applying the patch.
>> >> >>
>> >> >> Han, Dumitru, What do you think?
>> >> >>
>> >> >> diff --git a/tests/ovsdb-server.at <http://ovsdb-server.at> 
>> >> >> <http://ovsdb-server.at <http://ovsdb-server.at>> 
>> >> >> <http://ovsdb-server.at <http://ovsdb-server.at> 
>> >> >> <http://ovsdb-server.at <http://ovsdb-server.at>>> 
>> >> >> b/tests/ovsdb-server.at <http://ovsdb-server.at> 
>> >> >> <http://ovsdb-server.at <http://ovsdb-server.at>> 
>> >> >> <http://ovsdb-server.at <http://ovsdb-server.at> 
>> >> >> <http://ovsdb-server.at <http://ovsdb-server.at>>>
>> >> >> index ac243d6a7..2b742f78b 100644
>> >> >> --- a/tests/ovsdb-server.at <http://ovsdb-server.at> 
>> >> >> <http://ovsdb-server.at <http://ovsdb-server.at>> 
>> >> >> <http://ovsdb-server.at <http://ovsdb-server.at> 
>> >> >> <http://ovsdb-server.at <http://ovsdb-server.at>>>
>> >> >> +++ b/tests/ovsdb-server.at <http://ovsdb-server.at> 
>> >> >> <http://ovsdb-server.at <http://ovsdb-server.at>> 
>> >> >> <http://ovsdb-server.at <http://ovsdb-server.at> 
>> >> >> <http://ovsdb-server.at <http://ovsdb-server.at>>>
>> >> >> @@ -1228,6 +1228,69 @@ AT_CHECK([test $logged_updates -lt 
>> >> >> $logged_nonblock_updates])
>> >> >>  AT_CHECK_UNQUOTED([ovs-vsctl get open_vswitch . system_version], [0],
>> >> >>    [xyzzy$counter
>> >> >>  ])
>> >> >> +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
>> >> >> +AT_CLEANUP
>> >> >> +
>> >> >> +AT_SETUP([ovsdb-server transaction history size])
>> >> >> +on_exit 'kill `cat *.pid`'
>> >> >> +
>> >> >> +dnl Start an ovsdb-server with the clustered vswitchd schema.
>> >> >> +AT_CHECK([ovsdb-tool create-cluster db dnl
>> >> >> +            $abs_top_srcdir/vswitchd/vswitch.ovsschema unix:s1.raft],
>> >> >> +         [0], [ignore], [ignore])
>> >> >> +AT_CHECK([ovsdb-server --detach --no-chdir --pidfile dnl
>> >> >> +            --log-file --remote=punix:db.sock db],
>> >> >> +         [0], [ignore], [ignore])
>> >> >> +AT_CHECK([ovs-vsctl --no-wait init])
>> >> >> +
>> >> >> +dnl Create a bridge with N ports per transaction.  Increase N every 4
>> >> >> +dnl iterations.  And then remove the bridges.  By increasing the size 
>> >> >> of
>> >> >> +dnl transactions, ensuring that they take up a significant percentage 
>> >> >> of
>> >> >> +dnl the total database size, so the transaction history will not be 
>> >> >> able
>> >> >> +dnl to hold all of them.
>> >> >> +dnl
>> >> >> +dnl The test verifies that the number of atoms in the transaction 
>> >> >> history
>> >> >> +dnl is always less than the number of atoms in the database.
>> >> >> +get_n_atoms () {
>> >> >> +    n=$(ovs-appctl -t ovsdb-server memory/show dnl
>> >> >> +            | tr ' ' '\n' | grep atoms | grep "^$1:" | cut -d ':' -f 
>> >> >> 2)
>> >> >> +    if test X"$n" == "X"; then
>> >> >> +        n=0
>> >> >> +    fi
>> >> >> +    echo $n
>> >> >> +}
>> >> >> +
>> >> >> +check_atoms () {
>> >> >> +    n_db_atoms=$(get_n_atoms atoms)
>> >> >> +    n_txn_history_atoms=$(get_n_atoms txn-history-atoms)
>> >> >> +    echo "n_db_atoms:          $n_db_atoms"
>> >> >> +    echo "n_txn_history_atoms: $n_txn_history_atoms"
>> >> >> +    AT_CHECK([test $n_txn_history_atoms -le $n_db_atoms])
>> >> >> +}
>> >> >> +
>> >> >> +add_ports () {
>> >> >> +    for j in $(seq 1 $2); do
>> >> >> +        printf " -- add-port br$1 p$1-%d" $j
>> >> >> +    done
>> >> >> +}
>> >> >> +
>> >> >> +initial_db_atoms=$(get_n_atoms atoms)
>> >> >> +
>> >> >> +for i in $(seq 1 100); do
>> >> >> +    cmd=$(add_ports $i $(($i / 4 + 1)))
>> >> >> +    AT_CHECK([ovs-vsctl --no-wait add-br br$i $cmd])
>> >> >> +    check_atoms
>> >> >> +done
>> >> >> +
>> >> >> +for i in $(seq 1 100); do
>> >> >> +    AT_CHECK([ovs-vsctl --no-wait del-br br$i])
>> >> >> +    check_atoms
>> >> >> +done
>> >> >> +
>> >> >
>> >> > Thanks Ilya for the test case. I noticed that it seems the case when 
>> >> > txn_history is bigger than the data is not covered by the above loops. 
>> >> > When i increases from 1 to 100, the history is smaller or equal to the 
>> >> > whole data. Then when the first deletion happens, because there are max 
>> >> > 100 entries in the history, so the oldest history is removed, which is 
>> >> > the same size as the new transaction, because it is deleting the same 
>> >> > br that was added in the oldest transaction, and the txn history size 
>> >> > is still smaller or equal to the whole data items, and so on ... And it 
>> >> > seems that simply changing the loop count to 99 or less would have the 
>> >> > scenario covered.
>> >>
>> >> Hmm.  Deletions are happening in the same order, i.e. small bridges
>> >> first.  And here is how the last addition and the first deletion looks
>> >> like when I ran the test on my machine (I added an extra call for
>> >> memory/show to print actual history size):
>> >>
>> >> # make -j8 check TESTSUITEFLAGS='-k history -v'
>> >> <...>
>> >> ./ovsdb-server.at:1281 <http://ovsdb-server.at:1281> 
>> >> <http://ovsdb-server.at:1281 <http://ovsdb-server.at:1281>>: ovs-vsctl 
>> >> --no-wait add-br br$i $cmd
>> >> atoms:26572 cells:89364 monitors:0 raft-log:102 txn-history:49 
>> >> txn-history-atoms:26479
>> >> n_db_atoms:          26572
>> >> n_txn_history_atoms: 26479
>> >> ./ovsdb-server.at:1268 <http://ovsdb-server.at:1268> 
>> >> <http://ovsdb-server.at:1268 <http://ovsdb-server.at:1268>>: test 
>> >> $n_txn_history_atoms -le $n_db_atoms
>> >> ./ovsdb-server.at:1287 <http://ovsdb-server.at:1287> 
>> >> <http://ovsdb-server.at:1287 <http://ovsdb-server.at:1287>>: ovs-vsctl 
>> >> --no-wait del-br br$i
>> >> atoms:26527 cells:89218 monitors:0 raft-log:103 txn-history:49 
>> >> txn-history-atoms:26341
>> >> n_db_atoms:          26527
>> >> n_txn_history_atoms: 26341
>> >> <...>
>> >>
>> >> So, the history doesn't grow larger than 49 entries in that case.
>> >> Later, due to deletion of small bridges, history grows up to 55 entries
>> >> and gradually goes down after that with removal of larger bridges.
>> >>
>> >> Is it different in you setup?  Or did I misunderstand?
>> >
>> > Ok, I ran it myself and it turned out I was wrong. I was expecting that 
>> > the atoms in the transaction history that adds a br and its ports should 
>> > be the same as the total atoms increased in the DB by this transaction, 
>> > but it turned out that the transaction history increases more than what's 
>> > increased in the DB. So my assumption "When i increases from 1 to 100, the 
>> > history is smaller or equal to the whole data" was wrong. Did you happen 
>> > to know why the txn has more than what's added to the DB?
>>
>> Yep.  Transaction basically holds 2x of some of the
>> database data, because it holds both old and new versions
>> of rows.  Addition of a bridge here is not purely addition,
>> because ovs-vsctl will also add this new bridge to the
>> 'bridges' column of the row in 'Open_vSwitch' table.
>> And it will also change the value of the 'next_cfg' column
>> in that row.  These are mutations that require having
>> both old and new versions of a row in the transaction.
>>
>> So, each transaction will hold old and new versions of the
>> single row from 'Open_vSwitch' table along with all the new
>> data (bridge itself and ports).  Hence it will be a bit
>> larger than the added new data.
> 
> OK. Sorry that I forgot that the bridges column will be updated.
> Thanks for the explanation!

No problem.  I had to look this up myself, as it's not obvious.

Thanks for review!  Applied.

Best regards, Ilya Maximets.

> 
> Han
> 
>>
>> >
>> >>
>> >> >
>> >> >> +dnl After removing all the bridges, the number of atoms in the 
>> >> >> database
>> >> >> +dnl should return to its initial value.
>> >> >> +AT_CHECK([test $(get_n_atoms atoms) -eq $initial_db_atoms])
>> >> >> +
>> >> >>  OVS_APP_EXIT_AND_WAIT([ovsdb-server])
>> >> >>  AT_CLEANUP
>> >> >>
>> >> >> ---
>> >>
>>

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to