Ilya Maximets <[email protected]> writes: > On 5/4/23 19:21, Paolo Valerio wrote: >> Ilya Maximets <[email protected]> writes: >> >>> On 4/19/23 20:40, Paolo Valerio wrote: >>>> During the creation of a new connection, there's a chance both key and >>>> rev_key end up having the same hash. This is more common in the case >>>> of all-zero snat with no collisions. In that case, once the >>>> connection is expired, but not cleaned up, if a new packet with the >>>> same 5-tuple is received, an assertion failure gets triggered in >>>> conn_update_state() because of a previous failure of retrieving a >>>> CT_CONN_TYPE_DEFAULT connection. >>>> >>>> Fix it by releasing the nat_conn during the connection creation in the >>>> case of same hash for both key and rev_key. >>> >>> This sounds a bit odd. Shouldn't we treat hash collision as a normal case? >>> >>> Looking at the code, I'm assuming that the issue comes from the following >>> part in process_one(): >>> >>> if (OVS_LIKELY(conn)) { >>> if (conn->conn_type == CT_CONN_TYPE_UN_NAT) { >>> ... >>> conn_key_lookup(ct, &ctx->key, hash, now, &conn, &ctx->reply); >>> >>> And here we get the same connection again, because the default one is >>> already >>> expired. Is that correct? >>> >>> If so, maybe we should add an extra condition to conn_key_lookup() to >>> only look for DEFAULT connections instead, just for this case? Since >>> we really don't want to get the UN_NAT one here. >>> >> >> Hello Ilya, >> >> It's a fair point. >> I initially thought about the approach you're suggesting, but I had some >> concerns about it that I'll try to summarize below. >> >> For sure it would fix the issue (it could require the first patch to be >> applied as well for the branches with rcu exp lists). >> >> Based on the current logic, new packets matching that expired connection >> but not evicted will be marked as +inv and further packets will be >> marked so for the whole sweep interval unless an exception like this get >> added: >> >> uint32_t hash = conn_key_hash(&conn->rev_key, ct->hash_basis); >> /* the last flag indicates CT_CONN_TYPE_DEFAULT only */ >> conn_key_lookup_(ct, &ctx->key, hash, now, &conn, &ctx->reply, true); >> /* special case where there's hash collision */ >> if (!conn && ctx->hash != hash) { >> pkt->md.ct_state |= CS_INVALID; >> write_ct_md(pkt, zone, NULL, NULL, NULL); >> ... >> return; >> } >> >> This would further require that subsequent lookup in the create_new_conn >> path are restricted to CT_CONN_TYPE_DEFAULT, e.g.: >> >> uint32_t hash = conn_key_hash(&ctx->key, ct->hash_basis); >> /* Only check for CT_CONN_TYPE_DEFAULT */ >> if (!conn_key_lookup_(ct, &ctx->key, hash, now, NULL, NULL, true)) { >> conn = conn_not_found(ct, pkt, ctx, commit, now, nat_action_info, >> helper, alg_exp, ct_alg_ctl, tp_id); >> } >> >> otherwise we could incur in a false positive which prevent to create a >> new connection. > > I'm not really sure if what described above is more correct way of doing > things or not... Aaron, do you have opinion on this? > > Another thought: Can we expire the CT_CONN_TYPE_UN_NAT connection the > moment DEFAULT counterpart of it expires? Or that will that be against > some logic / not possible to do? >
As far as I can tell, this could not be straightforward as simply marking it as expired should not be reliable (e.g. doing it from the sweeper), and I guess that managing the expiration time field for the nat_conn as well would require updating the nat_conn every time the default one gets updated, probably making it a bit unpractical. Another approach would be removing the nat_conn [1] altogether. The problem in this case is backporting. Some adjustments that would add to the patch might be needed for older branches. [1] https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/ > >> >>> Best regards, Ilya Maximets. >>> >>>> >>>> Reported-by: Michael Plato <[email protected]> >>>> Fixes: 61e48c2d1db2 ("conntrack: Handle SNAT with all-zero IP address.") >>>> Signed-off-by: Paolo Valerio <[email protected]> >>>> --- >>>> In this thread [0] there are some more details. A similar >>>> approach here could be to avoid to add the nat_conn to the cmap and >>>> letting the sweeper release the memory for nat_conn once the whole >>>> connection gets freed. >>>> That approach could still be ok, but the drawback is that it could >>>> require a different patch for older branches that don't include >>>> 3d9c1b855a5f ("conntrack: Replace timeout based expiration lists with >>>> rculists."). It still worth to be considered. >>>> >>>> [0] >>>> https://mail.openvswitch.org/pipermail/ovs-discuss/2023-April/052339.html >>>> --- >>>> lib/conntrack.c | 21 +++++++++++++-------- >>>> 1 file changed, 13 insertions(+), 8 deletions(-) >>>> >>>> diff --git a/lib/conntrack.c b/lib/conntrack.c >>>> index 7e1fc4b1f..d2ee127d9 100644 >>>> --- a/lib/conntrack.c >>>> +++ b/lib/conntrack.c >>>> @@ -1007,14 +1007,19 @@ conn_not_found(struct conntrack *ct, struct >>>> dp_packet *pkt, >>>> } >>>> >>>> nat_packet(pkt, nc, false, ctx->icmp_related); >>>> - memcpy(&nat_conn->key, &nc->rev_key, sizeof nat_conn->key); >>>> - memcpy(&nat_conn->rev_key, &nc->key, sizeof >>>> nat_conn->rev_key); >>>> - nat_conn->conn_type = CT_CONN_TYPE_UN_NAT; >>>> - nat_conn->nat_action = 0; >>>> - nat_conn->alg = NULL; >>>> - nat_conn->nat_conn = NULL; >>>> - uint32_t nat_hash = conn_key_hash(&nat_conn->key, >>>> ct->hash_basis); >>>> - cmap_insert(&ct->conns, &nat_conn->cm_node, nat_hash); >>>> + uint32_t nat_hash = conn_key_hash(&nc->rev_key, >>>> ct->hash_basis); >>>> + if (nat_hash != ctx->hash) { >>>> + memcpy(&nat_conn->key, &nc->rev_key, sizeof >>>> nat_conn->key); >>>> + memcpy(&nat_conn->rev_key, &nc->key, sizeof >>>> nat_conn->rev_key); >>>> + nat_conn->conn_type = CT_CONN_TYPE_UN_NAT; >>>> + nat_conn->nat_action = 0; >>>> + nat_conn->alg = NULL; >>>> + nat_conn->nat_conn = NULL; >>>> + cmap_insert(&ct->conns, &nat_conn->cm_node, nat_hash); >>>> + } else { >>>> + free(nat_conn); >>>> + nat_conn = NULL; >>>> + } >>>> } >>>> >>>> nc->nat_conn = nat_conn; >>>> >> >> _______________________________________________ >> dev mailing list >> [email protected] >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
