Thanks for the comments. Got it. I updated the implementation & sent a new
patch. Please let me know if there is any additional changes needed.

--Mehrdad

On Wed, Feb 4, 2026 at 4:33 AM Dumitru Ceara <[email protected]> wrote:

> On 2/4/26 8:30 AM, Ales Musil wrote:
> > On Tue, Feb 3, 2026 at 8:40 PM Mehrdad Moradi via dev <
> > [email protected]> wrote:
> >
> >> When ovn-controller returns DNS responses with multiple IP addresses,
> >> clients typically use the first IP in the list. Without randomization,
> >> all clients direct traffic to the same backend, leading to uneven load
> >> distribution.
> >>
> >> This patch implements Fisher-Yates shuffle on IPv4 and IPv6 address
> >> arrays before building DNS answers, ensuring each query returns IPs
> >> in a randomized order for better traffic distribution across backends.
> >>
> >> Signed-off-by: Mehrdad Moradi <[email protected]>
> >> ---
> >>
> >
> > Hi Mehrdad,
> >
>
> Hi Mehrdad, Ales,
>
> > I think there might have been slight misunderstanding.
> > Dumitru suggested two things, generic shuffle and shuffled
> > indices. If we are going the way of shuffled indices we don't
> > need the generic shuffle. So I would put the shuffle into the
> > shuffle_indices() function directly. I have also one more comment
> > down below.
> >
>
> Yes, that's what I meant, sorry if it wasn't too clear.
>
> >  controller/pinctrl.c | 49 ++++++++++++++++++++++++++++++++++++--
> >>  tests/ovn.at         | 56 +++++++++++++++++++++++++++++++++++++++-----
> >>  2 files changed, 97 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/controller/pinctrl.c b/controller/pinctrl.c
> >> index ab8a0a37c..25a4de6a2 100644
> >> --- a/controller/pinctrl.c
> >> +++ b/controller/pinctrl.c
> >> @@ -3365,6 +3365,44 @@ dns_build_ptr_answer(
> >>      free(encoded);
> >>  }
> >>
> >> +/* Shuffle an array in-place using Fisher-Yates algorithm. */
> >> +static void
> >> +array_shuffle(void *array, size_t elem_size, size_t n)
> >> +{
> >> +    if (n <= 1) {
> >> +        return;
> >> +    }
> >> +
> >> +    uint8_t *arr = array;
> >> +    for (size_t i = n - 1; i > 0; i--) {
> >> +        size_t j = random_range(i + 1);
> >> +        uint8_t *a = arr + i * elem_size;
> >> +        uint8_t *b = arr + j * elem_size;
> >> +        for (size_t k = 0; k < elem_size; k++) {
> >> +            uint8_t tmp = a[k];
> >> +            a[k] = b[k];
> >> +            b[k] = tmp;
> >> +        }
>
> Swapping byte by byte is not really the right way to do it.  But if we
> move it inside the "shuffle_indices" function we don't need to do this
> anymore.  We can just use a size_t temporary instead.
>
> >> +    }
> >> +}
> >> +
> >> +/* Return an array of shuffled indices [0, n-1] for DNS round-robin.
>
> This is a generic function, it just creates a nicely randomized
> permutation of indices.  It shouldn't mention DNS.  It could also go in
> ovn-util.[ch] while we're at it.
>
> >> + * Caller must free the returned array. Returns NULL if n == 0. */
> >> +static size_t *
> >> +shuffle_indices(size_t n)
> >> +{
> >> +    if (!n) {
> >> +        return NULL;
> >> +    }
> >> +
> >> +    size_t *indices = xmalloc(n * sizeof *indices);
> >> +    for (size_t i = 0; i < n; i++) {
> >> +        indices[i] = i;
> >> +    }
> >> +    array_shuffle(indices, sizeof *indices, n);
> >>
> >
> > As said above, let's put the shuffle loop directly here.
> >
> > +    return indices;
> >> +}
> >> +
> >>  #define DNS_RCODE_SERVER_REFUSE 0x5
> >>  #define DNS_QUERY_TYPE_CLASS_LEN (2 * sizeof(ovs_be16))
> >>
> >> @@ -3530,11 +3568,15 @@ pinctrl_handle_dns_lookup(
> >>              goto exit;
> >>          }
> >>
> >> +        /* Shuffle indices for round-robin load balancing. */
> >> +        size_t *ipv4_order = shuffle_indices(ip_addrs.n_ipv4_addrs);
> >> +        size_t *ipv6_order = shuffle_indices(ip_addrs.n_ipv6_addrs);
> >> +
> >>          if (query_type == DNS_QUERY_TYPE_A ||
> >>              query_type == DNS_QUERY_TYPE_ANY) {
> >>              for (size_t i = 0; i < ip_addrs.n_ipv4_addrs; i++) {
> >>                  dns_build_a_answer(&dns_answer, in_queryname, idx,
> >> -                                   ip_addrs.ipv4_addrs[i].addr);
> >> +
> >>  ip_addrs.ipv4_addrs[ipv4_order[i]].addr);
> >>
> >
> > nit: This is a bit hard to read, I would make the addr separate variable.
> >
> >                  ancount++;
> >>              }
> >>          }
> >> @@ -3543,11 +3585,14 @@ pinctrl_handle_dns_lookup(
> >>              query_type == DNS_QUERY_TYPE_ANY) {
> >>              for (size_t i = 0; i < ip_addrs.n_ipv6_addrs; i++) {
> >>                  dns_build_aaaa_answer(&dns_answer, in_queryname, idx,
> >> -                                      &ip_addrs.ipv6_addrs[i].addr);
> >> +
> >> &ip_addrs.ipv6_addrs[ipv6_order[i]].addr);
> >>
> >
> > nit: Same here.
> >
> >
> >>                  ancount++;
> >>              }
> >>          }
> >>
> >> +        free(ipv4_order);
> >> +        free(ipv6_order);
> >> +
> >>          /* DNS is configured with a record for this domain with
> >>           * an IPv4/IPV6 only, so instead of ignoring this A/AAAA query,
> >>           * we can reply with  RCODE = 5 (server refuses) and that
> >> diff --git a/tests/ovn.at b/tests/ovn.at
> >> index 4d15d4a62..d5e51c60b 100644
> >> --- a/tests/ovn.at
> >> +++ b/tests/ovn.at
> >> @@ -11636,6 +11636,7 @@ set_dns_params() {
> >>      local ttl=00000e10
> >>      an_count=0001
> >>      type=0001
> >> +    expected_dns_answer_alt=""
> >>      case $hname in
> >>      vm1)
> >>          # vm1.ovn.org
> >> @@ -11652,10 +11653,13 @@ set_dns_params() {
> >>      vm2)
> >>          # vm2.ovn.org
> >>          query_name=03766d32036f766e036f726700
> >> +        # IPv4 addresses may be returned in any order due to DNS
> >> round-robin.
> >>          # IPv4 address - 10.0.0.6
> >> -        expected_dns_answer=${query_name}00010001${ttl}00040a000006
> >> +        local ip1=${query_name}00010001${ttl}00040a000006
> >>          # IPv4 address - 20.0.0.4
> >> -
> >>
> expected_dns_answer=${expected_dns_answer}${query_name}00010001${ttl}000414000004
> >> +        local ip2=${query_name}00010001${ttl}000414000004
> >> +        expected_dns_answer=${ip1}${ip2}
> >> +        expected_dns_answer_alt=${ip2}${ip1}
> >>          an_count=0002
> >>          ;;
> >>      vm3)
> >> @@ -11712,6 +11716,12 @@ set_dns_params() {
> >>      local dns_resp_header=010281200001${an_count}00000000
> >>      dns_req_data=${dns_req_header}${query_name}${type}0001
> >>
> >>
> dns_resp_data=${dns_resp_header}${query_name}${type}0001${expected_dns_answer}
> >> +    # Alternative response for cases with multiple IPs (DNS
> round-robin).
> >> +    if test -n "$expected_dns_answer_alt"; then
> >> +
> >>
> dns_resp_data_alt=${dns_resp_header}${query_name}${type}0001${expected_dns_answer_alt}
> >> +    else
> >> +        dns_resp_data_alt=""
> >> +    fi
> >>  }
> >>
> >>  # This shell function sends a DNS request packet
> >> @@ -11789,17 +11799,51 @@ src_ip=`ip_to_hex 10 0 0 4`
> >>  dst_ip=`ip_to_hex 10 0 0 1`
> >>  dns_reply=1
> >>  test_dns 1 f00000000001 f000000000f0 $src_ip $dst_ip $dns_reply
> >> $dns_req_data $dns_resp_data
> >> +# Generate alternative expected response for DNS round-robin (without
> >> sending packet).
> >> +if test -n "$dns_resp_data_alt"; then
> >> +    # Build expected reply packet with alternative IP ordering.
> >> +    ip_len_val=`expr 28 + ${#dns_resp_data_alt} / 2`
> >> +    udp_len_val=`expr $ip_len_val - 20`
> >> +    ip_len_hex=$(printf "%x" $ip_len_val)
> >> +    udp_len_hex=$(printf "%x" $udp_len_val)
> >> +
> >>
> reply_alt=f00000000001f000000000f00800450000${ip_len_hex}0000000080110000
> >> +
> >>
> reply_alt=${reply_alt}${dst_ip}${src_ip}0035923400${udp_len_hex}0000${dns_resp_data_alt}
> >> +    echo $reply_alt > 1.expected.alt
> >> +fi
> >>
> >>  # NXT_RESUMEs should be 1.
> >>  OVS_WAIT_UNTIL([test 1 = `cat ofctl_monitor*.log | grep -c
> NXT_RESUME`])
> >>
> >> -OVN_CHECK_PACKETS([hv1/vif1-tx.pcap], [1.expected], ["cut -c -48"])
> >> -# Skipping the IPv4 checksum.
> >> -OVN_CHECK_PACKETS([hv1/vif1-tx.pcap], [1.expected], ["cut -c 53-"])
> >> +# Check packets - for vm2, IPs may be in either order due to DNS
> >> round-robin.
> >> +if test -f 1.expected.alt; then
> >> +    $PYTHON "$ovs_srcdir/utilities/ovs-pcap.in" hv1/vif1-tx.pcap >
> >> 1.received
> >> +    cut -c 53- 1.expected > 1.expected.tail
> >> +    cut -c 53- 1.expected.alt > 1.expected.alt.tail
> >> +    cut -c 53- 1.received > 1.received.tail
> >> +    # Accept either ordering of IPs.
> >> +    if diff -q 1.expected.tail 1.received.tail > /dev/null 2>&1 || \
> >> +       diff -q 1.expected.alt.tail 1.received.tail > /dev/null 2>&1;
> then
> >> +        :
> >> +    else
> >> +        echo "Expected (order 1):" && cat 1.expected.tail
> >> +        echo "Expected (order 2):" && cat 1.expected.alt.tail
> >> +        echo "Received:" && cat 1.received.tail
> >> +        AT_FAIL_IF([true])
> >> +    fi
> >> +    # Check header (first 48 chars).
> >> +    cut -c -48 1.expected > 1.expected.head
> >> +    cut -c -48 1.received > 1.received.head
> >> +    AT_CHECK([diff 1.expected.head 1.received.head])
> >> +else
> >> +    OVN_CHECK_PACKETS([hv1/vif1-tx.pcap], [1.expected], ["cut -c -48"])
> >> +    # Skipping the IPv4 checksum.
> >> +    OVN_CHECK_PACKETS([hv1/vif1-tx.pcap], [1.expected], ["cut -c 53-"])
> >> +fi
> >>
> >>  reset_pcap_file hv1-vif1 hv1/vif1
> >>  reset_pcap_file hv1-vif2 hv1/vif2
> >> -rm -f 1.expected
> >> +rm -f 1.expected 1.expected.alt 1.received
> >> +rm -f 1.expected.head 1.expected.tail 1.expected.alt.tail
> 1.received.head
> >> 1.received.tail
> >>  rm -f 2.expected
> >>
> >>
> >> --
> >> 2.52.0
> >>
> >> _______________________________________________
> >> dev mailing list
> >> [email protected]
> >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >>
> >>
> > If Dumitru agrees I can make the changes during merge.
> >
>
> At this point, I think it might be clearer if Mehrdad posts a v3
> addressing all remaining comments.  We're not in a hurry to get this
> merged, we already agreed to accept it in 26.03.
>
> Regards,
> Dumitru
>
> > Regards,
> > Ales
> >
>
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to