[Bug 8201] Make some tests that fail occasionally more robust

2024-03-14 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8201

Sidney Markowitz  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #20 from Sidney Markowitz  ---
Since I last looked at this, Net::DNS has had several updates for which I
worked with the developer to help test and resolve issues related to Windows.
Also, we made some changes to handle truncated large UDP DNS packets and
implemented TCP retry. I'm still seeing some flaky behavior on Windows test
machines on CPAN, but 1) it is not the same tests failing each time, so it does
look like random dropped packets or timeouts, not simple bugs in the code; and
2) I discovered that the Windows version shown in the test reports is not
accurate: it only shows the version of Windows that perl was built on. Looking
deeper, I found almost all the test machines are running Windows 7, 8, or 8.1.
Given that, I'm not going to try to get the test machines to pass 100% of the
tests.

With that, I close this issue.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 8201] Make some tests that fail occasionally more robust

2023-12-07 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8201

--- Comment #19 from Sidney Markowitz  ---
I've uploaded a test build that will use 1.1.1.1 for the nameserver on Windows
for those four tests. It isn't a correct solution, but it will at least check
if the problem has something to do with the nameserver on the local subnet that
those Windows test machines are configured to use. If that works, perhaps we
need to figure out how to retry on a different nameserver after receiving an
error result.

[Bug 8201] Make some tests that fail occasionally more robust

2023-12-06 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8201

--- Comment #18 from Sidney Markowitz  ---
(In reply to Henrik Krohns from comment #16)
> (In reply to Sidney Markowitz from comment #12)
> > That should behave the same unless a DNS query gets an error, in which case
> > it will try the query again on 1.1.1.1 and then 8.8.8.8.
> 
> SERVFAIL is a legit response from a DNS server, so Net::DNS will not retry
> other nameservers if it happens.

I was counting on the attempts in sub bgsend in DnsResolver.pm to do it, but
looking at the code again it looks like it just tries the nameservers until it
successfully sends a query, with nothing that retries if a successful query
gets back a SERVFAIL response :(
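
A minimal sketch of the missing behavior described above: moving on to the
next nameserver when a query is sent successfully but comes back SERVFAIL.
This is an illustration only, not the code in DnsResolver.pm. The function
name query_with_servfail_retry and the $send callback are hypothetical; the
callback is assumed to return the response code as a string, or undef on a
timeout.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of SpamAssassin or Net::DNS.
# $send is a code ref called as $send->($nameserver, $name, $type); it is
# assumed to return the response code string ('NOERROR', 'SERVFAIL', ...)
# or undef if no response arrived.
sub query_with_servfail_retry {
  my ($send, $name, $type, @nameservers) = @_;
  my $last_rcode;
  foreach my $ns (@nameservers) {
    my $rcode = $send->($ns, $name, $type);
    next unless defined $rcode;    # timeout or send failure: try next server
    $last_rcode = $rcode;
    next if $rcode eq 'SERVFAIL';  # server-side failure: try next server
    return ($ns, $rcode);          # NOERROR, NXDOMAIN, etc. are final answers
  }
  return (undef, $last_rcode);     # every nameserver timed out or failed
}
```

With Net::DNS, the callback could plausibly wrap a resolver created with a
single nameserver, call its send method, and read the rcode from the
response header. That wiring is left out here so the sketch runs without
network access.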

[Bug 8201] Make some tests that fail occasionally more robust

2023-12-06 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8201

--- Comment #17 from Henrik Krohns  ---
(In reply to Henrik Krohns from comment #16)
> (In reply to Sidney Markowitz from comment #12)
> > That should behave the same unless a DNS query gets an error, in which case
> > it will try the query again on 1.1.1.1 and then 8.8.8.8.
> 
> SERVFAIL is a legit response from a DNS server, so Net::DNS will not retry
> other nameservers if it happens.

Actually, raw Net::DNS test code with multiple nameservers does retry, but the
current SA code doesn't. Need to look into why.

[Bug 8201] Make some tests that fail occasionally more robust

2023-12-06 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8201

--- Comment #16 from Henrik Krohns  ---
(In reply to Sidney Markowitz from comment #12)
> That should behave the same unless a DNS query gets an error, in which case
> it will try the query again on 1.1.1.1 and then 8.8.8.8.

SERVFAIL is a legit response from a DNS server, so Net::DNS will not retry
other nameservers if it happens.

[Bug 8201] Make some tests that fail occasionally more robust

2023-12-06 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8201

--- Comment #15 from Henrik Krohns  ---
Never mind, packet loss seems to only happen from one of my servers *shrug*.
Doesn't mean someone else might not have the problem, but strange that it would
only affect Windows.

[Bug 8201] Make some tests that fail occasionally more robust

2023-12-06 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8201

--- Comment #14 from Henrik Krohns  ---

Lots of packet loss from where I'm testing

--- b.auth-ns.sonic.net ping statistics ---
237 packets transmitted, 218 received, 8.01688% packet loss, time 236632ms
rtt min/avg/max/mdev = 133.476/162.024/165.514/6.157 ms

--- c.auth-ns.sonic.net ping statistics ---
227 packets transmitted, 213 received, 6.1674% packet loss, time 226483ms
rtt min/avg/max/mdev = 92.684/120.636/125.066/6.691 ms

[Bug 8201] Make some tests that fail occasionally more robust

2023-12-06 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8201

--- Comment #13 from Henrik Krohns  ---
I'm wondering if it's really a problem with our spamassassin.org DNS infra,
more specifically sonic.net:

a.auth-ns.sonic.net
b.auth-ns.sonic.net
c.auth-ns.sonic.net

While testing queries directly to them, I'm often seeing timeouts; this could
well be generating those SERVFAILs.

Never see timeouts on ns2.pccc.com or ns2.ena.com.

[Bug 8201] Make some tests that fail occasionally more robust

2023-12-06 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8201

--- Comment #12 from Sidney Markowitz  ---
I'm going to try the following in a test build. I'll add this function to
t/SATest.pm

# Some Windows machines get excessive SERVFAIL responses from
# their configured nameserver during our tests
sub nameservers_for_safer_use {
  my $nsprefs = '';
  if ($RUNNING_ON_WINDOWS && can_use_net_dns_safely()) {
    my $resolver = Net::DNS::Resolver->new;
    my @nameservers = $resolver->nameservers;
    foreach my $ns (@nameservers) {
      $nsprefs .= "dns_server $ns\n";
    }
    $nsprefs .= q(
dns_server 1.1.1.1
dns_server 8.8.8.8
);
  }
  return $nsprefs;
}

and in the four tests that are having problems I'll add the result of calling
that to tstprefs.

That should behave the same unless a DNS query gets an error, in which case it
will try the query again on 1.1.1.1 and then 8.8.8.8.

If the Windows local subnet DNS server is the cause of the SERVFAIL errors, that
should fix it. Or even if the problem is in the Windows client, the problem is
sporadic, so this gets it to try again with a different server.

Any opinions on committing this if it works in the tests?

[Bug 8201] Make some tests that fail occasionally more robust

2023-12-06 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8201

--- Comment #11 from Sidney Markowitz  ---
(In reply to Henrik Krohns from comment #5)
> Are there full debug logs available?

dnsbl.t doesn't show anything useful in the logs, but hashbl.t has significant
-D output in its logs. I instrumented a test build to dump the log when
hashbl.t fails and have two examples with that.

Henrik, can you see anything in the logs for hashbl.t in these two reports that
gives a hint as to what is going wrong? Scroll down to "Output can be examined
in: log\hashbl." to see it.

https://www.cpantesters.org/cpan/report/2d0a21aa-7212-1014-8844-ea7dcb952333

https://www.cpantesters.org/cpan/report/71f80529-720f-1014-b69b-ba45cb952333

I see some SERVFAIL replies in the DNS queries but I don't know how to
interpret them exactly.

[Bug 8201] Make some tests that fail occasionally more robust

2023-12-04 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8201

--- Comment #10 from Sidney Markowitz  ---
I instrumented dnsbl.t to dump the log files on failure. I didn't try for
VERBOSE=1 to see -D output, as that is more massive than I want to have all the
test machines produce when the test doesn't fail. What I could see from the log
files on failed tests was consistent with occasional DNS query responses being
dropped. I have a suspicion that some of the DNS servers are using EDNS0 jumbo
UDP packets and that the Windows machines are not handling them well. The
changelist for Net::DNS 1.41 that recently was released to fix the problem with
Windows we saw since version 1.38 also mentions a fix to inbound jumbo UDP
packets.

I am trying a change now that makes Net::DNS 1.41 the minimum required when
running on Windows. After that has time to be picked up and run on the Windows
test machines on CPAN I'll try another test that when running on Windows adds
dns_options noedns to the various tests that have been failing with this. I've
already verified that in the VM network configuration I had which failed
hashbl.t with the bgread errors, setting dns_options noedns makes those errors
go away.
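
A minimal sketch of the Windows-only noedns preference described above. The
helper name dns_test_prefs is hypothetical, and passing the operating system
name in as an argument is an assumption made so the logic can be exercised
on any platform; only the "dns_options noedns" setting itself comes from the
comment.

```perl
use strict;
use warnings;

# Hypothetical helper: returns extra test preferences for the given OS name
# (as found in $^O). On Windows it disables EDNS0; elsewhere it adds nothing.
sub dns_test_prefs {
  my ($os) = @_;
  return $os eq 'MSWin32' ? "dns_options noedns\n" : '';
}
```

In a test script the returned string would be appended to whatever is passed
to tstprefs, so non-Windows runs keep their existing configuration.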

[Bug 8201] Make some tests that fail occasionally more robust

2023-12-03 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8201

--- Comment #9 from Sidney Markowitz  ---
I'm wrong about it being restricted to perl version less than 5.26. I see the
same failure on 5.28, and I haven't yet seen a Windows test machine on a higher
version of perl on the latest test builds.

I'm going to put some more effort into figuring out just what is happening with
dnsbl.t when it fails.

[Bug 8201] Make some tests that fail occasionally more robust

2023-12-02 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8201

--- Comment #8 from Sidney Markowitz  ---
I've never seen basic_meta2.t fail like this, I just included it because I
looked for all tests where you put the iterations, not understanding that it
had to do with randomized order of rule evaluation rather than network effects.

I'm leaning much more to deprecating perl less than 5.26 on Windows with a
warning in Makefile.PL. Or maybe even make the minimum version of perl 5.26
when on Windows? That will prevent CPAN test machines running old windows perl
from testing SpamAssassin. It would have been better to decide that as part of
the 4.0.0 major upgrade, but realistically nobody still running SpamAssassin on
Windows would be unable to upgrade their perl to install 4.0.1.

[Bug 8201] Make some tests that fail occasionally more robust

2023-12-02 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8201

--- Comment #7 from Henrik Krohns  ---
(In reply to Sidney Markowitz from comment #6)
> 
> I wonder if the right thing is to put something in Makefile.PL that checks
> for Windows and perl version less than 5.26 and outputs a warning that
> network performance may not be reliable on Windows perl older than version
> 5.26.

I'm not against that, but I don't have any interest in the Windows version
anyway. :-)

PS. basic_meta2.t does not have any network lookups. If that fails there's
something really wrong.

[Bug 8201] Make some tests that fail occasionally more robust

2023-12-02 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8201

--- Comment #6 from Sidney Markowitz  ---
(In reply to Henrik Krohns from comment #5)
> Are there full debug logs available?

Unfortunately, no. The test machine submits the report that you see at the link
and there is no way to query the test machine for more information about its
configuration or any more detail about the test in the report.

I wonder if the right thing is to put something in Makefile.PL that checks for
Windows and perl version less than 5.26 and outputs a warning that network
performance may not be reliable on Windows perl older than version 5.26.
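
A minimal sketch of such a check, assuming it would sit near the top of
Makefile.PL. The helper name windows_perl_warning and the exact warning
text are hypothetical; $^O and $] are the standard Perl variables for the
operating system name and the numeric perl version.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a warning string, or '' when none applies.
# $os is an OS name as in $^O; $perl_version is numeric as in $]
# (for example 5.024004 for perl 5.24.4).
sub windows_perl_warning {
  my ($os, $perl_version) = @_;
  return '' unless $os eq 'MSWin32' && $perl_version < 5.026;
  return "Warning: network tests may not be reliable on Windows perl older than 5.26\n";
}

# Warn, but do not abort, so older Windows perls can still build and test.
my $msg = windows_perl_warning($^O, $]);
warn $msg if $msg;
```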

[Bug 8201] Make some tests that fail occasionally more robust

2023-12-02 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8201

--- Comment #5 from Henrik Krohns  ---
(In reply to Sidney Markowitz from comment #3)
> (In reply to Henrik Krohns from comment #2)
> 
> Do you have any insight about what to look for to track this down?

Are there full debug logs available?

[Bug 8201] Make some tests that fail occasionally more robust

2023-12-02 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8201

--- Comment #4 from Sidney Markowitz  ---
% svn ci -m "bug 8201 Revert previous commit" t/basic_meta2.t t/dnsbl.t
t/hashbl.t t/uribl.t 
Sending        t/basic_meta2.t
Sending        t/dnsbl.t
Sending        t/hashbl.t
Sending        t/uribl.t
Transmitting file data done
Committing transaction...
Committed revision 1914293.

[Bug 8201] Make some tests that fail occasionally more robust

2023-12-02 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8201

--- Comment #3 from Sidney Markowitz  ---
(In reply to Henrik Krohns from comment #2)

Whoops. I'll revert the commit.

I'm not sure what to do about the failures, as I can't reproduce them but they
do appear often in the CPAN tests. The common characteristics are that they are
on Strawberry Perl version 5.24 and older, which of course is on the Windows
platform. I think that platform has something a bit flaky about handling
network things and/or asynchronous tasks.

See the hashbl.t failure in
https://www.cpantesters.org/cpan/report/66374a07-6cd0-1014-868a-16c8e3396204

It appears that almost all the time I see these errors they are in the first
iteration. I tested running the sarun once in a 0th iteration without checking
patterns and that seemed to make such errors not happen. However, without
understanding why that works I don't feel good about just adding such a thing
to the test.

Do you have any insight about what to look for to track this down?

[Bug 8201] Make some tests that fail occasionally more robust

2023-12-02 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8201

Henrik Krohns  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
                 CC|                            |apa...@hege.li
         Resolution|FIXED                       |---

--- Comment #2 from Henrik Krohns  ---
(In reply to Sidney Markowitz from comment #0)
> Created attachment 5920 [details]
> proposed patch
> 
> There are four tests that are written to iterate $iterations times, with
> explanation in the comment that it is to allow for random natural failures.
> I'm seeing such occasional failures showing up on CPAN testing machines 
> running Windows with versions of Strawberry perl 5.24 and older
> 
> basic_meta2.t dnsbl.t hashbl.t uribl.t
> 
> The problem with the current implementation is that a failure in any of the
> iterations is considered a test failure.

I think you may have misunderstood.

My code comment is: "run many times to catch some random natured failures"

And yours is "allow for random natural failures". This is a completely
different thing, sorry if my comment was vague.

It's crucial that each and every test succeeds without failure, that's the
point of the iterations. Internal random running order of rules can affect the
outcome as was seen in the past. Why it happens on Windows should be
investigated.
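
A minimal sketch of the intended semantics, assuming each iteration is one
run plus its pattern checks. The names all_iterations_pass and $check are
hypothetical and do not appear in the test files; the point is only that one
failing iteration fails the whole test.

```perl
use strict;
use warnings;

# Hypothetical illustration: $check stands in for one run of the test body
# and returns true on success. It receives the iteration number.
sub all_iterations_pass {
  my ($iterations, $check) = @_;
  for my $i (1 .. $iterations) {
    return 0 unless $check->($i);  # a single failure fails the whole test
  }
  return 1;                        # passes only if every iteration passed
}
```

The reverted commit amounted to the opposite rule: passing as long as at
least one iteration succeeded, which would hide order-dependent failures.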

[Bug 8201] Make some tests that fail occasionally more robust

2023-12-02 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8201

Sidney Markowitz  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #1 from Sidney Markowitz  ---
% svn ci -m "bug 8201 Only fail some tests that retry network tests if all
retries have any errors" t/basic_meta2.t t/dnsbl.t t/hashbl.t t/uribl.t
Sending        t/basic_meta2.t
Sending        t/dnsbl.t
Sending        t/hashbl.t
Sending        t/uribl.t
Transmitting file data done
Committing transaction...
Committed revision 1914291.

[Bug 8201] Make some tests that fail occasionally more robust

2023-12-02 Thread bugzilla-daemon
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8201

Sidney Markowitz  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sid...@sidney.com
   Target Milestone|Undefined                   |4.0.1
