Re: JenkinsRule tests sometimes timeout

2018-04-09 Thread Jacob Keller


On Monday, April 9, 2018 at 1:26:23 PM UTC-7, Jesse Glick wrote:
>
> On Mon, Apr 9, 2018 at 2:44 PM, Jacob Keller wrote:
> > I discovered that it could take a significantly long
> > time for the SSHD server to start. It turns out that JenkinsRule
> > defaults to the 1.651.2 jenkins-war which does not disable the
> > server by default.
>
> `JenkinsRule` is _built_ against some older version of Jenkins core.
> What you _run_ against is specified by `jenkins.version` in your POM.
> Select something newish, and SSHD will not be started by default, and
> your problem is solved.
>
>
OK, that fixes the case of testing against newer versions of Jenkins. I 
think the Git plugin still wants to target older versions, though.
 

> > attempting to debug why the sshd-module can sometimes take forever. 
> > (It's possible it's platform/OS related?) 
>
> Sure, maybe. 


I'm not exactly sure how to do that, but I suppose I could at least add 
more logging statements to sshd-module and build a version of Jenkins that 
includes those. Ultimately, that won't resolve the issue when building 
against older versions (unless we backport such a fix to maintenance 
branches?).
 

> > modifying the JenkinsRule to not start the timeout countdown until after 
> > initialization 
>
> No, this would allow tests which genuinely hang in e.g. `@LocalData` 
> to never terminate. 


OK, that makes sense to avoid.

Thanks,
Jake

-- 
You received this message because you are subscribed to the Google Groups 
"Jenkins Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to jenkinsci-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/jenkinsci-dev/2ba7c950-e950-40f5-b1a6-097aa76c5cf7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: JenkinsRule tests sometimes timeout

2018-04-09 Thread Jesse Glick
On Mon, Apr 9, 2018 at 2:44 PM, Jacob Keller wrote:
> I discovered that it could take a significantly long
> time for the SSHD server to start. It turns out that JenkinsRule defaults to
> the 1.651.2 jenkins-war which does not disable the server by default.

`JenkinsRule` is _built_ against some older version of Jenkins core.
What you _run_ against is specified by `jenkins.version` in your POM.
Select something newish, and SSHD will not be started by default, and
your problem is solved.

> attempting to debug why the sshd-module can sometimes take forever.
> (It's possible it's platform/OS related?)

Sure, maybe.

> modifying the JenkinsRule to not start the timeout countdown until after
> initialization

No, this would allow tests which genuinely hang in e.g. `@LocalData`
to never terminate.
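
The point about the countdown can be illustrated with a minimal sketch, written against plain `java.util.concurrent` rather than jenkins-test-harness. The class `WholeTestTimeout` and the method `runWithTimeout` are hypothetical names chosen for illustration. The timeout wraps setup and test body together, so a hang during setup is still cut off; a countdown that only began after setup would never fire in that case.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration; none of these names come from jenkins-test-harness.
class WholeTestTimeout {
    /**
     * Runs setup followed by the test body on a worker thread.
     * Returns false if the two together exceed the timeout.
     */
    static boolean runWithTimeout(Runnable setup, Runnable body, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "test-runner");
            t.setDaemon(true); // a hung task must not keep the JVM alive
            return t;
        });
        try {
            Future<?> result = executor.submit(() -> {
                setup.run(); // the countdown already covers this step
                body.run();
            });
            try {
                result.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                result.cancel(true); // interrupt whichever step is hung
                return false;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            } catch (ExecutionException e) {
                throw new RuntimeException(e.getCause());
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        Runnable noOp = () -> { };
        Runnable hungSetup = () -> {
            try {
                Thread.sleep(60_000); // stands in for a setup step that never finishes
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        System.out.println("fast run passed: " + runWithTimeout(noOp, noOp, 1_000));
        System.out.println("hung setup passed: " + runWithTimeout(hungSetup, noOp, 200));
    }
}
```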



JenkinsRule tests sometimes timeout

2018-04-09 Thread Jacob Keller
Hi,

I've been noticing, at least on my test machine and in the Jenkins tests 
run when I submit pull requests, that the JenkinsRule from 
jenkins-test-harness can time out when running tests. The tests would be 
run multiple times; sometimes they would pass and get marked as "flaky", 
while other times they would just fail. Essentially, something was causing 
them to take longer than the 180 second timeout. Interestingly, if I run 
the test cases one at a time (i.e. with -Dtest=), the tests 
*always* pass and do not take a long time. It is only when running the 
tests as a whole that some appear to time out.

I tried extending the Jenkins test timeout system property to 10 minutes, 
and the failures went away, but the tests took significantly longer to 
run, so I started digging into the actual tests that were timing out to 
see what was going on. The actual content of each test was running 
incredibly fast, but the Jenkins initialization was taking too long and 
timing out before completion.
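
For context, here is a minimal sketch of how a timeout like this can be resolved from a system property with a 180 second fallback. The property name `jenkins.test.timeout` is assumed here to be the one jenkins-test-harness reads; the class `TimeoutConfig` and its method are hypothetical and are not the harness's own code.

```java
// Sketch only: TimeoutConfig and resolveTimeoutSeconds are hypothetical names,
// not part of jenkins-test-harness.
class TimeoutConfig {
    // Assumed property name; the 180 second default matches the timeout described above.
    static final String PROPERTY = "jenkins.test.timeout";
    static final int DEFAULT_SECONDS = 180;

    /**
     * Returns the timeout in seconds, falling back to the default
     * when the property is unset or is not a valid integer.
     */
    static int resolveTimeoutSeconds() {
        return Integer.getInteger(PROPERTY, DEFAULT_SECONDS);
    }

    public static void main(String[] args) {
        System.out.println("timeout = " + resolveTimeoutSeconds() + "s");
    }
}
```

Under these assumptions, setting the property to 600 corresponds to the 10 minute timeout mentioned above. Raising it only hides the slow initialization; it does not make the tests any faster.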

After further digging, I discovered that it could take a significantly long 
time for the SSHD server to start. It turns out that JenkinsRule defaults 
to the 1.651.2 jenkins-war which does not disable the server by default. 
(This wasn't changed until later.) I do not understand why the server can 
take so long to start up. I tried running tests with various values of 
--forkCount and --reuseForks, which did not change the outcome.

Then I dug into the actual JenkinsRule code and tried to figure out if we 
could just disable the SSHD server (since newer versions of Jenkins do 
this by default anyway). Updating jenkins-test-harness to target the 
latest 2.x version did not work, and would probably cause difficulty in 
testing plugins that wish to support older versions.

Directly calling SSHD.get().setPort() doesn't work, because we need to 
change the default port value prior to launching the server, as otherwise 
we still pay the initialization cost.
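
The ordering problem can be shown with a toy model. `ToySshd` is entirely hypothetical and does not reproduce sshd-module's code; the convention that a port of -1 means "disabled" is an assumption made for the sketch. It only shows that a server which starts during initialization has already paid its start-up cost by the time a later `setPort` call runs, whereas a configuration that is in place beforehand avoids the start altogether.

```java
// Toy model only: ToySshd is hypothetical and is not sshd-module's implementation.
class ToySshd {
    static final int DISABLED = -1; // assumed convention: a negative port means "do not listen"

    private int port;
    int startCount = 0; // how many times the expensive start-up ran

    ToySshd(int configuredPort) {
        this.port = configuredPort;
        if (port != DISABLED) {
            start(); // initialization starts the server when the loaded config enables it
        }
    }

    private void start() {
        startCount++; // stands in for the slow start-up work
    }

    void setPort(int newPort) {
        this.port = newPort; // a later change cannot undo a start that already happened
    }

    public static void main(String[] args) {
        ToySshd changedLate = new ToySshd(0);
        changedLate.setPort(DISABLED);
        ToySshd presetDisabled = new ToySshd(DISABLED);
        System.out.println("changed late: " + changedLate.startCount
                + " start(s), preset disabled: " + presetDisabled.startCount + " start(s)");
    }
}
```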

I settled on adding a PresetData which has the configuration file for the 
SSHD module in place. This works OK, but requires putting @PresetData on 
every test. Additionally, I couldn't figure out any way to get @PresetData 
to work with @ClassRule.

To ease testing, I modified the JenkinsRule to simply force the correct 
HomeLoader, instead of using an empty directory.

Once I did so, it resolved the issues in the Git plugin, and made the tests 
more reliable.

I'm wondering if there are any better ideas for how to solve this. It's very 
annoying to have some tests arbitrarily fail due to this timeout, 
especially when we don't need the SSHD server for these test cases. It's 
very troublesome, as in some cases it can cause the automated bot that 
tests GitHub pull requests to report test failures, even though the changes 
are actually fine and it's just the timeout bug that caused the failure. 
Sometimes it passes because the tests just get marked as "flaky" after 
running a few times, but other times all 5 runs fail.

It's even worse when debugging locally, since it's very confusing to see test 
failures when running mvn test, but not when running mvn test -Dtest=

Other areas I haven't explored:

(1) modifying the jenkins-war to simply excise the sshd-module entirely so 
that it doesn't even try to load (would cause problems for any test case 
that actually requires the SSHD to start).
(2) attempting to debug why the sshd-module can sometimes take forever. 
(It's possible it's platform/OS related?)
(3) modifying the JenkinsRule to not start the timeout countdown until 
after initialization (would fix the test failure, but ultimately leaves the 
tests taking a significantly longer time to run)

Thanks,
Jake

More information on the root cause so far is 
at https://issues.jenkins-ci.org/browse/JENKINS-50642
