I wanted to expand on, and partly answer, the questions I asked here.
- One solution, which is probably the best one, is to dynamically add
more A and B resources. If you can spin up new resources then you should
always have online, idle nodes which Jenkins can just take to use in the
pipeline. For me, though, nodes A and B are actual hardware, so I can't
add or remove them dynamically.
- Another is to put a timeout around the whole pipeline so it's always
capped. Then, when the node goes offline, the pipeline won't wait
indefinitely and will die once the timeout is reached.
- The way I'm going to do it is a while loop that, on every pass, adds any
nodes that have come online and removes any that have gone offline, then
checks whether ANY of the remaining nodes are busy (running). If any are
busy, I'll wait for some time and try the loop again. I'll probably cap it
at some max runtime, too.
- Another way to do it, I think, is to use a while loop as above, but
instead of waiting for ALL resources to be available, create the branches
map for whichever resources are available (online and not running) at the
time of the loop, run those with parallel, and repeat until there is
nothing left to run. In this case you'd have to wait for each parallel to
complete before you could attempt the next set of parallel runs (as far as
I know).
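A minimal sketch of what the while-loop option could look like in a scripted pipeline, wrapped in a timeout so the wait is capped. Treat it as an illustration rather than a tested solution: `nodeState` is a hypothetical helper name, the 30 minute cap and 30 second poll interval are arbitrary, and the calls into Jenkins core (`Jenkins.instance`, `Computer.isOffline()`, `Computer.countBusy()`) will most likely need script approval if the pipeline runs in the sandbox.

```groovy
import jenkins.model.Jenkins

// Hypothetical helper (not a built-in step): reports "offline", "busy" or
// "idle" for a node name. Marked @NonCPS because Computer objects are not
// serializable and should not be held across pipeline steps.
@NonCPS
def nodeState(String name) {
    def computer = Jenkins.instance.getNode(name)?.toComputer()
    if (computer == null || computer.isOffline()) {
        return "offline"
    }
    return computer.countBusy() > 0 ? "busy" : "idle"
}

def candidates = ["A", "B"]
def usable = []

// Cap the total wait so the pipeline cannot sit in the queue forever.
// The 30 minute limit is an assumed value.
timeout(time: 30, unit: 'MINUTES') {
    while (true) {
        // Re-check every pass: offline nodes drop out, recovered ones return.
        usable = []
        def busy = []
        for (name in candidates) {
            def state = nodeState(name)
            if (state != "offline") {
                usable << name
            }
            if (state == "busy") {
                busy << name
            }
        }
        if (busy.isEmpty()) {
            break
        }
        echo "Still busy: ${busy}, checking again shortly"
        sleep time: 30, unit: 'SECONDS'
    }
}

if (usable.isEmpty()) {
    error "No online nodes left to run on"
}

// Build the branches only for nodes that were online and idle at the last check.
def branches = [:]
for (name in usable) {
    def nodeName = name  // capture a per-iteration copy for the closure
    branches["node_" + nodeName] = {
        node(nodeName) {
            echo "Running on ${nodeName}"
        }
    }
}
parallel branches
```

There is still a race here: a node can go offline, or be taken by another job, between the last check and the `node` step, so keeping a timeout around the parallel part as well is probably worthwhile.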
On Monday, February 12, 2018 at 7:06:39 PM UTC-7, Brownjay wrote:
>
> Hi,
>
> I feel like this is certainly a situation someone has run into before but
> I haven't been able to think up a non-trivial solution for my use case:
>
> - Let's say I have two pipelines, P1 and P2, that do stuff and both
>   need the same two executors/slaves/nodes A and B
> - P1 enters the queue first and takes both A and B and starts doing
>   its stuff in parallel (using parallel)
> - Then P2 enters the queue while A and B are still being used by P1;
>   A and B are still online but they're unavailable for P2 to use (you'd
>   see it waiting for the executors to be available in the console, for
>   example)
> - However, B fails and the node is taken offline on failure
> - P1 completes, with A eventually passing and B having failed and taken
>   the node offline. Depending on timing, P2 may have started using A
>   once A passed.
> - P2 is now waiting forever until node B is brought online again
>
> How can one check in the pipeline P2 that node B is offline and just break
> out?
>
> If the node was offline at the start of P2 then it's easy to check and
> exclude it. However, if B is online when P2 enters the queue and sets up
> the parallel runs of A and B and sits and waits for them to be available
> and one of them goes offline then how does the pipeline get notified it's
> offline and move on to do whatever? Doesn't seem to happen automatically
> and I can't figure out how to check inside a node block that itself is
> offline (B checking if it itself is offline).
>
> Here's a simple pipeline groovy script I made to help me figure out the
> issue:
>
> // Branches for parallel node runs
> def branches = [:]
> // Nodes. In a real setup this would only contain nodes that are online
> // at the time the pipeline runs
> def node_names = ["A", "B"]
> // Short sleep time of 15 seconds. Later, it'll get reduced to 5 if the
> // node name is B
> def sleep_time = 15
>
> // Loop through the nodes and create the data in the branch list to run
> // in parallel at the end
> node_names.each { node_name ->
> println node_name
>
> branches["node_" + node_name] = {
> // Doing something like this doesn't do anything
> //if (!isNodeOnline(node_name)) {
> //println "node name " + node_name + " is offline, returning"
> //return
> //}
>
> node(node_name) {
> // If the node being looked at is B, set the sleep time to 5 so that
> // it runs a shorter time than A. Later, it's hardcoded to fail B and
> // take it offline. This way A stays in the queue running and B is
> // done and offline.
> def temp_sleep_time = sleep_time
> if (node_name == "B") {
> temp_sleep_time = 5
> }
>
> timestamps {
> stage("pre-build") {
> println "Prebuilding " + node_name + "!"
> sleep time: temp_sleep_time, unit: 'SECONDS'
> println "Done with pre-build!"
> }
> stage("build") {
> println "Building " + node_name + "!"
> sleep time: temp_sleep_time, unit: 'SECONDS'
> println "Done with build!"
> }
> stage("post-build") {
> println "Post building " + node_name + "!