Re: [Bug 62318] healthcheck
> On Aug 24, 2018, at 1:18 PM, Eric Covener wrote:
>
> I don't think it's hc execution time relative to interval, it's hc
> execution time relative to AP_WD_TM_SLICE. If it's much more than
> AP_WD_TM_SLICE they'll stack up while one is running.
> For a 1 second response you could queue up 10 in a second even if you
> only have an interval of a minute or hour.

Hmm... let me do some testing here... if that's the case, then it's
borked for sure.
Re: [Bug 62318] healthcheck
On 24/08/2018 at 19:08, Jim Jagielski wrote:
> On Aug 24, 2018, at 12:53 PM, Christophe JAILLET wrote:
>> I've only found it in mod_proxy_balancer and, IIUC, the meaning is "slightly"
>> different from its use in hcheck! :)
>> Looks like this 'updated' field was dedicated for recording the time a worker
>> has been added.
>>
>> So, my understanding is that, either:
>>    - hcheck already changed the meaning of this field, and broke the API when
>>      it has been introduced.
>> or
>>    - the API only says that 'updated' is "timestamp of last update", without
>>      telling which kind of update! So why couldn't it be used by hcheck to keep
>>      record of the "timestamp of last update"... of its check?
>
> It's the latter... recall that health check workers are totally
> different and separate from real workers.

I wasn't aware of that. So +1 for me.
Maybe some code comment (if not already present) should clarify that?

>> I still think that moving when s->updated is updated (sic!) in hcheck should
>> be OK, and wouldn't be an API breakage for me.
>> I don't think that it can interfere in any way with mod_proxy_balancer, at
>> least with the current code.
>> And we should clarify what is the use of these fields to keep someone else
>> from 'stealing' them.
>
> Let's stop w/ the idea that the API is broken or stolen :)
>
> But yeah, doing the updated = now when started/queued makes sense,
> assuming that we understand the issue.
Re: [Bug 62318] healthcheck
On Fri, Aug 24, 2018 at 1:05 PM Jim Jagielski wrote:
>
> > On Aug 24, 2018, at 12:05 PM, Eric Covener wrote:
> >
> > On Fri, Aug 24, 2018 at 11:57 AM Christophe JAILLET wrote:
> > >
> > > On 24/08/2018 at 16:40, Jim Jagielski wrote:
> > > > I was wondering if someone wanted to provide a sanity check
> > > > on the above PR and what's "expected" by the health check code.
> > > >
> > > > It would be very easy to adjust so that hcinterval was not
> > > > the time between successive checks but the interval between
> > > > the end of one and the start of another, but I'm not sure that
> > > > is as useful. In other words, I think the current behavior
> > > > is right (but think the docs need to be updated), but am
> > > > willing to have my mind changed :)
> > > >
> > > Hi Jim,
> > >
> > > the current behavior is also what I would expect.
> > > If I configure a check every 10s, I would expect 6 checks each minute,
> > > even if the test itself takes time to perform.
> >
> > Bug describes something else IIUC. Because the watchdog calls us 10
> > times per second, it continuously sees that the worker hasn't been
> > health checked within the desired interval and queues up a check; it
> > doesn't know one is queued.
>
> But that is only an issue, afaict, if the time taken to do the health check is
> greater than the interval chosen... Or am I misunderstanding? That is,
> if the interval is 200ms, and the health check takes 100ms, all is fine, we
> get 5 checks a second.

I don't think it's hc execution time relative to interval, it's hc
execution time relative to AP_WD_TM_SLICE. If it's much more than
AP_WD_TM_SLICE they'll stack up while one is running.
For a 1 second response you could queue up 10 in a second even if you
only have an interval of a minute or hour.
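To make the stacking argument concrete, here is a toy model in C. It is only a sketch and not the mod_proxy_hcheck code: `TICK_MS` stands in for AP_WD_TM_SLICE and assumes the "10 times per second" figure mentioned in the thread, `count_queued()` and `MAX_PENDING` are hypothetical names, and the model assumes a pool thread is always free so a check queued at time t completes at t + check_ms.

```c
#include <assert.h>

#define TICK_MS      100   /* assumed watchdog tick: "10 times per second" */
#define MAX_PENDING  4096  /* arbitrary cap for this toy model */

/*
 * Toy model: count the checks queued in [0, horizon_ms).
 * stamp_at_start == 0: the timestamp is refreshed when a check finishes.
 * stamp_at_start != 0: the timestamp is refreshed when a check is queued.
 */
static int count_queued(long interval_ms, long check_ms,
                        long horizon_ms, int stamp_at_start)
{
    long pending[MAX_PENDING];      /* finish times, in queue order */
    int head = 0, tail = 0;
    long updated = -interval_ms;    /* makes the first check due at t = 0 */
    int queued = 0;
    long t;

    assert(interval_ms > 0 && check_ms >= 0 && horizon_ms >= 0);

    for (t = 0; t < horizon_ms; t += TICK_MS) {
        /* Retire every check that has completed by this tick. */
        while (head < tail && pending[head] <= t) {
            if (!stamp_at_start)
                updated = pending[head];
            head++;
        }
        /* The watchdog only looks at the timestamp, not at the queue. */
        if (t - updated >= interval_ms) {
            assert(tail < MAX_PENDING);
            pending[tail++] = t + check_ms;
            if (stamp_at_start)
                updated = t;
            queued++;
        }
    }
    return queued;
}
```

In this model, a one-minute interval with a one-second check gives `count_queued(60000, 1000, 1000, 0) == 10` when the timestamp is written at completion, and `count_queued(60000, 1000, 1000, 1) == 1` when it is written at queue time, which is the change suggested elsewhere in this thread.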
Re: [Bug 62318] healthcheck
> On Aug 24, 2018, at 12:53 PM, Christophe JAILLET wrote:
>
> I've only found it in mod_proxy_balancer and, IIUC, the meaning is "slightly"
> different from its use in hcheck! :)
> Looks like this 'updated' field was dedicated for recording the time a worker
> has been added.
>
> So, my understanding is that, either:
>    - hcheck already changed the meaning of this field, and broke the API when
>      it has been introduced.
> or
>    - the API only says that 'updated' is "timestamp of last update", without
>      telling which kind of update! So why couldn't it be used by hcheck to keep
>      record of the "timestamp of last update"... of its check?

It's the latter... recall that health check workers are totally
different and separate from real workers.

> I still think that moving when s->updated is updated (sic!) in hcheck should
> be OK, and wouldn't be an API breakage for me.
> I don't think that it can interfere in any way with mod_proxy_balancer, at
> least with the current code.
> And we should clarify what is the use of these fields to keep someone else
> from 'stealing' them.

Let's stop w/ the idea that the API is broken or stolen :)

But yeah, doing the updated = now when started/queued makes sense,
assuming that we understand the issue.
Re: [Bug 62318] healthcheck
> On Aug 24, 2018, at 12:05 PM, Eric Covener wrote:
>
> On Fri, Aug 24, 2018 at 11:57 AM Christophe JAILLET wrote:
>>
>> On 24/08/2018 at 16:40, Jim Jagielski wrote:
>>> I was wondering if someone wanted to provide a sanity check
>>> on the above PR and what's "expected" by the health check code.
>>>
>>> It would be very easy to adjust so that hcinterval was not
>>> the time between successive checks but the interval between
>>> the end of one and the start of another, but I'm not sure that
>>> is as useful. In other words, I think the current behavior
>>> is right (but think the docs need to be updated), but am
>>> willing to have my mind changed :)
>>>
>> Hi Jim,
>>
>> the current behavior is also what I would expect.
>> If I configure a check every 10s, I would expect 6 checks each minute,
>> even if the test itself takes time to perform.
>
> Bug describes something else IIUC. Because the watchdog calls us 10
> times per second, it continuously sees that the worker hasn't been
> health checked within the desired interval and queues up a check; it
> doesn't know one is queued.

But that is only an issue, afaict, if the time taken to do the health check is
greater than the interval chosen... Or am I misunderstanding? That is,
if the interval is 200ms, and the health check takes 100ms, all is fine, we
get 5 checks a second.

I guess what we could do is emit a warning if, when a check is queued, we
already have one queued or in progress. This would give some info to the
sysadmin. We could also track the time taken to perform a check and have
that available via mod_status as well.

But these all assume that the underlying logic, and how it's implemented,
is sane.
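The warning and timing ideas above could look roughly like the following C sketch. Everything here is hypothetical: `hc_stats`, `hc_note_queued()` and `hc_note_finished()` are illustrative names that do not exist in httpd, the warning goes to stderr rather than through the server's logging, and nothing is wired to mod_status.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical per-worker bookkeeping; not part of any httpd structure. */
typedef struct {
    int  in_flight;         /* checks currently queued or running */
    long last_duration_ms;  /* duration of the most recent check */
    long max_duration_ms;   /* longest check seen so far */
} hc_stats;

/* Call when a check is about to be queued.
 * Returns 1 if a warning was emitted, 0 otherwise. */
static int hc_note_queued(hc_stats *st, const char *worker_name)
{
    int warned = 0;

    assert(st != NULL && worker_name != NULL);
    if (st->in_flight > 0) {
        fprintf(stderr,
                "warning: health check for %s queued while %d still pending\n",
                worker_name, st->in_flight);
        warned = 1;
    }
    st->in_flight++;
    return warned;
}

/* Call when a check completes; records how long it took. */
static void hc_note_finished(hc_stats *st, long started_ms, long finished_ms)
{
    assert(st != NULL && st->in_flight > 0 && finished_ms >= started_ms);
    st->in_flight--;
    st->last_duration_ms = finished_ms - started_ms;
    if (st->last_duration_ms > st->max_duration_ms)
        st->max_duration_ms = st->last_duration_ms;
}
```

A status page could then report `last_duration_ms` and `max_duration_ms` per worker. In a real module these counters would be touched from several threads, so they would need atomic updates or a lock, which this sketch leaves out.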
Re: [Bug 62318] healthcheck
Yes, but the updated field is used differently for the health check
workers and the "real" workers.

> On Aug 24, 2018, at 12:21 PM, Eric Covener wrote:
>
> On Fri, Aug 24, 2018 at 12:13 PM Christophe JAILLET wrote:
>>
>> On 24/08/2018 at 17:56, Christophe JAILLET wrote:
>>> On 24/08/2018 at 16:40, Jim Jagielski wrote:
>>>> I was wondering if someone wanted to provide a sanity check
>>>> on the above PR and what's "expected" by the health check code.
>>>>
>>>> It would be very easy to adjust so that hcinterval was not
>>>> the time between successive checks but the interval between
>>>> the end of one and the start of another, but I'm not sure that
>>>> is as useful. In other words, I think the current behavior
>>>> is right (but think the docs need to be updated), but am
>>>> willing to have my mind changed :)
>>>>
>>> Hi Jim,
>>>
>>> the current behavior is also what I would expect.
>>> If I configure a check every 10s, I would expect 6 checks each minute,
>>> even if the test itself takes time to perform.
>>>
>>> Not related, but is there any use for 'hc_pre_config()'?
>>> We already have:
>>>    static int tpsize = HC_THREADPOOL_SIZE;
>>>
>>> Having both looks redundant.
>>>
>>> CJ
>>>
>> but shouldn't we
>>    worker->s->updated = now;
>> when the check is started (in hc_watchdog_callback()) instead of when it
>> is finished (at the end of hc_check())?
>
> Looks like s->updated is not used elsewhere in HC but is used
> elsewhere in proxy modules and is in the API.
> I don't know if that calls for a 2nd timestamp or just a bit for
> when checks are in progress. Could be useful in
> the future to keep track of the addl information.
Re: [Bug 62318] healthcheck
On 24/08/2018 at 18:21, Eric Covener wrote:
> On Fri, Aug 24, 2018 at 12:13 PM Christophe JAILLET wrote:
>> On 24/08/2018 at 17:56, Christophe JAILLET wrote:
>>> On 24/08/2018 at 16:40, Jim Jagielski wrote:
>>>> I was wondering if someone wanted to provide a sanity check
>>>> on the above PR and what's "expected" by the health check code.
>>>>
>>>> It would be very easy to adjust so that hcinterval was not
>>>> the time between successive checks but the interval between
>>>> the end of one and the start of another, but I'm not sure that
>>>> is as useful. In other words, I think the current behavior
>>>> is right (but think the docs need to be updated), but am
>>>> willing to have my mind changed :)
>>>>
>>> Hi Jim,
>>>
>>> the current behavior is also what I would expect.
>>> If I configure a check every 10s, I would expect 6 checks each minute,
>>> even if the test itself takes time to perform.
>>>
>>> Not related, but is there any use for 'hc_pre_config()'?
>>> We already have:
>>>    static int tpsize = HC_THREADPOOL_SIZE;
>>>
>>> Having both looks redundant.
>>>
>>> CJ
>>>
>> but shouldn't we
>>    worker->s->updated = now;
>> when the check is started (in hc_watchdog_callback()) instead of when it
>> is finished (at the end of hc_check())?
>
> Looks like s->updated is not used elsewhere in HC but is used
> elsewhere in proxy modules and is in the API.
> I don't know if that calls for a 2nd timestamp or just a bit for
> when checks are in progress. Could be useful in
> the future to keep track of the addl information.

I've only found it in mod_proxy_balancer and, IIUC, the meaning is "slightly"
different from its use in hcheck! :)
Looks like this 'updated' field was dedicated for recording the time a worker
has been added.

So, my understanding is that, either:
   - hcheck already changed the meaning of this field, and broke the API when
     it has been introduced.
or
   - the API only says that 'updated' is "timestamp of last update", without
     telling which kind of update! So why couldn't it be used by hcheck to keep
     record of the "timestamp of last update"... of its check?

I still think that moving when s->updated is updated (sic!) in hcheck should
be OK, and wouldn't be an API breakage for me.
I don't think that it can interfere in any way with mod_proxy_balancer, at
least with the current code.
And we should clarify what is the use of these fields to keep someone else
from 'stealing' them.

just my 2c.

CJ
Re: [Bug 62318] healthcheck
On Fri, Aug 24, 2018 at 12:13 PM Christophe JAILLET wrote:
>
> On 24/08/2018 at 17:56, Christophe JAILLET wrote:
> > On 24/08/2018 at 16:40, Jim Jagielski wrote:
> >> I was wondering if someone wanted to provide a sanity check
> >> on the above PR and what's "expected" by the health check code.
> >>
> >> It would be very easy to adjust so that hcinterval was not
> >> the time between successive checks but the interval between
> >> the end of one and the start of another, but I'm not sure that
> >> is as useful. In other words, I think the current behavior
> >> is right (but think the docs need to be updated), but am
> >> willing to have my mind changed :)
> >>
> > Hi Jim,
> >
> > the current behavior is also what I would expect.
> > If I configure a check every 10s, I would expect 6 checks each minute,
> > even if the test itself takes time to perform.
> >
> > Not related, but is there any use for 'hc_pre_config()'?
> > We already have:
> >    static int tpsize = HC_THREADPOOL_SIZE;
> >
> > Having both looks redundant.
> >
> > CJ
> >
> but shouldn't we
>    worker->s->updated = now;
> when the check is started (in hc_watchdog_callback()) instead of when it
> is finished (at the end of hc_check())?

Looks like s->updated is not used elsewhere in HC but is used
elsewhere in proxy modules and is in the API.
I don't know if that calls for a 2nd timestamp or just a bit for
when checks are in progress. Could be useful in
the future to keep track of the addl information.
Re: [Bug 62318] healthcheck
Yes, agreed.

CJ

On 24/08/2018 at 18:05, Eric Covener wrote:
> On Fri, Aug 24, 2018 at 11:57 AM Christophe JAILLET wrote:
>> On 24/08/2018 at 16:40, Jim Jagielski wrote:
>>> I was wondering if someone wanted to provide a sanity check
>>> on the above PR and what's "expected" by the health check code.
>>>
>>> It would be very easy to adjust so that hcinterval was not
>>> the time between successive checks but the interval between
>>> the end of one and the start of another, but I'm not sure that
>>> is as useful. In other words, I think the current behavior
>>> is right (but think the docs need to be updated), but am
>>> willing to have my mind changed :)
>>>
>> Hi Jim,
>>
>> the current behavior is also what I would expect.
>> If I configure a check every 10s, I would expect 6 checks each minute,
>> even if the test itself takes time to perform.
>
> Bug describes something else IIUC. Because the watchdog calls us 10
> times per second, it continuously sees that the worker hasn't been
> health checked within the desired interval and queues up a check; it
> doesn't know one is queued.
Re: [Bug 62318] healthcheck
On 24/08/2018 at 17:56, Christophe JAILLET wrote:
> On 24/08/2018 at 16:40, Jim Jagielski wrote:
>> I was wondering if someone wanted to provide a sanity check
>> on the above PR and what's "expected" by the health check code.
>>
>> It would be very easy to adjust so that hcinterval was not
>> the time between successive checks but the interval between
>> the end of one and the start of another, but I'm not sure that
>> is as useful. In other words, I think the current behavior
>> is right (but think the docs need to be updated), but am
>> willing to have my mind changed :)
>>
> Hi Jim,
>
> the current behavior is also what I would expect.
> If I configure a check every 10s, I would expect 6 checks each minute,
> even if the test itself takes time to perform.
>
> Not related, but is there any use for 'hc_pre_config()'?
> We already have:
>    static int tpsize = HC_THREADPOOL_SIZE;
>
> Having both looks redundant.
>
> CJ

but shouldn't we
   worker->s->updated = now;
when the check is started (in hc_watchdog_callback()) instead of when it
is finished (at the end of hc_check())?

Otherwise, it could be re-triggered before the completion of the first
one (if slow).

CJ
Re: [Bug 62318] healthcheck
On Fri, Aug 24, 2018 at 11:57 AM Christophe JAILLET wrote:
>
> On 24/08/2018 at 16:40, Jim Jagielski wrote:
> > I was wondering if someone wanted to provide a sanity check
> > on the above PR and what's "expected" by the health check code.
> >
> > It would be very easy to adjust so that hcinterval was not
> > the time between successive checks but the interval between
> > the end of one and the start of another, but I'm not sure that
> > is as useful. In other words, I think the current behavior
> > is right (but think the docs need to be updated), but am
> > willing to have my mind changed :)
> >
> Hi Jim,
>
> the current behavior is also what I would expect.
> If I configure a check every 10s, I would expect 6 checks each minute,
> even if the test itself takes time to perform.

Bug describes something else IIUC. Because the watchdog calls us 10
times per second, it continuously sees that the worker hasn't been
health checked within the desired interval and queues up a check; it
doesn't know one is queued.
Re: [Bug 62318] healthcheck
On 24/08/2018 at 16:40, Jim Jagielski wrote:
> I was wondering if someone wanted to provide a sanity check
> on the above PR and what's "expected" by the health check code.
>
> It would be very easy to adjust so that hcinterval was not
> the time between successive checks but the interval between
> the end of one and the start of another, but I'm not sure that
> is as useful. In other words, I think the current behavior
> is right (but think the docs need to be updated), but am
> willing to have my mind changed :)

Hi Jim,

the current behavior is also what I would expect.
If I configure a check every 10s, I would expect 6 checks each minute,
even if the test itself takes time to perform.

Not related, but is there any use for 'hc_pre_config()'?
We already have:
   static int tpsize = HC_THREADPOOL_SIZE;

Having both looks redundant.

CJ
Re: [Bug 62318] healthcheck
I was wondering if someone wanted to provide a sanity check on the above PR and what's "expected" by the health check code. It would be very easy to adjust so that hcinterval was not the time between successive checks but the interval between the end of one and the start of another, but I'm not sure that is as useful. In other words, I think the current behavior is right (but think the docs need to be updated), but am willing to have my mind changed :)
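The difference between the two readings of hcinterval is easy to put in numbers. The C sketch below is plain arithmetic under stated assumptions, not module code: both function names are hypothetical, the first check is assumed to start at t = 0, and every check is assumed to take the same time.

```c
#include <assert.h>

/* Checks started in [0, window_ms) when the interval is measured from
 * the start of one check to the start of the next. */
static long checks_start_to_start(long interval_ms, long window_ms)
{
    assert(interval_ms > 0 && window_ms >= 0);
    /* Starts fall at 0, I, 2I, ...: ceil(window / I) of them. */
    return (window_ms + interval_ms - 1) / interval_ms;
}

/* Checks started in [0, window_ms) when the interval is measured from
 * the end of one check to the start of the next. */
static long checks_end_to_start(long interval_ms, long check_ms,
                                long window_ms)
{
    long period;

    assert(interval_ms > 0 && check_ms >= 0 && window_ms >= 0);
    period = interval_ms + check_ms;   /* each cycle also pays for the check */
    return (window_ms + period - 1) / period;
}
```

With a 10 s interval over one minute, the start-to-start reading gives 6 checks regardless of how long each check takes, while the end-to-start reading gives 5 checks if each one takes 2 s.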