
Re: [Bug 62318] healthcheck

2018-08-24 Thread Jim Jagielski



> On Aug 24, 2018, at 1:18 PM, Eric Covener  wrote:
> 
> 
> I don't think it's hc execution time relative to interval, it's hc
> execution time relative to AP_WD_TM_SLICE.  If it's much more than
> AP_WD_TM_SLICE they'll stack up while one is running.
> For a 1 second response you could queue up 10 in a second even if you
> only have an interval of a minute or hour.


Hmm... let me do some testing here... if that's the case, then it's borked
for sure.

Re: [Bug 62318] healthcheck

2018-08-24 Thread Christophe JAILLET


Le 24/08/2018 à 19:08, Jim Jagielski a écrit :



On Aug 24, 2018, at 12:53 PM, Christophe JAILLET 
 wrote:

I've only found it in mod_proxy_balancer and, IIUC, the meaning is "slightly" 
different from its use in hcheck! :)
Looks like this 'updated' field was dedicated for recording the time a worker 
has been added.

So, my understanding is that, either:
   - hcheck already changed the meaning of this field, and broke the API when 
it was introduced.
or
   - the API only says that 'updated' is "timestamp of last update", without telling 
which kind of update! So why couldn't it be used by hcheck to keep a record of the 
"timestamp of last update"... of its check?

It's the latter... recall that health check workers are totally different and 
separate from real workers.

I wasn't aware of that. So +1 for me.
Maybe some code comment (if not already present), should clarify that?





I still think that moving when s->updated is updated (sic!) in hcheck should be 
OK, and wouldn't be an API breakage for me.
I don't think that it can interfere in any way with mod_proxy_balancer, at 
least with the current code.
And we should clarify what the use of these fields is, to avoid someone else 
'stealing' them.

Let's stop w/ the idea that the API is broken or stolen :)

But yeah, doing the update = now when started/queued makes sense,
assuming that we understand the issue.

Re: [Bug 62318] healthcheck

2018-08-24 Thread Eric Covener

On Fri, Aug 24, 2018 at 1:05 PM Jim Jagielski  wrote:
>
>
>
> On Aug 24, 2018, at 12:05 PM, Eric Covener  wrote:
>
> On Fri, Aug 24, 2018 at 11:57 AM Christophe JAILLET
>  wrote:
>
>
> Le 24/08/2018 à 16:40, Jim Jagielski a écrit :
>
> I was wondering if someone wanted to provide a sanity check
> on the above PR and what's "expected" by the health check code.
>
> It would be very easy to adjust so that hcinterval was not
> the time between successive checks but the interval between
> the end of one and the start of another, but I'm not sure that
> is as useful. In other words, I think the current behavior
> is right (but think the docs need to be updated), but am
> willing to have my mind changed :)
>
> Hi Jim,
>
> the current behavior is also what I would expect.
> If I configure a check every 10s, I would expect 6 checks each minute,
> even if the test itself takes time to perform.
>
>
>
> Bug describes something else IIUC.  Because the watchdog calls us 10
> times per second, it continuously sees that the worker hasn't been
> health checked within the desired interval and queues up a check, it
> doesn't know one is queued.
>
>
> But that is only an issue, afaict, if the time taken to do the health check is
> greater than the interval chosen... Or am I misunderstanding? That is,
> if the interval is 200ms, and the health check takes 100ms, all is fine, we
> get 5 checks a second.

I don't think it's hc execution time relative to interval, it's hc
execution time relative to AP_WD_TM_SLICE.  If it's much more than
AP_WD_TM_SLICE they'll stack up while one is running.
For a 1 second response you could queue up 10 in a second even if you
only have an interval of a minute or hour.
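
[Editor's sketch] The stacking described above can be modelled in a few lines of C. This is a hypothetical simulation, not the mod_proxy_hcheck source: the function name count_queued_checks and all of its parameters are made up for illustration. It assumes the watchdog ticks every slice_ms (100 ms, i.e. the "10 times per second" mentioned in the thread), that a check is queued whenever the timestamp is older than the interval, and that the timestamp only moves when a check finishes.

```c
#include <assert.h>

#define MAX_TICKS 100000  /* upper bound on simulated watchdog ticks */

/* Hypothetical model: returns how many checks get queued in total_ms when
 * 'updated' is stamped only at the END of a check that takes check_ms. */
int count_queued_checks(long slice_ms, long interval_ms,
                        long check_ms, long total_ms)
{
    static long finish_at[MAX_TICKS]; /* completion time of each queued check */
    int queued = 0;                   /* checks queued so far */
    int done = 0;                     /* completions already applied */
    long updated = -interval_ms;      /* makes the first check due at t = 0 */

    assert(slice_ms > 0 && total_ms / slice_ms < MAX_TICKS);

    for (long now = 0; now < total_ms; now += slice_ms) {
        /* Only a finished check moves the timestamp forward. */
        while (done < queued && finish_at[done] <= now) {
            updated = finish_at[done++];
        }
        /* The watchdog still sees a stale timestamp, so it queues again. */
        if (now - updated >= interval_ms) {
            finish_at[queued++] = now + check_ms;
        }
    }
    return queued;
}
```

Under these assumptions, a 1 second check with a 60 second interval gets queued on every tick until the first one completes, while a check that is shorter than the slice is queued once.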

Re: [Bug 62318] healthcheck

2018-08-24 Thread Jim Jagielski




> On Aug 24, 2018, at 12:53 PM, Christophe JAILLET 
>  wrote:
> 
> I've only found it in mod_proxy_balancer and, IIUC, the meaning is "slightly" 
> different from its use in hcheck! :)
> Looks like this 'updated' field was dedicated for recording the time a worker 
> has been added.
> 
> So, my understanding is that, either:
>   - hcheck already changed the meaning of this field, and broke the API when 
> it was introduced.
> or
>   - the API only says that 'updated' is "timestamp of last update", without 
> telling which kind of update! So why couldn't it be used by hcheck to keep a 
> record of the "timestamp of last update"... of its check?

It's the latter... recall that health check workers are totally different and 
separate from real workers.

> 
> I still think that moving when s->updated is updated (sic!) in hcheck should 
> be OK, and wouldn't be an API breakage for me.
> I don't think that it can interfere in any way with mod_proxy_balancer, at 
> least with the current code.
> And we should clarify what the use of these fields is, to avoid someone else 
> 'stealing' them.

Let's stop w/ the idea that the API is broken or stolen :)

But yeah, doing the update = now when started/queued makes sense,
assuming that we understand the issue.

Re: [Bug 62318] healthcheck

2018-08-24 Thread Jim Jagielski

> On Aug 24, 2018, at 12:05 PM, Eric Covener  wrote:
> 
> On Fri, Aug 24, 2018 at 11:57 AM Christophe JAILLET
> <christophe.jail...@wanadoo.fr> wrote:
>> 
>> Le 24/08/2018 à 16:40, Jim Jagielski a écrit :
>>> I was wondering if someone wanted to provide a sanity check
>>> on the above PR and what's "expected" by the health check code.
>>> 
>>> It would be very easy to adjust so that hcinterval was not
>>> the time between successive checks but the interval between
>>> the end of one and the start of another, but I'm not sure that
>>> is as useful. In other words, I think the current behavior
>>> is right (but think the docs need to be updated), but am
>>> willing to have my mind changed :)
>>> 
>> Hi Jim,
>> 
>> the current behavior is also what I would expect.
>> If I configure a check every 10s, I would expect 6 checks each minute,
>> even if the test itself takes time to perform.
> 
> 
> Bug describes something else IIUC.  Because the watchdog calls us 10
> times per second, it continuously sees that the worker hasn't been
> health checked within the desired interval and queues up a check, it
> doesn't know one is queued.

But that is only an issue, afaict, if the time taken to do the health check is
greater than the interval chosen... Or am I misunderstanding? That is,
if the interval is 200ms, and the health check takes 100ms, all is fine, we
get 5 checks a second. 

I guess what we could do is emit a warning if, when a check is queued, we
already have one queued or in process. This would give some info to the sysadmin.
We could also track the time taken to perform a check and have that available
via mod_status as well. But these all assume that the underlying logic, and
how it's implemented, is sane.
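
[Editor's sketch] One possible shape for the warning and the duration tracking suggested above. Everything here is hypothetical: the hc_state struct, its fields, and the hc_try_queue/hc_finish helpers do not exist in httpd, and fprintf stands in for whatever logging the module would really use. The sketch also goes one step further than the suggestion by refusing to queue a second check, which is an assumption rather than something proposed in the thread.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical per-worker bookkeeping for health checks. */
typedef struct {
    int in_flight;         /* 1 while a check is queued or running */
    long started_ms;       /* when the current check was queued */
    long last_duration_ms; /* time taken by the last completed check */
    int skipped;           /* how many times a duplicate was refused */
} hc_state;

/* Returns 1 if a check was queued, 0 if one was already pending. */
int hc_try_queue(hc_state *s, long now_ms)
{
    if (s->in_flight) {
        s->skipped++;
        fprintf(stderr, "health check already queued or in progress\n");
        return 0;
    }
    s->in_flight = 1;
    s->started_ms = now_ms;
    return 1;
}

/* Records how long the check took and allows the next one to be queued. */
void hc_finish(hc_state *s, long now_ms)
{
    s->last_duration_ms = now_ms - s->started_ms;
    s->in_flight = 0;
}
```

The last_duration_ms value is the kind of number that could be exposed through a status page, and skipped gives the sysadmin a count of how often checks overlapped.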

Re: [Bug 62318] healthcheck

2018-08-24 Thread Jim Jagielski

Yes, but the updated field is used differently for the health check workers and
the "real" workers.

> On Aug 24, 2018, at 12:21 PM, Eric Covener  wrote:
> 
> On Fri, Aug 24, 2018 at 12:13 PM Christophe JAILLET
> <christophe.jail...@wanadoo.fr> wrote:
>> 
>> Le 24/08/2018 à 17:56, Christophe JAILLET a écrit :
>>> Le 24/08/2018 à 16:40, Jim Jagielski a écrit :
>>>> I was wondering if someone wanted to provide a sanity check
>>>> on the above PR and what's "expected" by the health check code.
>>>> 
>>>> It would be very easy to adjust so that hcinterval was not
>>>> the time between successive checks but the interval between
>>>> the end of one and the start of another, but I'm not sure that
>>>> is as useful. In other words, I think the current behavior
>>>> is right (but think the docs need to be updated), but am
>>>> willing to have my mind changed :)
>>>> 
>>> Hi Jim,
>>> 
>>> the current behavior is also what I would expect.
>>> If I configure a check every 10s, I would expect 6 checks each minute,
>>> even if the test itself takes time to perform.
>>> 
>>> 
>>> 
>>> Not related, but is there any use for 'hc_pre_config()'?
>>> We already have:
>>>   static int tpsize = HC_THREADPOOL_SIZE;
>>> 
>>> Having both looks redundant.
>>> 
>>> CJ
>>> 
>>> 
>> but shouldn't we
>>    worker->s->updated = now;
>> when the check is started (in hc_watchdog_callback()) instead of when it
>> is finished (at the end of hc_check())?
> 
> Looks like s->updated is not used elsewhere in HC but is used
> elsewhere in proxy modules and is in the API.
> I don't know if that calls for a 2nd timestamp or just a bit for
> when checks are in progress.  Could be useful in
> the future to keep track of the addl information.

Re: [Bug 62318] healthcheck

2018-08-24 Thread Christophe JAILLET


Le 24/08/2018 à 18:21, Eric Covener a écrit :

On Fri, Aug 24, 2018 at 12:13 PM Christophe JAILLET
 wrote:

Le 24/08/2018 à 17:56, Christophe JAILLET a écrit :

Le 24/08/2018 à 16:40, Jim Jagielski a écrit :

I was wondering if someone wanted to provide a sanity check
on the above PR and what's "expected" by the health check code.

It would be very easy to adjust so that hcinterval was not
the time between successive checks but the interval between
the end of one and the start of another, but I'm not sure that
is as useful. In other words, I think the current behavior
is right (but think the docs need to be updated), but am
willing to have my mind changed :)


Hi Jim,

the current behavior is also what I would expect.
If I configure a check every 10s, I would expect 6 checks each minute,
even if the test itself takes time to perform.



Not related, but is there any use for 'hc_pre_config()'?
We already have:
static int tpsize = HC_THREADPOOL_SIZE;

Having both looks redundant.

CJ



but shouldn't we
    worker->s->updated = now;
when the check is started (in hc_watchdog_callback()) instead of when it
is finished (at the end of hc_check())?

Looks like s->updated is not used elsewhere in HC but is used
elsewhere in proxy modules and is in the API.
I don't know if that calls for a 2nd timestamp or just a bit for
when checks are in progress.  Could be useful in
the future to keep track of the addl information.
I've only found it in mod_proxy_balancer and, IIUC, the meaning is 
"slightly" different from its use in hcheck! :)
Looks like this 'updated' field was dedicated for recording the time a 
worker has been added.


So, my understanding is that, either:
   - hcheck already changed the meaning of this field, and broke the 
API when it was introduced.

or
   - the API only says that 'updated' is "timestamp of last update", 
without telling which kind of update! So why couldn't it be used by hcheck 
to keep a record of the "timestamp of last update"... of its check?


I still think that moving when s->updated is updated (sic!) in hcheck 
should be OK, and wouldn't be an API breakage for me.
I don't think that it can interfere in any way with mod_proxy_balancer, 
at least with the current code.
And we should clarify what the use of these fields is, to avoid someone else 
'stealing' them.


just my 2c.

CJ

Re: [Bug 62318] healthcheck

2018-08-24 Thread Eric Covener

On Fri, Aug 24, 2018 at 12:13 PM Christophe JAILLET
 wrote:
>
> Le 24/08/2018 à 17:56, Christophe JAILLET a écrit :
> > Le 24/08/2018 à 16:40, Jim Jagielski a écrit :
> >> I was wondering if someone wanted to provide a sanity check
> >> on the above PR and what's "expected" by the health check code.
> >>
> >> It would be very easy to adjust so that hcinterval was not
> >> the time between successive checks but the interval between
> >> the end of one and the start of another, but I'm not sure that
> >> is as useful. In other words, I think the current behavior
> >> is right (but think the docs need to be updated), but am
> >> willing to have my mind changed :)
> >>
> > Hi Jim,
> >
> > the current behavior is also what I would expect.
> > If I configure a check every 10s, I would expect 6 checks each minute,
> > even if the test itself takes time to perform.
> >
> >
> >
> > Not related, but is there any use for 'hc_pre_config()'?
> > We already have:
> >    static int tpsize = HC_THREADPOOL_SIZE;
> >
> > Having both looks redundant.
> >
> > CJ
> >
> >
> but shouldn't we
>     worker->s->updated = now;
> when the check is started (in hc_watchdog_callback()) instead of when it
> is finished (at the end of hc_check())?

Looks like s->updated is not used elsewhere in HC but is used
elsewhere in proxy modules and is in the API.
I don't know if that calls for a 2nd timestamp or just a bit for
when checks are in progress.  Could be useful in
the future to keep track of the addl information.

Re: [Bug 62318] healthcheck

2018-08-24 Thread Marion & Christophe JAILLET


Yes, agreed.

CJ


Le 24/08/2018 à 18:05, Eric Covener a écrit :

On Fri, Aug 24, 2018 at 11:57 AM Christophe JAILLET
 wrote:

Le 24/08/2018 à 16:40, Jim Jagielski a écrit :

I was wondering if someone wanted to provide a sanity check
on the above PR and what's "expected" by the health check code.

It would be very easy to adjust so that hcinterval was not
the time between successive checks but the interval between
the end of one and the start of another, but I'm not sure that
is as useful. In other words, I think the current behavior
is right (but think the docs need to be updated), but am
willing to have my mind changed :)


Hi Jim,

the current behavior is also what I would expect.
If I configure a check every 10s, I would expect 6 checks each minute,
even if the test itself takes time to perform.


Bug describes something else IIUC.  Because the watchdog calls us 10
times per second, it continuously sees that the worker hasn't been
health checked within the desired interval and queues up a check, it
doesn't know one is queued.

Re: [Bug 62318] healthcheck

2018-08-24 Thread Christophe JAILLET


Le 24/08/2018 à 17:56, Christophe JAILLET a écrit :

Le 24/08/2018 à 16:40, Jim Jagielski a écrit :

I was wondering if someone wanted to provide a sanity check
on the above PR and what's "expected" by the health check code.

It would be very easy to adjust so that hcinterval was not
the time between successive checks but the interval between
the end of one and the start of another, but I'm not sure that
is as useful. In other words, I think the current behavior
is right (but think the docs need to be updated), but am
willing to have my mind changed :)


Hi Jim,

the current behavior is also what I would expect.
If I configure a check every 10s, I would expect 6 checks each minute, 
even if the test itself takes time to perform.




Not related, but is there any use for 'hc_pre_config()'?
We already have:
   static int tpsize = HC_THREADPOOL_SIZE;

Having both looks redundant.

CJ



but shouldn't we
   worker->s->updated = now;
when the check is started (in hc_watchdog_callback()) instead of when it 
is finished (at the end of hc_check())?


Otherwise, it could be re-triggered before the completion of the first 
one (if slow)


CJ
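
[Editor's sketch] A toy model of the proposal above, stamping the timestamp when the check is started rather than when it finishes. The function count_checks_stamp_at_start and its parameters are hypothetical and only mimic the shape of a watchdog loop; the real callback in mod_proxy_hcheck is not reproduced here. It assumes the watchdog ticks every slice_ms.

```c
#include <assert.h>

/* Hypothetical model: counts checks queued in total_ms when the timestamp
 * is set at queue time, so the duration of a check no longer matters. */
int count_checks_stamp_at_start(long slice_ms, long interval_ms, long total_ms)
{
    int queued = 0;
    long updated = -interval_ms;  /* makes the first check due at t = 0 */

    assert(slice_ms > 0);

    for (long now = 0; now < total_ms; now += slice_ms) {
        if (now - updated >= interval_ms) {
            updated = now;  /* stamped when started, not when finished */
            queued++;
        }
    }
    return queued;
}
```

With a 100 ms slice and a 10 second interval this model queues 6 checks per minute however slow each check is, because a slow check can no longer be re-triggered before it completes.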

Re: [Bug 62318] healthcheck

2018-08-24 Thread Eric Covener

On Fri, Aug 24, 2018 at 11:57 AM Christophe JAILLET
 wrote:
>
> Le 24/08/2018 à 16:40, Jim Jagielski a écrit :
> > I was wondering if someone wanted to provide a sanity check
> > on the above PR and what's "expected" by the health check code.
> >
> > It would be very easy to adjust so that hcinterval was not
> > the time between successive checks but the interval between
> > the end of one and the start of another, but I'm not sure that
> > is as useful. In other words, I think the current behavior
> > is right (but think the docs need to be updated), but am
> > willing to have my mind changed :)
> >
> Hi Jim,
>
> the current behavior is also what I would expect.
> If I configure a check every 10s, I would expect 6 checks each minute,
> even if the test itself takes time to perform.


Bug describes something else IIUC.  Because the watchdog calls us 10
times per second, it continuously sees that the worker hasn't been
health checked within the desired interval and queues up a check, it
doesn't know one is queued.

Re: [Bug 62318] healthcheck

2018-08-24 Thread Christophe JAILLET


Le 24/08/2018 à 16:40, Jim Jagielski a écrit :

I was wondering if someone wanted to provide a sanity check
on the above PR and what's "expected" by the health check code.

It would be very easy to adjust so that hcinterval was not
the time between successive checks but the interval between
the end of one and the start of another, but I'm not sure that
is as useful. In other words, I think the current behavior
is right (but think the docs need to be updated), but am
willing to have my mind changed :)


Hi Jim,

the current behavior is also what I would expect.
If I configure a check every 10s, I would expect 6 checks each minute, 
even if the test itself takes time to perform.




Not related, but is there any use for 'hc_pre_config()'?
We already have:
   static int tpsize = HC_THREADPOOL_SIZE;

Having both looks redundant.

CJ

Re: [Bug 62318] healthcheck

2018-08-24 Thread Jim Jagielski

I was wondering if someone wanted to provide a sanity check
on the above PR and what's "expected" by the health check code.

It would be very easy to adjust so that hcinterval was not
the time between successive checks but the interval between
the end of one and the start of another, but I'm not sure that
is as useful. In other words, I think the current behavior
is right (but think the docs need to be updated), but am
willing to have my mind changed :)
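
[Editor's sketch] The two readings of hcinterval discussed above differ only in what one cycle costs. The helpers below are hypothetical arithmetic, not httpd code: they count how many checks start inside a time window, assuming the first check starts at t = 0 and every check takes the same check_ms.

```c
#include <assert.h>

/* Interval measured from the start of one check to the start of the next. */
long checks_start_to_start(long interval_ms, long window_ms)
{
    assert(interval_ms > 0);
    return (window_ms + interval_ms - 1) / interval_ms;  /* ceiling division */
}

/* Interval measured from the end of one check to the start of the next:
 * each cycle costs the check duration plus the interval. */
long checks_end_to_start(long interval_ms, long check_ms, long window_ms)
{
    long cycle_ms = interval_ms + check_ms;
    assert(cycle_ms > 0);
    return (window_ms + cycle_ms - 1) / cycle_ms;
}
```

For a 10 second interval and a check that takes 2 seconds, the first reading gives 6 checks per minute and the second gives 5, so under the second reading the effective rate depends on how slow the backend is.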
