> On 20 Jun 2017, at 18:46, Adrian Klaver <adrian.kla...@aklaver.com> wrote:
> 
> On 06/20/2017 08:12 AM, Steve Clark wrote:
>> On 06/20/2017 10:38 AM, Adrian Klaver wrote:
>>> On 06/20/2017 07:00 AM, Steve Clark wrote:
> 
>> We already have a monitoring system in place that has been in operation
>> since circa 2003. Just recently we have added a new class of customer
>> whose operation is not 24/7.
>> I envision the schedule could be fairly complicated, including weekends
>> and holidays, plus the end user might shut down for lunch etc. I am
>> looking for more on how to organize the schedule, e.g. a standard weekly
>> schedule with exceptions for holidays, or a separate individual schedule
>> for each week. I also need to consider how easy it is to maintain the
>> schedule, etc.
> 
> Yes, this could become complicated, if for no other reason than that it is
> being driven from the customer end and there will need to be a process to
> verify and incorporate their changes.

There you're saying something rather important: "If it is being driven from the 
customer end".

> 2) Figure out what a day is. In other words are different timezones involved 
> and if so what do you 'anchor' a day to?

For an example of how that might fail: at our company, people work in three
8-hour shifts (I don't) that together run from 23:00 to 23:00. Depending on
who looks at the data, either that's a day, or a normal day (00:00-00:00) is.
It's a matter of perspective.
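To make the "anchor" question concrete, here is a minimal Python sketch (not from the thread) that maps a timestamp to an operational day starting at a configurable hour on a given clock. The function name `operational_day`, the fixed UTC+2 `SITE_TZ`, and the choice to label a day by the calendar date on which it starts are all assumptions for illustration; a real site would need its actual time zone with daylight-saving rules.

```python
from datetime import date, datetime, timedelta, timezone

# Assumption: the customer's site runs at a fixed UTC+2 offset. A real site
# would need its actual time zone, including daylight-saving rules.
SITE_TZ = timezone(timedelta(hours=2))


def operational_day(ts: datetime, anchor_hour: int = 0,
                    tz: timezone = SITE_TZ) -> date:
    """Return the operational 'day' that a timezone-aware timestamp falls in.

    A day runs from anchor_hour:00 to anchor_hour:00 on the given clock and
    is labelled with the calendar date on which it starts.
    anchor_hour=0 is the ordinary midnight-to-midnight day; anchor_hour=23
    gives a 23:00-to-23:00 shift day.
    """
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    local = ts.astimezone(tz)  # anchor the instant to the chosen clock
    return (local - timedelta(hours=anchor_hour)).date()


if __name__ == "__main__":
    ts = datetime(2017, 6, 20, 22, 30, tzinfo=SITE_TZ)
    print(operational_day(ts))                  # 2017-06-20 (midnight-based day)
    print(operational_day(ts, anchor_hour=23))  # 2017-06-19 (23:00-based shift day)

    # A UTC timestamp can land on different calendar days depending on
    # which clock the day is anchored to.
    utc_ts = datetime(2017, 6, 20, 22, 30, tzinfo=timezone.utc)
    print(operational_day(utc_ts))                   # 2017-06-21 (site clock)
    print(operational_day(utc_ts, tz=timezone.utc))  # 2017-06-20 (UTC clock)
```

Both the anchor hour and the time zone change which "day" an event is counted in, so whichever convention is chosen has to be agreed with the customer and applied consistently.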


IMHO, the only safe approach is to have the customer end decide whether it's a 
regular outage or an irregular one. There is just no way to reliably guess that 
from the data. If a customer decides to turn off the system when he's going 
home, you can't guess when he's going to do that and you will be raising false 
positives when you depend on a schedule of when he might be going home.

From a software implementation point of view, that means that your
customer-side application needs to be able to signal planned shutdowns and
startups. If you detect an outage without such a signal, you can flag it as
a problem.
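A minimal Python sketch of that idea, assuming the customer-side application sends hypothetical "shutdown" and "startup" signals and the monitor already has a list of detected outages as (start, end) pairs. An outage is treated as planned only if the latest signal at or before its start is a "shutdown"; everything else is flagged. The `Signal` type and the `unplanned_outages` function are made up for illustration and are not part of any existing system.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple

Outage = Tuple[datetime, datetime]  # (start, end) as detected by the monitor


@dataclass(frozen=True)
class Signal:
    """Hypothetical message from the customer-side application."""
    at: datetime
    kind: str  # "shutdown" = planned stop, "startup" = back in service


def unplanned_outages(signals: List[Signal], outages: List[Outage]) -> List[Outage]:
    """Return the outages that were not announced by a planned shutdown.

    An outage counts as planned if the most recent signal at or before its
    start is a "shutdown". Outages preceded by a "startup", or by no signal
    at all, are flagged as problems.
    """
    ordered = sorted(signals, key=lambda s: s.at)
    times = [s.at for s in ordered]
    flagged = []
    for start, end in outages:
        i = bisect_right(times, start)  # number of signals at or before start
        if i == 0 or ordered[i - 1].kind != "shutdown":
            flagged.append((start, end))
    return flagged


if __name__ == "__main__":
    d = datetime
    signals = [
        Signal(d(2017, 6, 20, 12, 0), "shutdown"),  # customer goes to lunch
        Signal(d(2017, 6, 20, 13, 0), "startup"),
    ]
    outages = [
        (d(2017, 6, 20, 12, 1), d(2017, 6, 20, 13, 0)),   # announced
        (d(2017, 6, 20, 15, 0), d(2017, 6, 20, 15, 20)),  # not announced
    ]
    for start, end in unplanned_outages(signals, outages):
        print("unplanned outage:", start, "to", end)
```

With this approach no schedule has to be maintained on the monitoring side. The trade-off is that a lost or delayed signal makes a planned outage look unplanned, so connectivity problems between the two ends still show up as alerts.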

There are still opportunities for getting those wrong of course, such as lack 
of connectivity between you and your customer, but those should be easy to 
explain once detected.
And I'm sure there are plenty of other corner cases you need to take into
account. I bet it has a lot of problems in common with replication, actually
(how do we reliably get information from system A to system B?), so it
probably pays to look at what particular problems occur there and how
they're solved.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
