RE: SLA Security

Martin, James E. Thu, 24 Jan 2002 09:10:27 -0800
What I'm referring to is an SLA to set expectations for event response from downstream 
networks. Bear in mind I'm exploring this from the perspective of a backbone provider 
to an educational network. What are other providers doing? 
 
Our current base response metric for customers is 30 minutes for critical events, on a 
24x7 basis. We've consistently met that for four years with a very small staff (remind 
me to address burnout issues another time). 
 
What I'm after is management of customer expectations of that response, as well as 
management of our own expectations of their responses. So, as a hypothetical SLA 
clause, if the customer has established a CSIRT, supports RFC abuse addresses, has an 
event response policy and capability as audited by us, has the staff and tools 
dedicated to providing their own event response capability, exchanged PGP keys with 
us, and maintains a high response level (reasonably similar to our own), we agree to 
trust them more from a backbone provider's point of view. While our AUP empowers us to 
protect the network, it's occasionally awkward to do that from a border router 
(especially involving NAT'ed networks, firewalls, proxies and so on). We could provide 
more time without taking defensive measures ourselves, with the expectation that the 
downstream network has the capability to investigate and resolve problems. 
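
As a rough sketch of how such a clause might be evaluated, the checklist above could be written down as follows. The field names, the "reasonably similar" tolerance and both grace periods are placeholders I made up for illustration; only the 30-minute figure comes from our own metric.

```python
from dataclasses import dataclass


@dataclass
class DownstreamSite:
    """Capabilities named in the hypothetical SLA clause (field names are illustrative)."""
    has_csirt: bool
    supports_rfc_abuse_addresses: bool
    audited_response_policy: bool
    dedicated_staff_and_tools: bool
    pgp_keys_exchanged: bool
    response_minutes: int  # site's demonstrated response time for critical events


OUR_RESPONSE_MINUTES = 30    # our own metric for critical events
SIMILARITY_FACTOR = 2        # assumed meaning of "reasonably similar to our own"
TRUSTED_GRACE_MINUTES = 240  # placeholder: extra time before we defend the network
DEFAULT_GRACE_MINUTES = 30   # placeholder: time allowed for sites that do not qualify


def is_trusted(site: DownstreamSite) -> bool:
    """True only if every criterion in the clause is met."""
    checklist = [
        site.has_csirt,
        site.supports_rfc_abuse_addresses,
        site.audited_response_policy,
        site.dedicated_staff_and_tools,
        site.pgp_keys_exchanged,
    ]
    similar_response = site.response_minutes <= OUR_RESPONSE_MINUTES * SIMILARITY_FACTOR
    return all(checklist) and similar_response


def grace_minutes(site: DownstreamSite) -> int:
    """Minutes we would wait before taking defensive measures ourselves."""
    return TRUSTED_GRACE_MINUTES if is_trusted(site) else DEFAULT_GRACE_MINUTES
```

The point of making every criterion mandatory is that a site missing any one of them (say, no PGP key exchange) falls back to the default treatment rather than earning partial trust.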
 
At the other end of the scale, there are downstream networks without trained technical 
staff, let alone a part-time event response capability. They lock up the doors at 
4:30 PM and go home, and stay there all weekend (and all through summer vacations, 
etc.). Often, at these networks, decision making is done by completely non-technical 
administrators who have no ability to implement requested defensive measures, let 
alone access or read logs. Some of these do not have their own admin passwords, having 
completely "contracted out" system administration to community volunteers. Some of 
these networks practice security by obscurity, unintentionally hiding accountability 
even from themselves. Some of these networks think their Wingates are firewalls. 
 
We're working on these from an educational point of view, but the policies to protect 
the network have been in place for four years. In the last quarter, our security team 
took more than 1700 events. This is 101% of the entire preceding year. We have low 
expectations of additional staff or resources. Our parent organization currently uses 
SLAs to define base levels of expectations, with higher response and improved service 
after hours for those who choose to participate. These other SLAs require the 
downstream site to invest in hardware, software and training. If a site does not have 
the capability to react to a request to investigate and protect the network, then that 
could be taken into account in requiring a *shorter* time for response from that 
network. 
 
Our current practice requires us to make a best effort to contact the customer before 
defending the network. This has ranged from calling the campus police to locate a 
faculty member during a holiday break, to sending voicemail, e-mail and faxes to each 
of four responsible persons in a building we had good reason to believe would be locked 
and empty for the next week before defending the network. 

        -----Original Message----- 
        From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] 
        Sent: Tue 1/22/2002 9:30 AM 
        To: Martin, James E. 
        Cc: [EMAIL PROTECTED] 
        Subject: RE: SLA Security
        
        

        Not to bandy words, but it sounds like you're more interested in a policy and 
procedures document than in SLAs. The policy and procedures document would say 
here's "what" you have to do. The SLA says here's how we measure what you do.

        For example, in the case of a report of an outside attack, your contract with 
an ISP might say that the ISP will work with your technical people to promptly 
investigate the claim and will develop an appropriate response. The policy and 
procedures document could discuss types of attacks (DOS vs. crack, etc.) and the steps 
you and the ISP will take in the event of such attacks. 

        The SLA might say that if you report an outside attack, the ISP will respond 
to you regarding the report within 1 hour 98% of the time and within 4 hours 99.5% of 
the time. You might even split the measurements depending on the type of attack.  If 
you are experiencing a DOS attack, the response time might be different than if you 
are being cracked.
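
To illustrate how a measure like that could be checked, here is a minimal sketch. The function names and the idea of keeping response times as a plain list of hours are assumptions for the example; the 1 hour/98% and 4 hour/99.5% targets are the ones mentioned above.

```python
# Each target is (time limit in hours, minimum fraction of reports answered within it).
TARGETS = [(1.0, 0.98), (4.0, 0.995)]


def attainment(response_hours, limit_hours):
    """Fraction of reports that received a response within limit_hours."""
    if not response_hours:
        raise ValueError("no responses to measure")
    met = sum(1 for hours in response_hours if hours <= limit_hours)
    return met / len(response_hours)


def sla_met(response_hours, targets=TARGETS):
    """True if every (limit, fraction) target is satisfied for the period."""
    return all(
        attainment(response_hours, limit) >= fraction
        for limit, fraction in targets
    )
```

Splitting the measurement by attack type would just mean keeping a separate list of response times, and possibly a separate set of targets, for each type.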

        Regarding downstream maintenance, patches, etc., a policy and procedures 
document would identify who is responsible for which activities, the specific 
requirements for keeping versions up to date (e.g., for security software you must be 
on the most current version (N), for other operational software you must at least be 
on N-2, for some non-critical software you must at least be on N-4, etc.), and the 
process for implementing patches (e.g., install in DEV or TEST and run specified 
acceptance testing before moving to PROD).  An SLA, on the other hand, would specify, 
for example, how quickly the vendor must install the latest version of something once 
it is released.
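
As a sketch of how the version-currency requirement might be checked, something like the following would do. The category names and the ordered release list are hypothetical; the N, N-2 and N-4 limits are the examples given above.

```python
# Maximum number of releases a component may lag behind the current one (N).
MAX_VERSIONS_BEHIND = {
    "security": 0,      # must be on N
    "operational": 2,   # must be on at least N-2
    "non_critical": 4,  # must be on at least N-4
}


def versions_behind(installed, releases):
    """How many releases separate the installed version from the newest.

    releases is ordered oldest to newest; raises ValueError if the
    installed version is not in the list.
    """
    return (len(releases) - 1) - releases.index(installed)


def is_compliant(category, installed, releases):
    """True if the installed version is within the allowed lag for its category."""
    return versions_behind(installed, releases) <= MAX_VERSIONS_BEHIND[category]
```

For example, with releases 1.0 through 1.4, operational software on 1.2 is at N-2 and passes, while security software on 1.3 is at N-1 and does not.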

        Hope this is helpful. 

        John 

        In a message dated Tue, 22 Jan 2002  9:04:58 AM Eastern Standard Time, 
"Martin, James E." <[EMAIL PROTECTED]> writes: 

        > I'd be interested in any SLA work done on security event response by an ISP 
covering the following areas: 
        >  
        > a. Defense of the network against reported outside attacks 
        > b. Defense of the network against attacks reported from the site contracting 
for access 
        > c. Downstream network/site obligations for maintenance, patches, upkeep in 
general 
        >  
        > I've tried a preliminary draft, based on both upstream and downstream 
obligations to respond to reported security events. The document sets out 
responsibilities and standard responses based on whether a site has any after-hours 
event response capability, and whether a site with the capability refuses action or 
declines to protect the network. 

        >  
        > What are others doing in this area? 
        >  
        > Thanks! 
        > Jim 
        > 
        >     -----Original Message----- 
        >     From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] 
        >     Sent: Fri 1/18/2002 1:41 PM 
        >     To: [EMAIL PROTECTED] 
        >     Cc: [EMAIL PROTECTED] 
        >     Subject: Re: SLA Security 
        >     
        >     
        > 
        >     A general SLA on security is kind of difficult. Generally, you want your 
SLAs to be specifically quantifiable and measurable, but it depends on the services 
that you are talking about.

        > 
        >     For example, if we were talking about anti-virus protection, you might 
have a service level for how fast the vendor implements the latest set of virus 
definitions. 

        > 
        >     For security, you might have an SLA for time to implement a patch after 
the patch is made available by a relevant vendor.

        > 
        >     If your help desk SLA includes response time and problem correction 
time, then a response and resolution of a security breach or a virus could be subject 
to those SLAs.

        > 
        >     For an IDS, you could include a requirement to audit logs at a set 
interval. 
        > 
        >     John