Re: Incident report D-TRUST: syntax error in one tls certificate

2018-11-28 Thread Dimitris Zacharopoulos via dev-security-policy



On 29/11/2018 12:14 a.m., Wayne Thayer via dev-security-policy wrote:

> The way that we currently handle these types of issues is about as good as
> we're going to get. We have a [recently relaxed but still] fairly stringent
> set of rules around revocation in the BRs. This is necessary and proper
> because slow/delayed revocation can clearly harm our users. It was
> difficult to gain consensus within the CAB Forum on allowing even 5 days in
> some circumstances - I'm confident that something like 28 days would be a
> non-starter. I'm also confident that CAs will always take the entire time
> permitted to perform revocations, regardless of the risk, because it is in
> their interest to do so (that is not meant to be a criticism of CAs so much
> as a statement that CAs exist to serve their customers, not our users). I'm
> also confident that any attempt to define "low risk" misissuance would just
> incentivize CAs to stop treating misissuance as a serious offense and we'd
> be back to where we were prior to the existence of linters.
>
> CAs obviously do choose to violate the revocation time requirements. I do
> not believe this is generally based on a thorough risk analysis, but in
> practice it is clear that they do have some discretion. I am not aware of a
> case (yet) in which Mozilla has punished a CA solely for violating a
> revocation deadline. When that happens, the violation is documented in a
> bug and should appear on the CA's next audit report/attestation statement.
> From there, the circumstances (how many certs? what was the issue? was it
> previously documented? is this a pattern of behavior?) have to be
> considered on a case-by-case basis to decide a course of action. I realize
> that this is not a very satisfying answer to the questions that are being
> raised, but I do think it's the best answer.
>
> - Wayne


Mandating that CAs disclose revocation situations that exceed the 5-day 
requirement, along with some risk analysis information, might be a good 
place to start. Of course, this should be independent of a "mis-issuance 
incident report". By collecting this information, Mozilla would be in a 
better position to evaluate the challenges CAs face with revocations 
*initiated by the CA* without adequate warning to the Subscriber. I 
don't consider 5 days (which are not even working days) to be an 
adequate warning period for a large organization with slow reflexes and 
long procedures. Once Mozilla collects more information, it might be 
able to see patterns across various CAs, decide what is acceptable and 
what is not, and create policy rules accordingly.


For example, if many CAs violate the 5-day rule for revocations related 
to improper subject information encoding (out-of-range values, wrong 
syntax, and the like), Mozilla or the BRs might introduce a separate 
category with a different time frame and/or different actions.


This is not the first time we have discussed this, and it might be worth 
exploring further.


As a general comment, IMHO when we talk about RP risk from a CA issuing 
a Certificate with, say, more than 64 characters in an OU field, that 
only poses risk to Relying Parties *that want to interact with that 
particular Subscriber*, not the entire Internet. These RPs *might* 
encounter compatibility issues, depending on their browser, and will 
either contact the Subscriber to report that the web site doesn't work 
or do nothing. It's similar to a situation where a site operator 
forgets to send the intermediate CA certificate in the chain: those 
particular RPs will fail to establish TLS when they visit the 
Subscriber's web site.
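
To make the example concrete: the 64-character limit comes from RFC 5280's 
upper bound for organizationalUnitName, and checking it is exactly the kind 
of syntax test a linter automates. A minimal sketch, assuming Python's 
"cryptography" library and a PEM certificate at cert.pem (a placeholder 
path):

    # Minimal sketch: flag an OU value that exceeds RFC 5280's
    # ub-organizational-unit-name (64 characters). Assumes the
    # 'cryptography' library; cert.pem is a placeholder path.
    from cryptography import x509
    from cryptography.x509.oid import NameOID

    UB_ORGANIZATIONAL_UNIT_NAME = 64  # RFC 5280, Appendix A

    with open("cert.pem", "rb") as f:
        cert = x509.load_pem_x509_certificate(f.read())

    for ou in cert.subject.get_attributes_for_oid(NameOID.ORGANIZATIONAL_UNIT_NAME):
        if len(ou.value) > UB_ORGANIZATIONAL_UNIT_NAME:
            print(f"OU exceeds 64 characters ({len(ou.value)}): {ou.value!r}")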



Dimitris.




> On Wed, Nov 28, 2018 at 1:10 PM Nick Lamb via dev-security-policy <
> dev-security-policy@lists.mozilla.org> wrote:
>
> > On Mon, 26 Nov 2018 18:47:25 -0500
> > Ryan Sleevi via dev-security-policy wrote:
> >
> > > CAs have made the case - it was not accepted.
> > >
> > > On a more fundamental and philosophical level, I think this is
> > > well-intentioned but misguided. Let's consider that the issue is one
> > > that the CA had the full power-and-ability to prevent - namely, they
> > > violated the requirements and misissued. A CA is only in this
> > > situation if they are a bad CA - a good CA will never run the risk of
> > > "annoying" the customer.
> >
> > I would sympathise with this position if we were considering, say, a
> > problem that had caused a CA to issue certs with the exact same mistake
> > for 18 months, rather than, as I understand here, a single certificate.
> >
> > Individual human errors are inevitable at a "good CA". We should not
> > design systems, including policy making, that assume all errors will be
> > prevented because that contradicts the assumption that human error is
> > inevitable. Although it is often used specifically to mean operator
> > error, human error can be introduced anywhere. A requirements document
> > which erroneously says a particular Unicode codepoint is permitted in a
> > field when it should be forbidden is still human error. A department
> > head who feels tired and signs off on a piece of work that actually
> > didn't pass tests, still human error.

Re: Incident report D-TRUST: syntax error in one tls certificate

2018-11-28 Thread Wayne Thayer via dev-security-policy
The way that we currently handle these types of issues is about as good as
we're going to get. We have a [recently relaxed but still] fairly stringent
set of rules around revocation in the BRs. This is necessary and proper
because slow/delayed revocation can clearly harm our users. It was
difficult to gain consensus within the CAB Forum on allowing even 5 days in
some circumstances - I'm confident that something like 28 days would be a
non-starter. I'm also confident that CAs will always take the entire time
permitted to perform revocations, regardless of the risk, because it is in
their interest to do so (that is not meant to be a criticism of CAs so much
as a statement that CAs exist to serve their customers, not our users). I'm
also confident that any attempt to define "low risk" misissuance would just
incentivize CAs to stop treating misissuance as a serious offense and we'd
be back to where we were prior to the existence of linters.

CAs obviously do choose to violate the revocation time requirements. I do
not believe this is generally based on a thorough risk analysis, but in
practice it is clear that they do have some discretion. I am not aware of a
case (yet) in which Mozilla has punished a CA solely for violating a
revocation deadline. When that happens, the violation is documented in a
bug and should appear on the CA's next audit report/attestation statement.
From there, the circumstances (how many certs? what was the issue? was it
previously documented? is this a pattern of behavior?) have to be
considered on a case-by-case basis to decide a course of action. I realize
that this is not a very satisfying answer to the questions that are being
raised, but I do think it's the best answer.

- Wayne

On Wed, Nov 28, 2018 at 1:10 PM Nick Lamb via dev-security-policy <
dev-security-policy@lists.mozilla.org> wrote:

> On Mon, 26 Nov 2018 18:47:25 -0500
> Ryan Sleevi via dev-security-policy wrote:
> > CAs have made the case - it was not accepted.
> >
> > On a more fundamental and philosophical level, I think this is
> > well-intentioned but misguided. Let's consider that the issue is one
> > that the CA had the full power-and-ability to prevent - namely, they
> > violated the requirements and misissued. A CA is only in this
> > situation if they are a bad CA - a good CA will never run the risk of
> > "annoying" the customer.
>
> I would sympathise with this position if we were considering, say, a
> problem that had caused a CA to issue certs with the exact same mistake
> for 18 months, rather than, as I understand here, a single certificate.
>
> Individual human errors are inevitable at a "good CA". We should not
> design systems, including policy making, that assume all errors will be
> prevented because that contradicts the assumption that human error is
> inevitable. Although it is often used specifically to mean operator
> error, human error can be introduced anywhere. A requirements document
> which erroneously says a particular Unicode codepoint is permitted in a
> field when it should be forbidden is still human error. A department
> head who feels tired and signs off on a piece of work that actually
> didn't pass tests, still human error.
>
> In true failure-is-death scenarios like fly-by-wire aircraft controls
> this assumption means extraordinary methods must be used in order to
> minimise the risk of inevitable human error resulting in real world
> systems failure. Accordingly the resulting systems are exceptionally
> expensive. Though the Web PKI is important, we should not imagine for
> ourselves that it warrants this degree of care and justifies this level
> of expense even at a "good CA".
>
> What we can require in policy - and as I understand it Mozilla policy
> does require - is that the management (also humans) take steps to
> report known problems and prevent them from recurring. This happened
> here.
>
> > This presumes that the customer cannot take steps to avoid this.
> > However, as suggested by others, the customer could have minimized or
> > eliminated annoyance, such as by ensuring they have a robust system
> > to automate the issuance/replacement of certificates. That they
> > didn't is an operational failure on their fault.
>
> I agree with this part.
>
> > This presumes that there is "little or no risk to relying parties."
> > Unfortunately, they are by design not a stakeholder in those
> > conversations
>
> It does presume this, and I've seen no evidence to the contrary. Also I
> think I am in fact a stakeholder in this conversation anyway?
>
> > I agree that the increasingly implausible "important" revocations are
> > entirely worthless. I think a real and meaningful solution is
> > what is being more consistently pursued, and that's to distrust CAs
> > that are not adhering to the set of expectations.
>
> I don't think root distrust is an appropriate response, in the current
> state, to a single incident of this nature; this sort of thing is,
> indeed, why you may 

Re: Incident report D-TRUST: syntax error in one tls certificate

2018-11-28 Thread Jakob Bohm via dev-security-policy
On 27/11/2018 00:54, Ryan Sleevi wrote:
> On Mon, Nov 26, 2018 at 12:12 PM Jakob Bohm via dev-security-policy <
> dev-security-policy@lists.mozilla.org> wrote:
> 
>> 1. Having a spare certificate ready (if done with proper security, e.g.
>> a separate key) from a different CA may unfortunately conflict with
>> badly thought out parts of various certificate "pinning" standards.
>>
> 
> You blame the standards, but that seems an operational risk that the site
> (knowingly) took. That doesn't make a compelling argument.
> 

I blame those standards for forcing every site to choose between two 
unfortunate risks: in this case, the risks prevented by those 
"pinning" mechanisms and the risks associated with having only one 
certificate.

The fact that sites are forced to make that choice makes it unfair to 
presume they should always choose to prevent whichever risk is discussed 
in a given context.  Groups discussing other risks could equally unfairly 
blame sites for not using one of those "pinning" mechanisms.
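
For illustration, the "pinning" at issue (RFC 7469, HPKP) commits a site to 
the base64-encoded SHA-256 hash of a certificate's SubjectPublicKeyInfo and 
requires a backup pin, so a spare certificate from a second CA only helps 
if that second key was pinned in advance. A minimal sketch of the pin 
computation, assuming Python's "cryptography" library:

    # Minimal sketch: compute an RFC 7469 (HPKP) pin, i.e. the base64
    # encoding of the SHA-256 digest of the DER-encoded
    # SubjectPublicKeyInfo. Assumes the 'cryptography' library.
    import base64
    import hashlib
    from cryptography import x509
    from cryptography.hazmat.primitives import serialization

    def spki_pin(pem_bytes: bytes) -> str:
        cert = x509.load_pem_x509_certificate(pem_bytes)
        spki = cert.public_key().public_bytes(
            serialization.Encoding.DER,
            serialization.PublicFormat.SubjectPublicKeyInfo,
        )
        return base64.b64encode(hashlib.sha256(spki).digest()).decode()

    # The backup key (e.g. for a spare certificate from a second CA) must
    # already appear in the header when it is first served, for example:
    # Public-Key-Pins: pin-sha256="<primary>"; pin-sha256="<backup>"; max-age=5184000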

> 
>> 2. Being critical from a society perspective (e.g. being the contact
>> point for a service to help protect the planet), doesn't mean that the
>> people running such a service can be expected to be IT superstars
>> capable of dealing with complex IT issues such as unscheduled
>> certificate replacement due to no fault of their own.
>>
> 
> That sounds like an operational risk the site (knowingly) took. Solutions
> for automation exist, as do concepts such as "hiring multiple people"
> (having a NOC/SOC). I see nothing to argue that a single person is somehow
> the risk here.
> 

The number of people in the world who can do this is substantially 
smaller than the number of sites that might need them.  We must 
therefore, by necessity, accept that some such sites will not hire such 
people, much less multiple such people for their own exclusive use.

Automating certificate deployment (as you often suggest) lowers 
operational security, as it necessarily grants read/write access to 
the certificate data (including private key) to an automated, online, 
unsupervised system.

Allowing multiple persons to replace the certificates also lowers 
operational security, as it (by definition) grants multiple persons 
read/write access to the certificate data.

Under the current and past CA model, certificate and private key 
replacement is a rare (once every 2 years) operation that can be done 
manually and scheduled weeks in advance, except for unexpected 
failures (such as a CA messing up).
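
Even under that manual model, expiry can be watched so the scheduled 
replacement stays plannable. A minimal sketch, assuming Python's standard 
library plus the "cryptography" parser, with example.com as a placeholder 
host:

    # Minimal sketch: report how many days remain on a site's certificate
    # so a manual, signed-off replacement can be scheduled weeks ahead.
    # example.com is a placeholder; assumes the 'cryptography' library.
    import datetime
    import ssl
    from cryptography import x509

    HOST = "example.com"
    pem = ssl.get_server_certificate((HOST, 443))
    cert = x509.load_pem_x509_certificate(pem.encode())
    days_left = (cert.not_valid_after - datetime.datetime.utcnow()).days
    print(f"{HOST}: certificate expires in {days_left} days")
    if days_left < 30:
        print("Time to schedule the replacement procedure.")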


> 
>> 3. Not every site can be expected to have the 24/7 staff on hand to do
>> "top security credentials required" changes, for example a high-
>> security end site may have a rule that two senior officials need to
>> sign off on any change in cryptographic keys and certificates, while a
>> limited-staff end-site may have to schedule a visit from their outside
>> security consultant to perform the certificate replacement.
>>
> 
> This is exactly describing a known risk that the site took, accepting the
> tradeoffs. I fail to see a compelling argument that there should be no
> tradeoffs - given the harm presented to the ecosystem - and if sites want
> to make such policies, rather than promoting automation and CI/CD, then it
> seems that's a risk they should bear and make an informed choice.
> 

The trade-off would have been made against the risk of the site itself 
mishandling its private key (e.g. a site breach), not against force 
majeure situations such as a CA recalling a certificate out of turn.

It is generally not fair to say that "the difficult situation we may 
impose is a risk that the site took".

>> Thus I would be all for an official BR ballot to clarify/introduce
>> that 24 hour revocation for non-compliance doesn't apply to non-
>> dangerous technical violations.
>>
> 
> As discussed elsewhere, there is no such thing as "non-dangerous technical
> violations". It is a construct, much like "clean coal", that has an
> appealing turn of phrase, but without the evidence to support it.
> 

That is simply not true.  The case at hand is a very good example, as 
the problem is that a text field used only for display purposes by 
current software, and generally requiring either human interpretation or 
yet-to-be-defined parseable definitions, was given an out-of-range 
value.

Unless someone can point out a real-world piece of production software 
which causes security problems when presented with the particular out-
of-range value, or show that the particular out-of-range value would 
reasonably mislead human relying parties, the dangers are entirely 
hypothetical and/or political.

> 
>> Another category that would justify a longer CA response time would be a
>> situation where a large batch of certificates need to be revalidated due
>> to a weakness in validation procedures (such as finding out that a
>> validation method had a vulnerability, but not knowing which if any of

Re: Incident report D-TRUST: syntax error in one tls certificate

2018-11-28 Thread Pedro Fuentes via dev-security-policy
Hi Rufus,
I got an internal server error on that link, but I really appreciate your 
post and the link to the code!
Pedro

On Wednesday, 28 November 2018, 8:45:42 (UTC+1), Buschart, Rufus 
wrote:
> To simplify the process of monitoring crt.sh, we at Siemens have implemented 
> a little web service which directly queries the crt.sh DB and returns the 
> errors as JSON. This way you don't have to parse HTML pages and can directly 
> integrate it into your monitoring. Maybe this function is of interest to 
> some other CAs:
> 
> https://eo0kjkxapi.execute-api.eu-central-1.amazonaws.com/prod/crtsh-monitor?caID=52410=30=false
> 
> To monitor your CA, replace the caID with your CA's ID from crt.sh. In case 
> you receive an endpoint time-out message, try again; the crt.sh DB often 
> returns time-outs. For more details or feature requests, have a look at its 
> GitHub repo: https://github.com/RufusJWB/crt.sh-monitor
> 
> 
> With best regards,
> Rufus Buschart
> 
> Siemens AG
> Information Technology
> Human Resources
> PKI / Trustcenter
> GS IT HR 7 4
> Hugo-Junkers-Str. 9
> 90411 Nuernberg, Germany 
> Tel.: +49 1522 2894134
> mailto:rufus.busch...@siemens.com
> www.twitter.com/siemens
> 
> www.siemens.com/ingenuityforlife
> 
> Siemens Aktiengesellschaft: Chairman of the Supervisory Board: Jim Hagemann 
> Snabe; Managing Board: Joe Kaeser, Chairman, President and Chief Executive 
> Officer; Roland Busch, Lisa Davis, Klaus Helmrich, Janina Kugel, Cedrik 
> Neike, Michael Sen, Ralf P. Thomas; Registered offices: Berlin and Munich, 
> Germany; Commercial registries: Berlin Charlottenburg, HRB 12300, Munich, HRB 
> 6684; WEEE-Reg.-No. DE 23691322
> 
> > -----Original Message-----
> > From: dev-security-policy  On Behalf 
> > Of Enrico Entschew via dev-security-policy
> > Sent: Tuesday, 27 November 2018 18:17
> > To: mozilla-dev-security-pol...@lists.mozilla.org
> > Subject: Re: Incident report D-TRUST: syntax error in one tls certificate
> > 
> > On Monday, 26 November 2018 18:34:38 UTC+1, Jakob Bohm wrote:
> > 
> > > In addition to this, would you add the following:
> > >
> > > - Daily checks of crt.sh (or some other existing tool) to see if 
> > > additional such certificates are erroneously issued before the 
> > > automated countermeasures are in place?
> > 
> > Thank you, Jakob. This is what we intended to do. We are monitoring crt.sh 
> > at least twice daily every day from now on.
> > 
> > As to your other point, we do restrict the serial number element and the 
> > error occurred precisely in defining the constraints for this
> > field. As mentioned above, we plan to make adjustments to our systems to 
> > prevent this kind of error in future.
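
Since Rufus's service returns JSON, polling it from a monitoring job takes 
only a few lines. A minimal sketch, assuming Python's standard library; the 
response shape (a JSON list of findings) is an assumption, and the extra 
query parameters in the URL quoted above appear mangled in the archive, so 
only caID is used here:

    # Minimal sketch: poll the crt.sh-based monitor quoted above and print
    # any findings. The JSON response shape (a list) is an assumption; see
    # the GitHub repo for specifics.
    import json
    import urllib.request

    URL = ("https://eo0kjkxapi.execute-api.eu-central-1.amazonaws.com"
           "/prod/crtsh-monitor?caID=52410")

    try:
        with urllib.request.urlopen(URL, timeout=60) as resp:
            entries = json.load(resp)
    except Exception as exc:  # the crt.sh DB frequently times out; retry later
        print(f"Monitor unreachable ({exc}); try again later")
    else:
        print(f"{len(entries)} finding(s)")
        for entry in entries:
            print(json.dumps(entry, indent=2))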