Re: Google Trust Services - Minor SCT issue disclosure

2018-08-24 Thread Andy Warner via dev-security-policy
The code at issue evolved as CT requirements changed. What started off as a
very simple conditional grew into a more complex if / else if block with
somewhat complicated logic and inline checks. As part of the fix, we
simplified the conditionals and refactored the inline checks into clearly
named IsExternallyOperated() and IsGoogleOperated() functions. The result
is logic that is much more readable and easier to test, and we also
expanded test coverage. I think the big lesson for the community is that it
would have been better to refactor earlier rather than evolving the code to
the point where it became more complicated than it needed to be.
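
As an illustration only (the actual Google code is not shown in this
thread), a predicate-based selection could look roughly like the Python
sketch below. The Sct type, its operator field, the snake_case helpers
modeled on IsGoogleOperated() and IsExternallyOperated(), and
select_scts_to_embed() are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Sct:
    """Hypothetical stand-in for a Signed Certificate Timestamp."""
    log_name: str
    operator: str  # assumed label for whoever runs the log


def is_google_operated(sct: Sct) -> bool:
    return sct.operator == "Google"


def is_externally_operated(sct: Sct) -> bool:
    return not is_google_operated(sct)


def select_scts_to_embed(scts: List[Sct], required: int = 2) -> List[Sct]:
    """Pick `required` SCTs, always including one Google and one non-Google."""
    if required < 2:
        raise ValueError("required must be at least 2")
    google = [s for s in scts if is_google_operated(s)]
    external = [s for s in scts if is_externally_operated(s)]
    if not google or not external:
        # Fail loudly instead of falling through to a non-compliant set.
        raise ValueError("need at least one Google and one non-Google SCT")
    chosen = [google[0], external[0]]
    # Fill any remaining slots from the leftover SCTs, in submission order.
    leftovers = [s for s in scts if s not in chosen]
    chosen.extend(leftovers[: required - len(chosen)])
    if len(chosen) < required:
        raise ValueError("not enough SCTs to meet the required count")
    return chosen
```

The point of the sketch is that the diversity requirement is satisfied
first and any failure raises an error, so there is no final "else" branch
that can silently return two SCTs from the same operator.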

On Thu, Aug 23, 2018 at 9:40 AM Ryan Sleevi wrote:

>
>
> On Thu, Aug 23, 2018 at 8:50 AM, Andy Warner via dev-security-policy <
> dev-security-policy@lists.mozilla.org> wrote:
>>
>> * NOTE: The bug was due to an 'if/else' chain fall through. The code in
>> question has been refactored to be simpler and more readable.
>>
>
> Andy,
>
> It might be good for the community if you could describe the processes
> before and after the change, so that other CAs can help prevent similar
> issues with their own embedding systems.
>


___
dev-security-policy mailing list
dev-security-policy@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security-policy


Re: Google Trust Services - Minor SCT issue disclosure

2018-08-23 Thread Ryan Sleevi via dev-security-policy
On Thu, Aug 23, 2018 at 8:50 AM, Andy Warner via dev-security-policy <
dev-security-policy@lists.mozilla.org> wrote:
>
> * NOTE: The bug was due to an 'if/else' chain fall through. The code in
> question has been refactored to be simpler and more readable.
>

Andy,

It might be good for the community if you could describe the processes
before and after the change, so that other CAs can help prevent similar
issues with their own embedding systems.


Re: Google Trust Services - Minor SCT issue disclosure

2018-08-23 Thread Andy Warner via dev-security-policy
Google provides SCTs via embedding and during the TLS handshake, depending
on the certificate and how it is served. In this case, all of the affected
certs used embedded SCTs, and the issue was the selection of which SCTs to
include: we submit to more CT logs than required, but only embed the
required number of SCTs to keep cert sizes as small as possible. These
certs were submitted to 4 CT logs (2 Google, 2 non-Google), but the embedded
SCTs were only from the 2 Google logs, not one Google and one non-Google.
All 4 submissions succeeded and produced valid SCTs; the problem was the
logic that chose which SCTs to embed.

In response to Q1, the logic involved was specific to selection and
embedding of SCTs, not part of validation logic, so a related error would
not affect validation. An unrelated error in validation logic could of
course affect validation, but all CAs have that risk and like other CAs we
have multiple layers of safeguards on validation logic.

For Q2, we sample certs regularly and make use of proven external linting
libraries as well as our own linting and audit logic. In this case, because
the issue was not something normally checked by external tools and the
behavior was perfectly fine until the Chrome deadline in April, I can only
posit that we would have discovered it fairly quickly via other means. We
now have specific checks for this issue and for other similar problems we
could foresee.
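
For other CAs who want a comparable check, here is a minimal sketch of what
an audit of the embedded SCT set might look like. It is not our production
code: check_embedded_scts(), the operator labels, and the default count of
2 are assumptions chosen for illustration, and the real required count
depends on the relevant browser policy.

```python
from typing import List


def check_embedded_scts(operators: List[str], required_count: int = 2) -> List[str]:
    """Return the problems found in an embedded SCT set (empty list if none).

    `operators` holds one assumed operator label per embedded SCT.
    """
    problems = []
    if len(operators) < required_count:
        problems.append(
            f"only {len(operators)} SCTs embedded, need {required_count}"
        )
    if "Google" not in operators:
        problems.append("no SCT from a Google-operated log")
    if all(op == "Google" for op in operators):
        problems.append("no SCT from a non-Google log")
    return problems
```

Run against the affected certs, a check like this would have flagged the
set as having the right quantity but the wrong mix of log operators.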

For Q3, we could account for the initial submission fully and understand
exactly what happened. Google has rigorous version control and enforcement
systems to ensure only properly reviewed and built code can enter
production and to reconcile running code against the cut point for an
approved release. Our CA systems have additional safeguards on top of those
standard tools to further ensure that we have strong knowledge of the
pedigree of all code and how it was built and deployed.

On Thu, Aug 23, 2018 at 10:55 AM Nick Lamb wrote:

> On Thu, 23 Aug 2018 05:50:05 -0700 (PDT)
> Andy Warner via dev-security-policy wrote:
>
> > May 21st 2018, a new tool for issuing certificates within Google was
> > made available to internal customers. Within hours we started to
> > receive reports that Chrome Canary (v67) with Certificate
> > Transparency checks enabled was showing warnings. A coding error led
> > to the new tool providing Signed Certificate Timestamps (SCTs) from 2
> > Google CT logs instead of one Google and one non-Google log.
>
> Feel free to jump in anywhere I've made a mistake, this might totally
> invalidate some of my questions.
>
> Presumably, since you eventually "fixed" this by asking Subscribers to
> re-issue, the SCTs are baked into a signed certificate, rather than
> provided separately so that the Subscriber can use them with e.g.
> Stapling technologies?
>
> Which means that this "new tool" also involved a Google controlled
> subCA signing these certificates with, as it turns out, the wrong SCTs
> in them. It's not clear to me if the tool and CA are operationally one
> and the same.
>
> Q1: Could a more significant "coding error" in this tool have resulted
> in certificates being mis-issued (for example with SANs that don't
> belong to Google, or lacking mandatory X.509 fields, or without being
> CT logged)? If not please explain why the tool couldn't cause this.
>
> Q2: If this error hadn't caused a negative end-user experience, what
> mechanisms if any do you believe would have brought it to your
> attention and how soon? e.g. does a team sample resulting certificates
> from this tool at some interval? If it samples pre-certificates that
> would not have detected this error, but is worth mentioning.
>
> Q3: Such mistakes are of course inevitable in software development. But
> they could also be introduced maliciously. Were you able to confidently
> identify which specific individual(s) made the relevant change? (I don't
> want names). Are you confident you'd be able to do this even if somehow
> the production tool turned out not to match your revision control
> systems?
>
> Thanks as always for satisfying my curiosity
>
> Nick.
>




Re: Google Trust Services - Minor SCT issue disclosure

2018-08-23 Thread Nick Lamb via dev-security-policy
On Thu, 23 Aug 2018 05:50:05 -0700 (PDT)
Andy Warner via dev-security-policy wrote:

> May 21st 2018, a new tool for issuing certificates within Google was
> made available to internal customers. Within hours we started to
> receive reports that Chrome Canary (v67) with Certificate
> Transparency checks enabled was showing warnings. A coding error led
> to the new tool providing Signed Certificate Timestamps (SCTs) from 2
> Google CT logs instead of one Google and one non-Google log. 

Feel free to jump in anywhere I've made a mistake, this might totally
invalidate some of my questions.

Presumably, since you eventually "fixed" this by asking Subscribers to
re-issue, the SCTs are baked into a signed certificate, rather than
provided separately so that the Subscriber can use them with e.g.
Stapling technologies?

Which means that this "new tool" also involved a Google controlled
subCA signing these certificates with, as it turns out, the wrong SCTs
in them. It's not clear to me if the tool and CA are operationally one
and the same.

Q1: Could a more significant "coding error" in this tool have resulted
in certificates being mis-issued (for example with SANs that don't
belong to Google, or lacking mandatory X.509 fields, or without being
CT logged)? If not please explain why the tool couldn't cause this.

Q2: If this error hadn't caused a negative end-user experience, what
mechanisms if any do you believe would have brought it to your
attention and how soon? e.g. does a team sample resulting certificates
from this tool at some interval? If it samples pre-certificates that
would not have detected this error, but is worth mentioning.

Q3: Such mistakes are of course inevitable in software development. But
they could also be introduced maliciously. Were you able to confidently
identify which specific individual(s) made the relevant change? (I don't
want names). Are you confident you'd be able to do this even if somehow
the production tool turned out not to match your revision control
systems?

Thanks as always for satisfying my curiosity

Nick.


Re: Google Trust Services - Minor SCT issue disclosure

2018-08-23 Thread Andy Warner via dev-security-policy
Correct, we do not believe there was a policy violation; we're proactively
sharing this in the interest of transparency and knowledge sharing.

I believe there is additional information we could share about how we've
modified testing to ensure compliance with Chrome and Safari's SCT
inclusion rules and have more flexible tests. I want to discuss this with
the engineer who implemented the changes to ensure they agree with how I
would summarize the changes. Update to follow.

On Thu, Aug 23, 2018 at 8:57 AM Alex Gaynor wrote:

> Hi Andy,
>
> Just so I follow, this is something you're proactively sharing, right? As
> far as I can tell, there's no violation of any Mozilla Root Program rules
> here, just an issue that caused interstitials in Chrome.
>
> Either way, I appreciate your sharing.
>
> You mentioned the issue was due to some overly complex control flow. In
> order to help other CAs out, do you think there are testing methodologies
> that could have helped catch this earlier?
>
> Alex
>




Re: Google Trust Services - Minor SCT issue disclosure

2018-08-23 Thread Alex Gaynor via dev-security-policy
Hi Andy,

Just so I follow, this is something you're proactively sharing, right? As
far as I can tell, there's no violation of any Mozilla Root Program rules
here, just an issue that caused interstitials in Chrome.

Either way, I appreciate your sharing.

You mentioned the issue was due to some overly complex control flow. In
order to help other CAs out, do you think there are testing methodologies
that could have helped catch this earlier?

Alex



Google Trust Services - Minor SCT issue disclosure

2018-08-23 Thread Andy Warner via dev-security-policy
Please note, Google wrote this report for internal use immediately after the
issue. We intended to post it to m.d.s.p at that time, but securing internal
approvals took a while and the posting ended up on the back burner for a bit.
It was a minor issue, but we want the community to be aware of it.

Summary:

May 21st 2018, a new tool for issuing certificates within Google was made 
available to internal customers. Within hours we started to receive reports 
that Chrome Canary (v67) with Certificate Transparency checks enabled was 
showing warnings. A coding error led to the new tool providing Signed 
Certificate Timestamps (SCTs) from 2 Google CT logs instead of one Google and 
one non-Google log. 

* NOTE: Affected certs were logged at issuance to at least 2 Google CT logs and 
2 non-Google CT logs. The embedded SCTs for affected certs only provided proofs 
from Google logs instead of Google and non-Google logs as required by Chrome.

* NOTE: The bug was due to an 'if/else' chain fall through. The code in 
question has been refactored to be simpler and more readable.

The issue was fully resolved ~14 hours after initial notification. The issue 
was mitigated within 4 hours. Triage and code fixes happened within 11 hours 
and it took ~3 hours to deploy the fixed code and confirm the fixed behavior in 
production. The new code was running in relatively few locations, so deployment 
was quick compared to some changes in our infrastructure.

Most affected customers responded quickly to communications that they should 
replace their certificates and revoke the old ones before a given deadline. All 
certificates that were issued with an SCT set that was not fully compliant were 
revoked on 2018-06-19 if they had not already been revoked by the customer 
previously. Most users replaced certificates shortly after notification.

Timeline:

2018-03-22 Bug introduced to codebase.
2018-05-21 Push including bug became available to clients.
2018-05-22 08:05 UTC First user reports that Chrome Canary presents a CT 
warning for a certificate.
2018-05-22 09:25 UTC Bug filed with initial assessment.
2018-05-22 12:01 UTC Frontend jobs with the bug are taken offline following 
standard CA procedures.
2018-05-22 15:59 UTC Issue conclusively identified.
2018-05-22 19:07 UTC Fix is submitted.
2018-05-22 21:48 UTC Fix starts to be rolled out.
2018-05-22 22:16 UTC Fix fully deployed and tested on test instances followed 
by deployment to production. Access to frontends restored.
2018-05-24 Customer communication sent to affected users to ask them to renew 
their certificates and revoke the old ones.
2018-06-19 The final handful of certificates that had not already been revoked 
and replaced by users were revoked by the CA.
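
The durations quoted in the summary can be cross-checked against the
timestamps above with a few lines of Python. This is only a sketch: the
event names are made up, and the mapping of each summary phrase to a pair
of timeline entries is an assumption about how the figures were derived.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# Timestamps (UTC) copied from the timeline; the key names are hypothetical.
EVENTS = {
    "first_report": datetime.strptime("2018-05-22 08:05", FMT),
    "frontends_offline": datetime.strptime("2018-05-22 12:01", FMT),
    "fix_submitted": datetime.strptime("2018-05-22 19:07", FMT),
    "fix_deployed": datetime.strptime("2018-05-22 22:16", FMT),
}


def elapsed(start: str, end: str) -> str:
    """Format the time between two named events as e.g. '3h56m'."""
    minutes = int((EVENTS[end] - EVENTS[start]).total_seconds() // 60)
    return f"{minutes // 60}h{minutes % 60:02d}m"


print("mitigated:     ", elapsed("first_report", "frontends_offline"))  # 3h56m
print("fix submitted: ", elapsed("first_report", "fix_submitted"))      # 11h02m
print("deployment:    ", elapsed("fix_submitted", "fix_deployed"))      # 3h09m
print("fully resolved:", elapsed("first_report", "fix_deployed"))       # 14h11m
```

Under that reading, the results line up with the "within 4 hours",
"within 11 hours", "~3 hours" and "~14 hours" figures in the summary.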

Findings:

* The operational plan to halt issuance worked as expected and was implemented 
quickly.
* The problem was quickly found, fully understood and easy to remedy.
* Tests existed, but did not cover this failure case. 

Remediation Plan:
* Completed
** Message of the Day (MOTD) functionality was added or improved for all 
issuance systems to make it easier to communicate issues to users when issuance 
is intentionally paused.
** Test coverage was expanded to ensure that both the quantity and type of SCTs 
are checked.