Re: Google Trust Services - CRL handling of expired certificates not fully compliant with RFC 5280 Section 3.3

2019-09-13 Thread Wayne Thayer via dev-security-policy
Thank you for the report and follow-up Andy. I created
https://bugzilla.mozilla.org/show_bug.cgi?id=1581183 to track this issue.

- Wayne

On Fri, Sep 13, 2019 at 10:19 AM Andy Warner via dev-security-policy <
dev-security-policy@lists.mozilla.org> wrote:

> A quick follow-up to close this out.
>
> The push to fully address the issue was completed globally shortly before
> 16:00 UTC on 2019-09-02.
>
> After additional review, we're confident the only certificates affected
> were these two:
> https://crt.sh/?id=760396354
> https://crt.sh/?id=759833603
>
> Google Trust Services considers this matter fully addressed. We will of
> course continue our ongoing internal review program, but no other work or
> information is outstanding at this point.
>
> --
> Andy Warner
> Google Trust Services

Re: Google Trust Services - CRL handling of expired certificates not fully compliant with RFC 5280 Section 3.3

2019-09-13 Thread Andy Warner via dev-security-policy
A quick follow-up to close this out.

The push to fully address the issue was completed globally shortly before 16:00 
UTC on 2019-09-02.

After additional review, we're confident the only certificates affected were 
these two:
https://crt.sh/?id=760396354
https://crt.sh/?id=759833603

Google Trust Services considers this matter fully addressed. We will of course 
continue our ongoing internal review program, but no other work or information 
is outstanding at this point.

--
Andy Warner
Google Trust Services


Google Trust Services - CRL handling of expired certificates not fully compliant with RFC 5280 Section 3.3

2019-08-30 Thread Andy Warner via dev-security-policy
This is an initial report and we expect to provide some additional details and 
the completion timeline after a bit more verification and full deployment of 
in-flight mitigations. We are posting the most complete information we have 
currently to comply with Mozilla reporting timelines and will follow-up with 
additional details soon.

1. How your CA first became aware of the problem and the time and date.

While performing an internal review and assessment of the CRL generation system 
for Google Trust Services' GTS CA 1O1 on August 16, 2019, it was discovered 
that the CRL generation service did not include CRL entries of expired 
certificates. The periodic job only considered certificates with valid 
lifetimes. This does not conform to RFC 5280 Section 3.3 which states that “An 
entry MUST NOT be removed from the CRL until it appears on one regularly 
scheduled CRL issued beyond the revoked certificate's validity period.”  We 
expect that few, if any, clients have been impacted. For a client to be 
impacted, it would have to have a clock skewed to a time before the not-after 
field of the certificate and hold a CRL, published after expiration, that 
drops the revoked certificate.
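
To make the difference concrete, here is a minimal sketch of the two inclusion rules. It is only an illustration, not the GTS implementation: the RevokedEntry type, its field names, and both function names are hypothetical, and the dates are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class RevokedEntry:
    # Hypothetical record for one revoked certificate.
    serial: int
    not_after: datetime  # end of the certificate's validity period
    # thisUpdate of the latest CRL that listed this entry (None if never listed)
    last_listed: Optional[datetime] = None


def buggy_keep(entry: RevokedEntry, this_update: datetime) -> bool:
    """Behaviour described in the report: only unexpired certificates are listed."""
    return this_update <= entry.not_after


def conforming_keep(entry: RevokedEntry) -> bool:
    """Keep the entry until it has appeared on a CRL issued after notAfter."""
    already_listed_after_expiry = (
        entry.last_listed is not None and entry.last_listed > entry.not_after
    )
    return not already_listed_after_expiry


utc = timezone.utc
entry = RevokedEntry(
    serial=1,
    not_after=datetime(2019, 8, 1, tzinfo=utc),
    last_listed=datetime(2019, 7, 31, tzinfo=utc),
)
first_crl_after_expiry = datetime(2019, 8, 2, tzinfo=utc)

print(buggy_keep(entry, first_crl_after_expiry))  # False: entry dropped too early
print(conforming_keep(entry))                     # True: entry must still be listed
```

Once the entry has been listed on a CRL whose thisUpdate is later than notAfter, conforming_keep returns False and the entry may be removed.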


2. A timeline of the actions your CA took in response. A timeline is a 
date-and-time-stamped sequence of all relevant events. This may include events 
before the incident was reported, such as when a particular requirement became 
applicable, or a document changed, or a bug was introduced, or an audit was 
done.

August 16, 2019 15:00 UTC - Reviewer realizes that the CRL will not list 
revoked certificates for one update past their expiration
August 16, 2019 16:00 UTC - Reviewer checks for other issues & talks to peers 
to confirm problem
August 16, 2019 17:00 UTC - Bug is filed to fix the issue with a proposed 
design fix
August 16, 2019 23:30 UTC - Fix is sent for review
August 20, 2019 16:00 UTC - Remediation work is discussed & assigned
August 20, 2019 18:00 UTC - Query to inspect revoked certificates is created 
and sent to be run by production team for initial analysis.
August 21, 2019 10:40 UTC - Production team runs query and returns result
August 21, 2019 15:00 UTC - Reviewer analyzes data
August 21, 2019 20:30 UTC - Reviewer asks for a follow up query to ascertain if 
any certificates did not make it onto the CRL 
August 22, 2019 07:00 UTC - Initial attempt at updating test systems with fix.
August 22, 2019 09:00 UTC - Updating of test systems aborted due to (unrelated) 
issues.
August 22, 2019 07:00 UTC - Production team runs query for CRLs that may have 
missed a certificate
August 22, 2019 15:00 UTC - Reviewer ascertains that certificates under 
question were on a CRL
August 26, 2019 11:00 UTC - Second attempt at updating test systems with fix.
August 26, 2019 13:00 UTC - Test systems updated, confirmed integrity of fixed 
software.
August 27, 2019 09:00 UTC - Confirmed fix is effective on test systems.
August 27, 2019 10:00 UTC - present: Ongoing staged deployment to production 
systems. Should complete fully by September 3, 2019 17:00 UTC. The window is 
slightly extended due to push policies around holiday weekends, and the 
rollout is staged in accordance with Google's standard rollout procedures.


3. Whether your CA has stopped, or has not yet stopped, issuing certificates 
with the problem. 

The affected CA software has been patched.  It now keeps revoked certificates 
in the CRL for 7 days after their expiration to ensure they appear in at least 
one regularly issued CRL update.  Automated testing was added as 
part of the same patch to check that revoked certificates are kept in the CRL.  
The patch was developed, tested, reviewed and landed within the codebase by 
August 19, 2019.  The CRL entry removal bug has been fully remediated.
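
As a rough sketch of why a fixed retention window satisfies the requirement, the check below walks a CRL schedule and looks for an update issued after expiration that still lists the entry. Only the 7-day window comes from the report; the function names, the dates, and the 1-day and 10-day publishing intervals are assumptions chosen for illustration, since the actual CRL schedule is not stated here.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=7)  # retention window stated in the report


def listed(not_after: datetime, this_update: datetime) -> bool:
    """A revoked certificate stays on the CRL until notAfter + RETENTION."""
    return this_update <= not_after + RETENTION


def appears_after_expiry(not_after, first_crl, interval, horizon):
    """True if some scheduled CRL issued after notAfter still lists the entry."""
    t = first_crl
    while t <= horizon:
        if t > not_after and listed(not_after, t):
            return True
        t += interval
    return False


utc = timezone.utc
not_after = datetime(2019, 8, 1, 12, 0, tzinfo=utc)  # made-up expiry
start = datetime(2019, 7, 1, tzinfo=utc)             # made-up first CRL
end = datetime(2019, 9, 1, tzinfo=utc)

# Hypothetical daily schedule: an update falls inside the 7-day window.
print(appears_after_expiry(not_after, start, timedelta(days=1), end))   # True
# Hypothetical 10-day schedule: the window can close before the next update.
print(appears_after_expiry(not_after, start, timedelta(days=10), end))  # False
```

The second case shows the design constraint: the retention window only guarantees conformance when it is at least as long as the interval between regularly scheduled CRLs.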


4. A summary of the problematic certificates. For each problem: number of 
certs, and the date the first and last certs with that problem were issued.

Investigation began on August 20, 2019 to discover the potential impact of the 
logic bug. The CRL generation had contained the bug since its inception, 
affecting all issuance under GTS 1O1 since March 2018. There were 200,263 
revoked certificates during that time window. Almost all certificates were for 
internal monitoring specific to checking revocation. The few non-monitoring 
certificates were all revocations by clients following rotation of certificates 
and not due to compromises.


5. The complete certificate data for the problematic certificates. The 
recommended way to provide this is to ensure each certificate is logged to CT 
and then list the fingerprints or crt.sh IDs, either in the report or as an 
attached spreadsheet, with one list per distinct problem.

crt.sh IDs to follow, waiting on confirmation that the 2 test certificates 
mentioned below are the only cases where the issue was surfaced.

The team looked for revoked certificates from first issuance that never 
appeared within a published CRL from operation of CA until August 21,