Re: Google OCSP service down

2018-02-25 Thread Ryan Hurst via dev-security-policy
Tim,

I can see value in a ballot to clarify incident reporting and other
contact-related issues. Right now 1.5.2 is pretty sparse with regard to how
to handle this. I would be happy to work with you on a proposal here.

Ryan

On Sun, Feb 25, 2018 at 6:41 AM, Tim Hollebeek <tim.holleb...@digicert.com>
wrote:

> Ryan,
>
> Wayne and I have been discussing making various improvements to 1.5.2
> mandatory for all CAs.  I've made a few improvements to DigiCert's CPSs in
> this area, but things probably still could be better.  There will probably
> be a CA/B ballot in this area soon.
>
> DigiCert's 1.5.2 has our support email address, and our Certificate Problem
> Report email (which I recently added).  That doesn't really cover
> everything (yet).
>
> It looks like GTS 1.5.2 splits things into security (including CPRs) and
> non-security requests.
>
> I didn't chase down any other 1.5.2's yet, but it'd be interesting to hear
> what other CAs have here.  I suspect most only have one address for
> everything.
>
> Something to keep in mind once the CA/B thread shows up.
>
> -Tim

RE: Google OCSP service down

2018-02-25 Thread Tim Hollebeek via dev-security-policy
Ryan,

Wayne and I have been discussing making various improvements to 1.5.2
mandatory for all CAs.  I've made a few improvements to DigiCert's CPSs in
this area, but things probably still could be better.  There will probably be
a CA/B ballot in this area soon.

DigiCert's 1.5.2 has our support email address, and our Certificate Problem 
Report email (which I recently added).  That doesn't really cover everything 
(yet).

It looks like GTS 1.5.2 splits things into security (including CPRs) and
non-security requests.

I didn't chase down any other 1.5.2's yet, but it'd be interesting to hear what
other CAs have here.  I suspect most only have one address for everything.

Something to keep in mind once the CA/B thread shows up.

-Tim


Re: Google OCSP service down

2018-02-21 Thread Paul Kehrer via dev-security-policy
Thank you for this comprehensive incident report Ryan. Your team's decision
to improve the documentation around the right address for reporting is
great to see! I wonder if it might also make sense to pull the contact
information directly on https://pki.goog above the fold?

-Paul (reaperhulk)


Re: Google OCSP service down

2018-02-21 Thread Ryan Hurst via dev-security-policy
I wanted to follow up with our findings and a summary of this issue for the 
community. 

Below you will find details on what happened and how we resolved the issue. 
Hopefully this will help explain what happened and potentially help others 
avoid a similar issue.

Summary
---
January 19th, at 08:40 UTC, a code push to improve OCSP generation for a subset 
of the Google operated Certificate Authorities was initiated. The change was 
related to the packaging of generated OCSP responses. The first time this 
change was invoked in production was January 19th at 16:40 UTC. 

NOTE: The publication of new revocation information to all geographies can take 
up to 6 hours to propagate. Additionally, clients and middle-boxes commonly 
implement caching behavior. This results in a large window where clients may 
have begun to observe the outage.

NOTE: Most modern web browsers “soft-fail” in response to OCSP server 
availability issues, masking outages. Firefox, however, supports an advanced 
option that allows users to opt-in to “hard-fail” behavior for revocation 
checking. An unknown percentage of Firefox users enable this setting. We 
believe most users who were impacted by the outage were these Firefox users.
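The soft-fail versus hard-fail distinction in the note above can be sketched as a small decision function. This is an illustrative model only, not any browser's actual code: the `OcspOutcome` enum and `connection_allowed` function are hypothetical names introduced here.

```python
from enum import Enum


class OcspOutcome(Enum):
    """Hypothetical classification of what a client learned from an OCSP lookup."""
    GOOD = "good"
    REVOKED = "revoked"
    UNAVAILABLE = "unavailable"  # timeout, HTTP error such as 404, or unusable response


def connection_allowed(outcome: OcspOutcome, hard_fail: bool) -> bool:
    """Return True if the client would proceed with the TLS connection."""
    if outcome is OcspOutcome.GOOD:
        return True
    if outcome is OcspOutcome.REVOKED:
        return False
    # No usable answer: soft-fail clients proceed, hard-fail clients refuse.
    return not hard_fail


# A responder outage is invisible to soft-fail clients and blocking for hard-fail ones.
print(connection_allowed(OcspOutcome.UNAVAILABLE, hard_fail=False))
print(connection_allowed(OcspOutcome.UNAVAILABLE, hard_fail=True))
```

Under this model an outage only changes behaviour for clients with `hard_fail=True`, which is consistent with the report's belief that mostly opted-in Firefox users noticed the problem.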

About 9 hours after the deployment of the change began (2018-01-20 01:36 UTC) a 
user on Twitter mentions that they were having problems with their hard-fail 
OCSP checking configuration in Firefox when visiting Google properties. This 
tweet and the few that followed during the outage period were not noticed by 
any Google employees until after the incident’s post-mortem investigation had 
begun. 

About 1 day and 22 hours after the push was initiated (2018-01-21 15:07 UTC), a 
user posted a message to the mozilla.dev.security.policy mailing list where 
they mention they too are having problems with their hard-fail configuration in 
Firefox when visiting Google properties.

About two days after the push was initiated, a Google employee discovered the 
post and opened a ticket (2018-01-21 16:10 UTC). This triggered the remediation 
procedures, which began in under an hour.

The issue was resolved about 2 days and 6 hours from the time it was introduced 
(2018-01-21 22:56 UTC). Once Google became aware of the issue, it took 1 hour 
and 55 minutes to resolve the issue, and an additional 4 hours and 51 minutes 
for the fix to be completely deployed.
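As a cross-check of the timeline above, the elapsed times can be recomputed from the UTC timestamps given in this summary. The sketch below is only arithmetic on those timestamps; the variable names and the `describe` helper are assumptions made for illustration. The "about N" figures appear to line up when measured from the first production invocation (16:40 UTC on January 19th) rather than from the 08:40 UTC code push.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M"

# Timestamps (UTC) as stated in the summary.
first_invoked = datetime.strptime("2018-01-19 16:40", FMT)
events = {
    "tweet": datetime.strptime("2018-01-20 01:36", FMT),
    "list_post": datetime.strptime("2018-01-21 15:07", FMT),
    "ticket": datetime.strptime("2018-01-21 16:10", FMT),
    "resolved": datetime.strptime("2018-01-21 22:56", FMT),
}


def describe(delta: timedelta) -> str:
    """Format a timedelta as days, hours and minutes."""
    total_minutes = int(delta.total_seconds()) // 60
    days, rem = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(rem, 60)
    return f"{days}d {hours}h {minutes}m"


for name, when in events.items():
    print(f"{name:10s} {describe(when - first_invoked)} after first invocation")

# Time to fix plus time to deploy, counted from the ticket being opened.
fix_ready = events["ticket"] + timedelta(hours=1, minutes=55)
fully_deployed = fix_ready + timedelta(hours=4, minutes=51)
print("fix ready:", fix_ready.strftime(FMT))
print("fully deployed:", fully_deployed.strftime(FMT))
```

The last two lines show that 1 hour 55 minutes plus 4 hours 51 minutes after the ticket lands exactly on the stated resolution time.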

No customer reports regarding this issue were sent to the notification 
addresses listed in Google's CPSs or on the repository websites for the 
duration of the outage. This extended the duration of the outage. 

Background
--
Google's OCSP Infrastructure works by generating OCSP responses in batches, 
with each batch being made up of the certificates issued by an individual CA.

In the case of GIAG2, this batch is produced in chunks of certificates issued 
in the last 370 days. For each chunk, the GIAG2 CA is asked to produce the 
corresponding OCSP responses, the results of which are placed into a separate 
.tar file.

The issuer of GIAG2 has chosen to issue new certificates to GIAG2 periodically; 
as a result, GIAG2 has multiple certificates. Two of these certificates no 
longer have unexpired certificates associated with them. As a result, and as 
expected, the CA does not produce responses for the corresponding periods.

All .tar files produced during this process are then concatenated with the 
--concatenate option in GNU tar. This produces a single .tar file containing 
all of the OCSP responses for the given Certificate Authority. This .tar file 
is then distributed to our global CDN infrastructure for serving.

A change was made in how we batch these responses: instead of outputting many 
.tar files within a batch, a single concatenation of all tar files was 
produced.

The change in question triggered an unexpected behaviour in GNU tar which then 
manifested as an empty tarball. These "empty" updates ended up being 
distributed to our global CDN, effectively dropping some responses, while 
continuing to serve responses for other CAs.

During testing of the change, this behaviour was not detected, as the tests did 
not cover the scenario in which some chunks did not contain unexpired 
certificates.
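The report does not say which GNU tar behaviour was involved, so the following is not a reproduction of the bug. It is a minimal sketch, using Python's `tarfile` module rather than GNU tar, of one well-known property of the tar format that makes empty chunks risky: an archive ends with zero-filled blocks, and a reader stops at the first end-of-archive marker unless told to ignore it. The helper names, the member path, and the `check_publishable` guard are all hypothetical.

```python
import io
import tarfile


def make_tar(members: dict) -> bytes:
    """Build an uncompressed tar archive in memory from {name: bytes}."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def member_names(archive: bytes, ignore_zeros: bool = False) -> list:
    """List member names; 'r:' forces plain (uncompressed) tar parsing."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:",
                      ignore_zeros=ignore_zeros) as tar:
        return tar.getnames()


def check_publishable(archive: bytes) -> None:
    """Hypothetical pre-publication guard: never ship an empty response set."""
    if not member_names(archive):
        raise ValueError("refusing to publish an empty OCSP response archive")


# One chunk with no unexpired certificates, one chunk with a (placeholder) response.
empty_chunk = make_tar({})
live_chunk = make_tar({"responses/serial-01.der": b"placeholder OCSP response"})

# Joining the raw bytes puts the empty chunk's end-of-archive marker first.
joined = empty_chunk + live_chunk

print(member_names(joined))                     # reader stops at the first marker
print(member_names(joined, ignore_zeros=True))  # reader skips zero blocks
```

A test case with at least one empty chunk, plus a guard along the lines of `check_publishable`, is the kind of check that would cover the scenario described above as missing from testing.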

Findings

- The outage only impacted sites with TLS certificates issued by the GIAG2 CA 
as it was the only CA that met the required pre-conditions of the bug. 
- The bug that introduced this failure manifested itself as an empty container 
of OCSP responses. The root cause of the issue was an unexpected behavior of 
GNU tar relating to concatenating tar files.
- The outage was observed by revocation service monitoring as “unknown 
certificate” (HTTP 404) errors. HTTP 404 errors are expected in OCSP responder 
operations; they typically are the result of poorly configured clients. These 
events are monitored and a threshold does exist for an on-call escalation.
- Due to a configuration error the designated Google team did 

Re: Google OCSP service down

2018-01-22 Thread Moudrick M. Dadashov via dev-security-policy

Hi Wayne,

This is how it's supposed to work under eIDAS:

1. Check the value of the QCStatement [1] of the certificate in 
question (which is the location of the PDS);

2. Open the PDS and check the relevant contact info as in [2].

Thanks,
M.D.

[1] see 4.3.4 (QCStatement regarding location of PKI Disclosure 
Statements (PDS)) in ETSI EN 319 412-5;

[2] see Annex 1 (Model PKI disclosure statement) in ETSI EN 319 411-1.


___
dev-security-policy mailing list
dev-security-policy@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security-policy


Re: Google OCSP service down

2018-01-22 Thread Wayne Thayer via dev-security-policy
On Sun, Jan 21, 2018 at 2:14 PM, Ryan Sleevi via dev-security-policy <
dev-security-policy@lists.mozilla.org> wrote:

>
> > I think the whole CA incident reporting question has lots of room for
> > improvement. And I think this should be considered in a way that people
> > who are not familiar with the details of the CA ecosystem can
> > successfully report incidents. I.e. saying "you can find all the
> > contact info in our CPS" is not particularly helpful, as nobody outside
> > a small circle of people knows what that is.
>

Even if a relying party looks for the problem reporting mechanism in the
CPS, they are unlikely to find it. The only requirement is "The CA SHALL
publicly disclose the instructions through a readily accessible online
means" in BR 9.4.3. From my observations, many CAs do not place this in
their CPS, and almost none equate the requirement to "easy to find".

> > I think if people try the "natural" way of contacting a certificate
> > issuing entity this should lead to a successful outcome. (And that is
> > more or less "This has been issued by X, so I try to contact X".)
>

The "natural" way is likely to be some generic support email address that
receives thousands of emails a day and is subject to the problems Ryan
describes below. Maintaining a 24-hour response time for any email address
a relying party might find is not compatible with the requirement for a
timely response.

>
> To be honest, I think I find myself agreeing with other CAs when I question
> whether that should be or necessarily is a goal.
>
> If you’ve been on an inbound bug queue for virtually any product
> (particularly popular ones), you will be amazed at the (lack of) quality
> reports. Just search the Mozilla or Chromium bug trackers for “my computer
> has been hacked” to see a variety of bugs from people most likely suffering
> from one or more mental disorders, unfortunately, to see how bad it can be.
>
> Add to that the complexity of PKI, and the contractual obligations of
> responsiveness, and it becomes quite different. Talk to existing CAs that
> provided email links to problem reporting mechanisms (prior to Mozilla’s
> requiring they do so) and hear about the spam. I know of problem reports
> from Google to other CAs that have similarly been caught by the spam
> filters designed to ensure high signal.
>
> Combined with the spectrum of technical acumen we see, even here, or
> through contributions from Interested Parties to the CA/Browser Forum, and
> I suspect that highlighting even more the reporting mechanism is to vastly
> increase the noise, rather than the signal, and thus do more harm than
> good.
>
> Normalizing problem reporting - meaning that reporters have to do more work
> to align their reports into actionable data - conversely increases the
> barrier to submission but reduces the barrier to action. Is it an equitable
> tradeoff? It may be.
>
> Something to ponder, however, as easier does not necessarily mean better.
>

This is a good point, but easier doesn't necessarily mean worse either. I
propose that we add a requirement that makes the reporting mechanism more
consistent and easier to find (e.g. clearly labeled so that a search for
"google CA problem report" gets me there). Allow the reporting mechanism to
be flexible so that a CA can, for example, use a form with a captcha to
collect the report. I don't know if we need to specify "better" by
normalizing the mechanism or information that is gathered, but I'm also not
opposed.


Re: Google OCSP service down

2018-01-22 Thread Ryan Hurst via dev-security-policy
On Monday, January 22, 2018 at 1:26:01 AM UTC-8, ihave...@gmail.com wrote:
> Hi,
> 
> Just as an FYI, I am still getting 404. My geographic location is UAE if that 
> helps at all.
> 
> My openssl command:
> openssl ocsp -issuer gtsx1.pem -cert goodr1demopkigoog.crt -url 
> http://ocsp.pki.goog/GTSGIAG3  -CAfile gtsrootr1.pem 
> Error querying OCSP responder
> 77317:error:27075072:OCSP routines:PARSE_HTTP_LINE1:server response 
> error:/BuildRoot/Library/Caches/com.apple.xbs/Sources/OpenSSL098/OpenSSL098-59.60.1/src/crypto/ocsp/ocsp_ht.c:224:Code=404,Reason=Not
>  Found

Tham,

It seems you are not specifying the Host header, which is required by HTTP 
1.1, which in turn is required by RFC 2560.

Here is what a command for that root would look like:
openssl ocsp -issuer r1goodissuer.cer -cert r1good.cer -no_nonce -text -url 
"http://ocsp.pki.goog/GTSGIAG3" -header host ocsp.pki.goog
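To illustrate why the Host header matters, here is a minimal sketch of the HTTP/1.1 request that an OCSP client sends when using POST. The `build_ocsp_post` function is a hypothetical helper, and the two-byte body is a placeholder rather than a real DER-encoded OCSP request; only the URL comes from this thread. On shared serving infrastructure the Host header is typically what selects the responder, so a request without it can plausibly come back as a 404.

```python
from urllib.parse import urlsplit


def build_ocsp_post(url: str, der_request: bytes) -> bytes:
    """Assemble a raw HTTP/1.1 POST carrying an OCSP request body."""
    parts = urlsplit(url)
    path = parts.path or "/"
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {parts.netloc}\r\n"  # mandatory in HTTP/1.1
        "Content-Type: application/ocsp-request\r\n"
        f"Content-Length: {len(der_request)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return head.encode("ascii") + der_request


# Placeholder body; a real client would pass the DER-encoded OCSPRequest here.
request = build_ocsp_post("http://ocsp.pki.goog/GTSGIAG3", b"\x30\x00")
print(request.decode("latin-1"))
```

Note that the `-header` syntax differs between OpenSSL releases: older versions take the name and value as two arguments, as in the command above, while newer ones expect `name=value`.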

Ryan


Re: Google OCSP service down

2018-01-22 Thread ihavesmime--- via dev-security-policy
Hi,

Just as an FYI, I am still getting 404. My geographic location is UAE if that 
helps at all.

My openssl command:
openssl ocsp -issuer gtsx1.pem -cert goodr1demopkigoog.crt -url 
http://ocsp.pki.goog/GTSGIAG3  -CAfile gtsrootr1.pem 
Error querying OCSP responder
77317:error:27075072:OCSP routines:PARSE_HTTP_LINE1:server response 
error:/BuildRoot/Library/Caches/com.apple.xbs/Sources/OpenSSL098/OpenSSL098-59.60.1/src/crypto/ocsp/ocsp_ht.c:224:Code=404,Reason=Not
 Found

Kind regards,
Tham Wickenberg

On Monday, January 22, 2018 at 3:01:46 AM UTC+4, Ryan Hurst wrote:
> On Sunday, January 21, 2018 at 1:42:59 PM UTC-8, Ryan Hurst wrote:
> > On Sunday, January 21, 2018 at 1:29:58 PM UTC-8, s...@gmx.ch wrote:
> > > Hi
> > > 
> > > Thanks for investigating.
> > > 
> > > I can confirm that the service is now working again for me most of the
> > > > time, but some queries still fail (may be due to load balancing in the
> > > > backend?).
> > > 
> > 
> > Thank you for your report and confirming you are seeing things starting to 
> > work.
> > 
> > Google operates a global network utilizing many redundant servers and the 
> > nature of the way that works is one connection to the next you may be 
> > hitting a different cluster of servers. 
> > 
> > It can take a while for all of these different clusters to receive the 
> > associated updates.
> > 
> > This would explain your inconsistent results.
> > 
> > I am actively watching this deployment to ensure it completes successfully 
> > but at this point, it seems all will continue to roll out as expected.
> > 
> > As an aside, We are still continuing our post-mortem.
> 
> The issue should be 100% resolved now.
> 
> As per earlier posts, we will complete the post-mortem and report to the 
> community with our findings.


Re: Google OCSP service down

2018-01-21 Thread Ryan Hurst via dev-security-policy
On Sunday, January 21, 2018 at 1:42:59 PM UTC-8, Ryan Hurst wrote:
> On Sunday, January 21, 2018 at 1:29:58 PM UTC-8, s...@gmx.ch wrote:
> > Hi
> > 
> > Thanks for investigating.
> > 
> > I can confirm that the service is now working again for me most of the
> > time, but some queries still fail (may be due to load balancing in the
> > backend?).
> > 
> 
> Thank you for your report and confirming you are seeing things starting to 
> work.
> 
> Google operates a global network utilizing many redundant servers and the 
> nature of the way that works is one connection to the next you may be hitting 
> a different cluster of servers. 
> 
> It can take a while for all of these different clusters to receive the 
> associated updates.
> 
> This would explain your inconsistent results.
> 
> I am actively watching this deployment to ensure it completes successfully 
> but at this point, it seems all will continue to roll out as expected.
> 
> As an aside, We are still continuing our post-mortem.

The issue should be 100% resolved now.

As per earlier posts, we will complete the post-mortem and report to the 
community with our findings.


Re: Google OCSP service down

2018-01-21 Thread Ryan Hurst via dev-security-policy
On Sunday, January 21, 2018 at 1:29:58 PM UTC-8, s...@gmx.ch wrote:
> Hi
> 
> Thanks for investigating.
> 
> I can confirm that the service is now working again for me most of the
> time, but some queries still fail (maybe due to load balancing in the
> backend?).
> 

Thank you for your report and confirming you are seeing things starting to work.

Google operates a global network with many redundant servers, so from one
connection to the next you may be hitting a different cluster of servers.

It can take a while for all of these different clusters to receive the 
associated updates.

This would explain your inconsistent results.
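
One way to observe this from the client side is to repeat the same query several times and tally the outcomes; a mix of successes and failures is what a partially completed rollout would look like. The following is only a minimal sketch. It assumes a hypothetical `probe` callable that performs a single OCSP query and returns True on a good response; the names `tally_probes` and `describe` are illustrative and do not come from this thread.

```python
from collections import Counter


def tally_probes(probe, attempts):
    """Call `probe()` `attempts` times and count the outcomes.

    `probe` is any zero-argument callable (hypothetical here) that returns
    True on a good response and False otherwise. Exceptions count as failures.
    """
    counts = Counter()
    for _ in range(attempts):
        try:
            ok = bool(probe())
        except Exception:
            ok = False
        counts["ok" if ok else "failed"] += 1
    return counts


def describe(counts):
    """Summarize whether the tallied results look consistent or mixed."""
    if not (counts["ok"] or counts["failed"]):
        return "no data"
    if counts["ok"] and counts["failed"]:
        # Some backends answer correctly and some do not.
        return "mixed"
    return "all ok" if counts["ok"] else "all failed"
```

A "mixed" summary would match the report above; "all ok" over a reasonable number of attempts suggests the update has reached every cluster the client is being routed to.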

I am actively watching this deployment to ensure it completes successfully but 
at this point, it seems all will continue to roll out as expected.

As an aside, we are still continuing our post-mortem.


Re: Google OCSP service down

2018-01-21 Thread sjw--- via dev-security-policy
Hi

Thanks for investigating.

First of all, my previous curl command is not suitable for verifying an
OCSP status. It only works for OCSP stapling, which is not supported by
Google servers.
You may use openssl ocsp instead:

    openssl ocsp -issuer [GoogleInternetAuthorityG2.crt] \
        -cert [googlecom.crt] -url http://clients1.google.com/ocsp \
        -resp_text -header HOST=clients1.google.com
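
For repeated checks, the same invocation can be wrapped in a small script. This is a minimal sketch, not a definitive tool: the helper names `build_ocsp_command` and `query_ocsp` are hypothetical, the certificate file names are placeholders, and the `-header HOST=...` form is simply copied from the command above (it may need adjusting for a given OpenSSL version).

```python
import subprocess
from urllib.parse import urlparse


def build_ocsp_command(issuer_path, cert_path, url):
    """Return the argv list for the `openssl ocsp` query shown above."""
    host = urlparse(url).hostname
    return [
        "openssl", "ocsp",
        "-issuer", issuer_path,
        "-cert", cert_path,
        "-url", url,
        "-resp_text",
        "-header", f"HOST={host}",
    ]


def query_ocsp(issuer_path, cert_path, url, timeout=30):
    """Run the query and return (exit_code, combined stdout/stderr text)."""
    proc = subprocess.run(
        build_ocsp_command(issuer_path, cert_path, url),
        capture_output=True, text=True, timeout=timeout,
    )
    return proc.returncode, proc.stdout + proc.stderr


if __name__ == "__main__":
    # Placeholder file names: substitute the real issuer and leaf certificates.
    code, output = query_ocsp(
        "GoogleInternetAuthorityG2.crt", "googlecom.crt",
        "http://clients1.google.com/ocsp",
    )
    print(code)
    print(output)
```

Building the argument list separately from running it keeps the command easy to inspect before anything is sent to the responder.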

I can confirm that the service is now working again for me most of the
time, but some queries still fail (maybe due to load balancing in the
backend?).


Am 21.01.2018 um 22:00 schrieb Hanno Böck via dev-security-policy:
> If I google for that I end up at https://pki.google.com/ This page has
> a similar style as the pki.goog, but notably it doesn't list any
> contact info. It has an FAQ, but that doesn't have any question of the
> form "How do I report a problem with your CA?" The only thing that
> might be helpful is a pointer to report security incidents. I'd
> probably have done that, though I would be unsure, as it's debatable
> whether an offline OCSP counts as a security issue.

I ended up in the same situation. But "OCSP is down" does not fit into
any category on the vulnerability report site, and the category "other"
only provides support articles.





Re: Google OCSP service down

2018-01-21 Thread Ryan Sleevi via dev-security-policy
On Sun, Jan 21, 2018 at 4:00 PM Hanno Böck via dev-security-policy <
dev-security-policy@lists.mozilla.org> wrote:

> Hi,
>
> On Sun, 21 Jan 2018 12:09:23 -0800 (PST)
> Ryan Hurst via dev-security-policy
>  wrote:
>
> > We maintain contact details both within our CPS (like other CAs) and
> > at https://pki.goog so that people can reach us expeditiously. In the
> > future if anyone needs to reach us please use those details.
>
> I just tried to see what I'd do if I wanted to report issues with
> Google's CA (assuming I don't know where its webpage lives and assuming
> I don't know any Googlers to report this directly).
>
> When I look into the cert details the certificates for Google webpages
> are issued by
> "Google Internet Authority G2"
>
> If I google for that I end up at
> https://pki.google.com/
>
> This page has a similar style as the pki.goog, but notably it doesn't
> list any contact info. It has an FAQ, but that doesn't have any
> question of the form "How do I report a problem with your CA?"
>
> The only thing that might be helpful is a pointer to report security
> incidents. I'd probably have done that, though I would be unsure, as
> it's debatable whether an offline OCSP counts as a security issue.
>
>
> Meta-comment:
>
> I think the whole CA incident reporting question has lots of room for
> improvement. And I think this should be considered in a way that people
> who are not familiar with the details of the CA ecosystem can
> successfully report incidents. I.e. saying "you can find all the
> contact info in our CPS" is not particularly helpful, as nobody outside
> a small circle of people knows what that is.
> I think if people try the "natural" way of contacting a certificate
> issuing entity this should lead to a successful outcome. (And that is
> more or less "This has been issued by X, so I try to contact X".)


To be honest, I think I find myself agreeing with other CAs when I question
whether that should be or necessarily is a goal.

If you’ve been on an inbound bug queue for virtually any product
(particularly popular ones), you will be amazed at the (lack of) quality
reports. Just search the Mozilla or Chromium bug trackers for “my computer
has been hacked” to see a variety of bugs from people most likely suffering
from one or more mental disorders, unfortunately, to see how bad it can be.

Add to that the complexity of PKI, and the contractual obligations of
responsiveness, and it becomes quite different. Talk to existing CAs that
provided email links to problem reporting mechanisms (prior to Mozilla’s
requiring they do so) and hear about the spam. I know of problem reports
from Google to other CAs that have similarly been caught by the spam
filters designed to ensure high signal.

Combined with the spectrum of technical acumen we see, even here, or
through contributions from Interested Parties to the CA/Browser Forum, and
I suspect that highlighting the reporting mechanism even more would vastly
increase the noise, rather than the signal, and thus do more harm than good.

Normalizing problem reporting - meaning that reporters have to do more work
to align their reports into actionable data - conversely increases the
barrier to submission but reduces the barrier to action. Is it an equitable
tradeoff? It may be.

Something to ponder, however, as easier does not necessarily mean better.



Re: Google OCSP service down

2018-01-21 Thread Ryan Sleevi via dev-security-policy
On Sun, Jan 21, 2018 at 2:08 PM David E. Ross via dev-security-policy <
dev-security-policy@lists.mozilla.org> wrote:

> On 1/21/2018 9:50 AM, Ryan Sleevi wrote:
> > I couldn’t find that listed in the CP/CPS as where to report problems.
> > Instead, I see a different email listed.
> >
> > What made you decide to ignore the CP/CPS, which is where CAs list their
> > problem reporting mechanisms?
> >
> > Given that a CA’s CP/CPS applies to their hierarchy and issuance practices,
> > not a single certificate, and given that past discussions on this list have
> > specifically called out the CP/CPS as the place to determine problem
> > reporting mechanisms, it does seem unreasonable to expect arbitrary
> > reporting mechanisms to get the same attention as the defined mechanisms.
>
> At the time I tried reporting the problem, I forgot that Google had a
> pending request to add its root to NSS.  When I checked the Certificate
> Manager list of Authorities in my browser, Google did not appear.


I’m not sure I see the relevance of this. Regardless of whether or not a CA
is pending inclusion, there is a defined mechanism for problem reporting,
provided in the CP/CPS. The Mozilla CCADB disclosures list the applicable
CP/CPS.

Whatever other criticisms you may make, and I would say this regardless of the
CA it affected, you used an ad hoc reporting mechanism rather than any
defined problem reporting mechanism, and so the failure to respond to that
points less to the CA’s failure than to the reporter’s.

> In any case, this OCSP problem still makes me question Google's ability
> to manage a certification authority.  As a prior reply in this thread
> indicates, it took two days for Google to even acknowledge there is a
> problem.


This framing continues to adopt your misreporting of the date of report (in
order to beget acknowledgement). I agree that a full incident response is
warranted, but I do find it somewhat surprising that the basis of your
conclusion seems to be, from your previous remarks, predicated on a failure
to acknowledge your non-standard, ad-hoc problem report. I can understand
you may “have questions,” but absent details, and in light of your own
misunderstandings, I am curious whether you are being premature in
judgement?


>
> As of right now, it appears the problem has been fixed.  With both
> checkboxes checked under OCSP at [Edit > Preferences > Privacy &
> Security > Certificates], I am now able to reach Google Web sites.
>
> --
> David E. Ross
> 
>
> President Trump:  Please stop using Twitter.  We need
> to hear your voice and see you talking.  We need to know
> when your message is really your own and not your attorney's.


Re: Google OCSP service down

2018-01-21 Thread Hanno Böck via dev-security-policy
Hi,

On Sun, 21 Jan 2018 12:09:23 -0800 (PST)
Ryan Hurst via dev-security-policy
 wrote:

> We maintain contact details both within our CPS (like other CAs) and
> at https://pki.goog so that people can reach us expeditiously. In the
> future if anyone needs to reach us please use those details.

I just tried to see what I'd do if I wanted to report issues with
Google's CA (assuming I don't know where its webpage lives and assuming
I don't know any Googlers to report this directly).

When I look into the cert details the certificates for Google webpages
are issued by
"Google Internet Authority G2"

If I google for that I end up at
https://pki.google.com/

This page has a similar style as the pki.goog, but notably it doesn't
list any contact info. It has an FAQ, but that doesn't have any
question of the form "How do I report a problem with your CA?"

The only thing that might be helpful is a pointer to report security
incidents. I'd probably have done that, though I would be unsure, as
it's debatable whether an offline OCSP counts as a security issue.


Meta-comment:

I think the whole CA incident reporting question has lots of room for
improvement. And I think this should be considered in a way that people
who are not familiar with the details of the CA ecosystem can
successfully report incidents. I.e. saying "you can find all the
contact info in our CPS" is not particularly helpful, as nobody outside
a small circle of people knows what that is.
I think if people try the "natural" way of contacting a certificate
issuing entity this should lead to a successful outcome. (And that is
more or less "This has been issued by X, so I try to contact X".)

-- 
Hanno Böck
https://hboeck.de/

mail/jabber: ha...@hboeck.de
GPG: FE73757FA60E4E21B937579FA5880072BBB51E42


Re: Google OCSP service down

2018-01-21 Thread Ryan Hurst via dev-security-policy
> > Is there a known contact to report it (or is someone with a Google hat
> > reading this anyway)?
> 

David,

I am sorry you experienced difficulty in contacting us about this issue. 

We maintain contact details both within our CPS (like other CAs) and at 
https://pki.goog so that people can reach us expeditiously. In the future if 
anyone needs to reach us please use those details.

Google is a large organization and when other teams are contacted (such as DNS) 
we do not have control over when and if those issues will reach us. 

We are actively working on a post mortem on this issue and when it is complete 
we will share it in this thread.

Thanks for your help in this matter,

Ryan Hurst
Product Manager
Google


Re: Google OCSP service down

2018-01-21 Thread Ryan Hurst via dev-security-policy

> 
> We are investigating the issue and will provide an update when that
> investigation is complete.
> 
> Thank you for letting us know.
> 
> Ryan Hurst
> Product Manager
> Google

I wanted to provide an update to the group. The issue has been identified and a 
roll out of the fix is in progress across all geographies.

I have personally verified the fix in several geographies.

A post mortem will be created and shared with the group as soon as it is ready.

Ryan Hurst
Product Manager
Google


Re: Google OCSP service down

2018-01-21 Thread David E. Ross via dev-security-policy
On 1/21/2018 9:50 AM, Ryan Sleevi wrote:
> I couldn’t find that listed in the CP/CPS as where to report problems.
> Instead, I see a different email listed.
> 
> What made you decide to ignore the CP/CPS, which is where CAs list their
> problem reporting mechanisms?
> 
> Given that a CA’s CP/CPS applies to their hierarchy and issuance practices,
> not a single certificate, and given that past discussions on this list have
> specifically called out the CP/CPS as the place to determine problem
> reporting mechanisms, it does seem unreasonable to expect arbitrary
> reporting mechanisms to get the same attention as the defined mechanisms.

At the time I tried reporting the problem, I forgot that Google had a
pending request to add its root to NSS.  When I checked the Certificate
Manager list of Authorities in my browser, Google did not appear.

In any case, this OCSP problem still makes me question Google's ability
to manage a certification authority.  As a prior reply in this thread
indicates, it took two days for Google to even acknowledge there is a
problem.

As of right now, it appears the problem has been fixed.  With both
checkboxes checked under OCSP at [Edit > Preferences > Privacy &
Security > Certificates], I am now able to reach Google Web sites.

-- 
David E. Ross


President Trump:  Please stop using Twitter.  We need
to hear your voice and see you talking.  We need to know
when your message is really your own and not your attorney's.


Re: Google OCSP service down

2018-01-21 Thread Ryan Sleevi via dev-security-policy
On Sun, Jan 21, 2018 at 11:12 AM David E. Ross via dev-security-policy <
dev-security-policy@lists.mozilla.org> wrote:

> On 1/21/2018 7:47 AM, Paul Kehrer wrote:
> > Is there a known contact to report it (or is someone with a Google hat
> > reading this anyway)?
>
> On Friday (two days ago), I reported this to dns-ad...@google.com, the
> only E-mail address in the WhoIs record for google.com.


I couldn’t find that listed in the CP/CPS as where to report problems.
Instead, I see a different email listed.

What made you decide to ignore the CP/CPS, which is where CAs list their
problem reporting mechanisms?

Given that a CA’s CP/CPS applies to their hierarchy and issuance practices,
not a single certificate, and given that past discussions on this list have
specifically called out the CP/CPS as the place to determine problem
reporting mechanisms, it does seem unreasonable to expect arbitrary
reporting mechanisms to get the same attention as the defined mechanisms.


>
> I received an automated reply indicating that security issues should
> instead be reported to secur...@google.com. I immediately resent
> (Thunderbird's Edit As New Message) to secur...@google.com.
>
> I then received an automated reply from secur...@google.com that listed
> a variety of Web addresses for reporting various problems.  I replied
> via E-mail to secur...@google.com:
> > Because of the OCSP failure, I am unable to reach any of the google.com
> > Web site cited in your reply.
>
> Yes, I could disable OCSP checking.  But my need for Google is
> insufficient for me to browse insecurely.
>
> By the way, in SeaMonkey 2.49.1 (the latest version) the Google Internet
> Authority G2 certificate appears to be an intermediate, signed by the
> GeoTrust Global CA root.
>
> There is a pending request (bug #1325532) from Google to add a Google
> root certificate to NSS.  Given the inadequacy of Google's current
> information on reporting security problems, I have doubts whether this
> request should be approved.
>
> See .
>
> --
> David E. Ross
> 
>
> President Trump:  Please stop using Twitter.  We need
> to hear your voice and see you talking.  We need to know
> when your message is really your own and not your attorney's.


Re: Google OCSP service down

2018-01-21 Thread Ryan Hurst via dev-security-policy
On Sunday, January 21, 2018 at 8:13:30 AM UTC-8, David E. Ross wrote:
> On 1/21/2018 7:47 AM, Paul Kehrer wrote:
> > Is there a known contact to report it (or is someone with a Google hat
> > reading this anyway)?
> 
> On Friday (two days ago), I reported this to dns-ad...@google.com, the
> only E-mail address in the WhoIs record for google.com.
> 
> I received an automated reply indicating that security issues should
> instead be reported to secur...@google.com. I immediately resent
> (Thunderbird's Edit As New Message) to secur...@google.com.
> 
> I then received an automated reply from secur...@google.com that listed
> a variety of Web addresses for reporting various problems.  I replied
> via E-mail to secur...@google.com:
> > Because of the OCSP failure, I am unable to reach any of the google.com
> > Web site cited in your reply.
> 
> Yes, I could disable OCSP checking.  But my need for Google is
> insufficient for me to browse insecurely.
> 
> By the way, in SeaMonkey 2.49.1 (the latest version) the Google Internet
> Authority G2 certificate appears to be an intermediate, signed by the
> GeoTrust Global CA root.
> 
> There is a pending request (bug #1325532) from Google to add a Google
> root certificate to NSS.  Given the inadequacy of Google's current
> information on reporting security problems, I have doubts whether this
> request should be approved.
> 
> See .
> 
> -- 
> David E. Ross
> 
> 
> President Trump:  Please stop using Twitter.  We need
> to hear your voice and see you talking.  We need to know
> when your message is really your own and not your attorney's.


We are investigating the issue and will provide an update when that
investigation is complete.

Thank you for letting us know.

Ryan Hurst
Product Manager
Google


Re: Google OCSP service down

2018-01-21 Thread David E. Ross via dev-security-policy
On 1/21/2018 7:47 AM, Paul Kehrer wrote:
> Is there a known contact to report it (or is someone with a Google hat
> reading this anyway)?

On Friday (two days ago), I reported this to dns-ad...@google.com, the
only E-mail address in the WhoIs record for google.com.

I received an automated reply indicating that security issues should
instead be reported to secur...@google.com. I immediately resent
(Thunderbird's Edit As New Message) to secur...@google.com.

I then received an automated reply from secur...@google.com that listed
a variety of Web addresses for reporting various problems.  I replied
via E-mail to secur...@google.com:
> Because of the OCSP failure, I am unable to reach any of the google.com
> Web site cited in your reply.

Yes, I could disable OCSP checking.  But my need for Google is
insufficient for me to browse insecurely.

By the way, in SeaMonkey 2.49.1 (the latest version) the Google Internet
Authority G2 certificate appears to be an intermediate, signed by the
GeoTrust Global CA root.

There is a pending request (bug #1325532) from Google to add a Google
root certificate to NSS.  Given the inadequacy of Google's current
information on reporting security problems, I have doubts whether this
request should be approved.

See .

-- 
David E. Ross


President Trump:  Please stop using Twitter.  We need
to hear your voice and see you talking.  We need to know
when your message is really your own and not your attorney's.