Re: 2018.03.12 Let's Encrypt Wildcard Certificate Encoding Issue

2018-03-16 Thread József Szilágyi via dev-security-policy
Please also put this certificate on that list:
https://crt.sh/?id=181538497&opt=cablint,x509lint

Best Regards, 
Jozsef


Re: 2018.03.12 Let's Encrypt Wildcard Certificate Encoding Issue

2018-03-15 Thread Wayne Thayer via dev-security-policy
On Thu, Mar 15, 2018 at 12:22 PM, Tom via dev-security-policy <
dev-security-policy@lists.mozilla.org> wrote:

> Should another bug be opened for the certificate issued by IdenTrust with
> apparently the same encoding problem?
>
Yes - this is bug 1446121
(https://bugzilla.mozilla.org/show_bug.cgi?id=1446121).

> https://crt.sh/?id=8373036&opt=cablint,x509lint
>
> Does Mozilla expect the revocation of such certificates?
>
Yes, within 24 hours per BR 4.9.1.1 (9) "The CA is made aware that the
Certificate was not issued in accordance with these Requirements or the
CA’s Certificate Policy or Certification Practice Statement;"

Mozilla requires adherence to the BRs, and the BRs require CAs to comply
with RFC 5280.

> https://groups.google.com/d/msg/mozilla.dev.security.policy/wqySoetqUFM/l46gmX0hAwAJ
>
- Wayne


Re: 2018.03.12 Let's Encrypt Wildcard Certificate Encoding Issue

2018-03-15 Thread Tom via dev-security-policy

On 15/03/2018 at 20:04, Wayne Thayer wrote:

> This incident, and the resulting action to "integrate GlobalSign's certlint
> and/or zlint into our existing cert-checker pipeline" has been documented
> in bug 1446080 [1]
>
> This is further proof that pre-issuance TBS certificate linting (either by
> incorporating existing tools or using a comprehensive set of rules) is a
> best practice that prevents misissuance. I don't understand why all CAs
> aren't doing this.
>
> - Wayne
>
> [1] https://bugzilla.mozilla.org/show_bug.cgi?id=1446080



Should another bug be opened for the certificate issued by IdenTrust 
with apparently the same encoding problem?


https://crt.sh/?id=8373036&opt=cablint,x509lint

Does Mozilla expect the revocation of such certificates?

https://groups.google.com/d/msg/mozilla.dev.security.policy/wqySoetqUFM/l46gmX0hAwAJ


Re: 2018.03.12 Let's Encrypt Wildcard Certificate Encoding Issue

2018-03-15 Thread Wayne Thayer via dev-security-policy
This incident, and the resulting action to "integrate GlobalSign's certlint
and/or zlint into our existing cert-checker pipeline" has been documented
in bug 1446080 [1]

This is further proof that pre-issuance TBS certificate linting (either by
incorporating existing tools or using a comprehensive set of rules) is a
best practice that prevents misissuance. I don't understand why all CAs
aren't doing this.

- Wayne

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1446080
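
To make the pre-issuance linting idea concrete, one common shape for it is to
render the exact certificate the CA is about to sign, sign it with a throwaway
key instead, and lint the result before the production key ever signs
anything. The Go sketch below assumes a hypothetical lintDER hook standing in
for whatever linter integration (certlint, zlint, cert-checker, ...) a CA
actually wires in; it is an illustration, not any CA's production code.

    package preissue

    import (
        "crypto/ecdsa"
        "crypto/elliptic"
        "crypto/rand"
        "crypto/x509"
        "fmt"
    )

    // lintDER is a hypothetical hook standing in for a real linter
    // integration (certlint, zlint, cert-checker, ...). It should return
    // an error for any certificate that violates RFC 5280 or the BRs.
    func lintDER(der []byte) error {
        // ... invoke the linter(s) of choice here ...
        return nil
    }

    // PreIssuanceLint renders the certificate exactly as described by
    // template (subject, SANs, extensions, serial number), but signed with
    // a throwaway key, and lints the resulting DER. Only if this passes
    // should the same template be signed with the production key.
    // template.SignatureAlgorithm should be left zero so a suitable
    // algorithm is chosen for the throwaway ECDSA key; the issuer name and
    // signature will of course differ from the real issuance.
    func PreIssuanceLint(template *x509.Certificate, subjectPub interface{}) error {
        throwaway, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
        if err != nil {
            return err
        }
        der, err := x509.CreateCertificate(rand.Reader, template, template, subjectPub, throwaway)
        if err != nil {
            return fmt.Errorf("building test certificate: %v", err)
        }
        if err := lintDER(der); err != nil {
            return fmt.Errorf("pre-issuance lint failed: %v", err)
        }
        return nil
    }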


Re: 2018.03.12 Let's Encrypt Wildcard Certificate Encoding Issue

2018-03-14 Thread Tom Prince via dev-security-policy
On Tuesday, March 13, 2018 at 4:27:23 PM UTC-6, Matthew Hardeman wrote:
> I thought I recalled a recent case in which a new root/key was declined
> with the sole unresolved (and unresolvable, save for new key generation,
> etc.) matter precluding the inclusion being a prior mis-issuance of test
> certificates, already revoked and disclosed.  Perhaps I am mistaken.

I haven't seen this directly addressed.  I'm not sure what incident you are 
referring to, but I'm fairly sure that the mis-issuance that needed new keys was for 
certificates that were issued for domains that weren't properly validated.

In the case under discussion in this thread, all the mis-issued certificates 
are only mis-issued due to encoding issues. The certificates are for 
sub-domains of randomly generated subdomains of aws.radiantlock.org (which, 
according to whois, is controlled by Let's Encrypt). I presume these domains 
are created specifically for testing certificate issuance in the production 
environment in a way that complies with the BRs.

To put it succinctly, the issue you are referring to is about issuing 
certificates for domains that aren't authorized (whether for testing or not), 
rather than creating test certificates.

-- Tom Prince


Re: 2018.03.12 Let's Encrypt Wildcard Certificate Encoding Issue

2018-03-14 Thread Ryan Sleevi via dev-security-policy
On Tue, Mar 13, 2018 at 6:27 PM Matthew Hardeman wrote:

> Another question this incident raised in my mind pertains to the parallel
>>> staging and production environment paradigm:  If one truly has the
>>> 'courage
>>> of conviction' of the equivalence of the two environments, why would one
>>> not perform all tests in ONLY the staging environment, with no tests and
>>> nothing other than production transactions on the production environment?
>>> That tests continue to be executed in the production environment while
>>> holding to the notion that a fully parallel staging environment is the
>>> place for tests seems to signal that confidence in the staging
>>> environment
>>> is -- in some measure, however small -- limited.
>>
>>
>> That's ... just a bad conclusion, especially for a publicly-trusted CA :)
>>
>>
> I certainly agree it's possible that I've reached a bad conclusion there,
> but I would like to better understand how specifically?  Assuming the same
> input data set and software manipulating said data, two systems should in
> general execute identically.  To the extent that they do not, my initial
> position would be that a significant failing of change management of
> operating environment or data set or system level matters has occurred.  I
> would think all of those would be issues of great concern to a CA, if for
> no other reason than that they should be very very rare.
>

I get the impression you may not have run complex production systems,
especially distributed systems, or spent much time with testing
methodology, given statements such as “courage of conviction.”

No testing system is going to be perfect, and there’s a difference between
designed redundancy and unnecessary testing.

For example, even if you had 100% code coverage through tests, there are
still things that are possible to get wrong - you could test every line of
your codebase and still fail to properly handle IDNs or, as other CAs have
shown, ampersands.

It’s foolish to think that a staging environment will cover every possible
permutation - even if you solved the halting problem, you will still have
issues with, say, solar radiation induced bitflips, or RAM heat, or any
number of other issues. And yes, these are issues still affecting real
systems today, not scary stories we tell our SREs to keep them up at night.

Look at any complex system - avionics, military command-and-control,
certificate authorities, modern scalable websites - and you will find
systems designed with redundancy throughout, to ensure proper functioning.
It is the madness of inexperience to suggest that somehow this redundancy
is unnecessary or somehow a black mark - the Sean Hannity approach of “F’
it, we’ll do it live” is the antithesis of modern and secure design. The
suggestion that this is somehow a sign of insufficient testing or design
is, at best, naive and, at worst, detrimental to discussions of how to
improve the ecosystem.

>


Re: 2018.03.12 Let's Encrypt Wildcard Certificate Encoding Issue

2018-03-14 Thread jsha--- via dev-security-policy

> So to clarify that I understand this: The same problem was in the staging 
> environment and there were also certificates with illegal encoding issued in 
> staging, but you didn't notice them because no one manually validated them 
> with the crt.sh lint?

That's correct.

> Or are there differences between staging and production?

Yep, there are differences, though of course we try to keep them to a minimum. 
The most notable is that we don't use trusted keys in staging. That means 
staging can only submit to test CT logs, and is therefore not picked up by 
crt.sh.


Re: 2018.03.12 Let's Encrypt Wildcard Certificate Encoding Issue

2018-03-14 Thread josef.schneider--- via dev-security-policy
On Tuesday, March 13, 2018 at 23:51:01 UTC+1 js...@letsencrypt.org wrote:

> Clearly we should have caught this earlier in the process. The changes we 
> have in the pipeline (integrating certlint and/or zlint) would have 
> automatically caught the encoding issue at each stage in the pipeline: in 
> development, in staging, and in production.

So to clarify that I understand this: The same problem was in the staging 
environment and there were also certificates with illegal encoding issued in 
staging, but you didn't notice them because no one manually validated them with 
the crt.sh lint?

Or are there differences between staging and production?


Re: 2018.03.12 Let's Encrypt Wildcard Certificate Encoding Issue

2018-03-13 Thread jsha--- via dev-security-policy
On Tuesday, March 13, 2018 at 2:02:45 PM UTC-7, Ryan Sleevi wrote:
> I'm hoping that LE can provide more details about the change management
> process and how, in light of this incident, it may change - both in terms
> of automated testing and in certificate policy review.

Forgot to reply to this specific part. Our change management process starts 
with our SDLC, which mandates code review (typically dual code review), unit 
tests, and where appropriate, integration tests. All unit tests and integration 
tests are run automatically with every change, and before every deploy. Our 
operations team checks the automated test status and will not deploy if the 
tests are broken. Any configuration changes that we plan to apply in staging 
and production are first added to our automated tests.

Each deploy then spends a period of time in our staging environment, where it 
is subject to further automated tests: periodic issuance testing, plus 
performance, availability, and correctness monitoring equivalent to our 
production environment. This includes running the cert-checker software I 
mentioned earlier. Typically our deploys spend two days in our staging 
environment before going live, though that depends on our risk evaluation, and 
hotfix deploys may spend less time in staging if we have high confidence in 
their safety. Similarly, any configuration changes are applied to the staging 
environment before going to production. For significant changes we do 
additional manual testing in the staging environment. Generally this testing 
means checking that the new change was applied as expected, and that no errors 
were produced. We don't rely on manual testing as a primary way of catching 
bugs; we automate everything we can.

If the staging deployment or configuration change doesn't show any problems, we 
continue to production. Production has the same suite of automated live tests 
as staging. And similar to staging, for significant changes we do additional 
manual testing. It was this step that caught the encoding issue, when one of 
our staff used crt.sh's lint tool to double check the test certificate they 
issued.

Clearly we should have caught this earlier in the process. The changes we have 
in the pipeline (integrating certlint and/or zlint) would have automatically 
caught the encoding issue at each stage in the pipeline: in development, in 
staging, and in production.


Re: 2018.03.12 Let's Encrypt Wildcard Certificate Encoding Issue

2018-03-13 Thread Matthew Hardeman via dev-security-policy
On Tue, Mar 13, 2018 at 4:02 PM, Ryan Sleevi  wrote:

>
>
> On Tue, Mar 13, 2018 at 4:13 PM, Matthew Hardeman via dev-security-policy
>  wrote:
>
>> I am not at all suggesting consequences for Let's Encrypt, but rather
>> raising a question as to whether that position on new inclusions /
>> renewals
>> is appropriate.  If these things can happen in a celebrated best-practices
>> environment, can they really in isolation be cause to reject a new
>> application or a new root from an existing CA?
>>
>
> While I certainly appreciate the comparison, I think it's apples and
> oranges when we consider both the nature and degree, nor do I think it's
> fair to suggest "in isolation" is a comparison.
>

I thought I recalled a recent case in which a new root/key was declined
with the sole unresolved (and unresolvable, save for new key generation,
etc.) matter precluding the inclusion being a prior mis-issuance of test
certificates, already revoked and disclosed.  Perhaps I am mistaken.



>
> I'm sure you can agree that incident response is defined by both the
> nature and severity of the incident itself, the surrounding ecosystem
> factors (i.e. was this a well-understood problem), and the detection,
> response, and disclosure practices that follow. A system that does not
> implement any checks whatsoever is, I hope, something we can agree is worse
> than a system that relies on human checks (and virtually indistinguishable
> from no checks), and that both are worse than a system with incomplete
> technical checks.
>
>
I certainly concur with all of that, which is part of the basis on which I
form my own opinion that Let's Encrypt should not suffer any
consequence of significance beyond advice along the lines of "make your
testing environment and procedures better".


> I do agree with you that I find it challenging how the staging
> environment was tested - failure to have robust profile tests in staging,
> for example, is what ultimately resulted in Turktrust's notable
> misissuance of unconstrained CA certificates. Similarly, given the wide
> availability of certificate linting tools - such as ZLint, x509Lint,
> (AWS's) certlint, and (GlobalSign's) certlint - there's no dearth of
> availability of open tools and checks. Given the industry push towards
> integration of these automated tools, it's not entirely clear why LE would
> invent yet another, but it's also not reasonable to require that LE use
> something 'off the shelf'.
>

I'm very interested in how the testing occurs in terms of procedures.  I
would assume, for example, that no test transaction of any kind would ever
be "played" against a production environment unless that same exact test
transaction had already been "played" against the staging environment.
With respect to this case, were these wildcard certificates requested and
issued against the staging system with materially the same test transaction
data, and if so was the encoding incorrect?  If these were not performed
against staging, what was the rational basis for executing a new and novel
test transaction against the production system first?  If they were
performed AND if they did not encode incorrectly, then what was the
disparity between the environments which led to this?  (The implication
being that some sort of change management process needs to be revised to
keep the operating environments of staging and production better
synchronized.)  If they were performed and were improperly encoded on the
staging environment, then one would presume that the erroneous result was
missed by the various automated and manual examinations of the results of
the tests.

As you note, it's unreasonable to require use of any particular
implementation of any particular tool. But insofar as the other tools catch
issues that the LE-developed tools clearly did not, it would appear that LE
needs to better test their testing mechanisms. While it may not be necessary
for them to incorporate the competing tools into the live issuance pipeline,
it would seem advisable for Let's Encrypt to pass the output of tests within
their staging environment (the certificates) through these various other
tools as part of a post-staging-deployment testing phase. It would seem
logical to take the best-of-breed tools, stack them up (automatically or
manually), and waterfall the final output of a full suite of test scenarios
against the post-deployment state of the staging environment, with a view to
identifying discrepancies between the LE tools' opinions and the external
tools' opinions and reconciling them, rejecting invalid determinations as
appropriate.


>
> I'm hoping that LE can provide more details about the change management
> process and how, in light of this incident, it may change - both in terms
> of automated testing and in certificate policy review.
>
>
>> Another question this incident 

Re: 2018.03.12 Let's Encrypt Wildcard Certificate Encoding Issue

2018-03-13 Thread jsha--- via dev-security-policy
On Tuesday, March 13, 2018 at 2:02:45 PM UTC-7, Ryan Sleevi wrote:
> availability of certificate linting tools - such as ZLint, x509Lint,
> (AWS's) certlint, and (GlobalSign's) certlint - there's no dearth of
> availability of open tools and checks. Given the industry push towards
> integration of these automated tools, it's not entirely clear why LE would
> invent yet another, but it's also not reasonable to require that LE use
> something 'off the shelf'.

We are indeed planning to integrate GlobalSign's certlint and/or zlint into our 
existing cert-checker pipeline rather than build something new. We've already 
started submitting issues and PRs, in order to give back to the ecosystem:

https://github.com/zmap/zlint/issues/212
https://github.com/zmap/zlint/issues/211
https://github.com/zmap/zlint/issues/210
https://github.com/globalsign/certlint/pull/5

If your question is why we wrote cert-checker rather than use something 
off-the-shelf: cablint / x509lint weren't available at the time we wrote 
cert-checker. When they became available we evaluated them for production 
and/or CI use, but concluded that the complex dependencies and difficulty of 
productionizing them in our environment outweighed the extra confidence we 
expected to gain, especially given that our certificate profile at the time was 
very static. A system improvement we could have made here would have been to 
set "deploy cablint or its equivalent" as a blocker for future certificate 
profile changes. I'll add that to our list of items for remediation.
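
For readers wondering what that integration can look like mechanically, here
is a rough Go sketch using zlint's library interface. The import paths and
ResultSet fields shown are from memory of later zlint releases and should be
checked against whichever version is actually pinned; treat it as an
illustration rather than Boulder code.

    package certcheck

    import (
        "fmt"

        "github.com/zmap/zcrypto/x509" // zlint operates on zcrypto's x509 type
        "github.com/zmap/zlint/v3"     // import path differs for older zlint releases
    )

    // RunZlint parses a DER certificate and runs every applicable zlint
    // check over it, refusing the certificate on any error- or fatal-level
    // finding. Warnings could instead be logged for human review.
    func RunZlint(der []byte) error {
        cert, err := x509.ParseCertificate(der)
        if err != nil {
            return fmt.Errorf("parsing certificate: %v", err)
        }
        results := zlint.LintCertificate(cert)
        if results.ErrorsPresent || results.FatalsPresent {
            return fmt.Errorf("certificate failed one or more zlint checks")
        }
        return nil
    }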


Re: 2018.03.12 Let's Encrypt Wildcard Certificate Encoding Issue

2018-03-13 Thread Ryan Sleevi via dev-security-policy
On Tue, Mar 13, 2018 at 4:13 PM, Matthew Hardeman via dev-security-policy <
dev-security-policy@lists.mozilla.org> wrote:

> I am not at all suggesting consequences for Let's Encrypt, but rather
> raising a question as to whether that position on new inclusions / renewals
> is appropriate.  If these things can happen in a celebrated best-practices
> environment, can they really in isolation be cause to reject a new
> application or a new root from an existing CA?
>

While I certainly appreciate the comparison, I think it's apples and
oranges when we consider both the nature and degree, nor do I think it's
fair to suggest "in isolation" is a comparison.

I'm sure you can agree that incident response is defined by both the nature
and severity of the incident itself, the surrounding ecosystem factors
(i.e. was this a well-understood problem), and the detection, response, and
disclosure practices that follow. A system that does not implement any
checks whatsoever is, I hope, something we can agree is worse than a system
that relies on human checks (and virtually indistinguishable from no
checks), and that both are worse than a system with incomplete technical
checks.

I do agree with you that I find it challenging how the staging
environment was tested - failure to have robust profile tests in staging,
for example, is what ultimately resulted in Turktrust's notable
misissuance of unconstrained CA certificates. Similarly, given the wide
availability of certificate linting tools - such as ZLint, x509Lint,
(AWS's) certlint, and (GlobalSign's) certlint - there's no dearth of
availability of open tools and checks. Given the industry push towards
integration of these automated tools, it's not entirely clear why LE would
invent yet another, but it's also not reasonable to require that LE use
something 'off the shelf'.

I'm hoping that LE can provide more details about the change management
process and how, in light of this incident, it may change - both in terms
of automated testing and in certificate policy review.


> Another question this incident raised in my mind pertains to the parallel
> staging and production environment paradigm:  If one truly has the 'courage
> of conviction' of the equivalence of the two environments, why would one
> not perform all tests in ONLY the staging environment, with no tests and
> nothing other than production transactions on the production environment?
> That tests continue to be executed in the production environment while
> holding to the notion that a fully parallel staging environment is the
> place for tests seems to signal that confidence in the staging environment
> is -- in some measure, however small -- limited.


That's ... just a bad conclusion, especially for a publicly-trusted CA :)


Re: 2018.03.12 Let's Encrypt Wildcard Certificate Encoding Issue

2018-03-13 Thread Matthew Hardeman via dev-security-policy
The fact that this mis-issuance occurred does raise a question for the
community.

For quite some time, it has been repeatedly emphasized that maintaining a
non-trusted but otherwise identical staging environment and practicing all
permutations of tests and issuances -- especially involving new
functionality -- on that parallel staging infrastructure is the mechanism
by which mis-issuances such as those mentioned in this thread may be
avoided within production environments.

Let's Encrypt has been a shining example of best practices up to this point
and has enjoyed the attendant minimization of production issues (presumably
as a result of exercising said best practices).

Despite that, however, either the test cases which resulted in these
mis-issuances were not first executed on the staging platform or did not
result in the mis-issuance there.  A reference was made to a Go language 
library error / non-conformance being implicated.  Were the builds for 
staging and production compiled on different releases of Go?

Certainly, I think these particular mis-issuances do not significantly
affect the level of trust which should be accorded to ISRG / Let's Encrypt.

Having said that, however, it is worth noting that in a fully new and novel
PKI infrastructure, it seems likely -- based on recent inclusion / renewal
requests -- that such a mis-issuance would recently have resulted in a
disqualification of a given root / key with guidance to cut a new root PKI
and start the process over.

I am not at all suggesting consequences for Let's Encrypt, but rather
raising a question as to whether that position on new inclusions / renewals
is appropriate.  If these things can happen in a celebrated best-practices
environment, can they really in isolation be cause to reject a new
application or a new root from an existing CA?

Another question this incident raised in my mind pertains to the parallel
staging and production environment paradigm:  If one truly has the 'courage
of conviction' of the equivalence of the two environments, why would one
not perform all tests in ONLY the staging environment, with no tests and
nothing other than production transactions on the production environment?
That tests continue to be executed in the production environment while
holding to the notion that a fully parallel staging environment is the
place for tests seems to signal that confidence in the staging environment
is -- in some measure, however small -- limited.


On Tue, Mar 13, 2018 at 8:46 AM, josh--- via dev-security-policy <
dev-security-policy@lists.mozilla.org> wrote:

> On Tuesday, March 13, 2018 at 3:33:50 AM UTC-5, Tom wrote:
> > > During final tests for the general availability of wildcard
> > certificate support, the Let's Encrypt operations team issued six test
> > wildcard certificates under our publicly trusted root:
> >  >
> >  > https://crt.sh/?id=353759994
> >  > https://crt.sh/?id=353758875
> >  > https://crt.sh/?id=353757861
> >  > https://crt.sh/?id=353756805
> >  > https://crt.sh/?id=353755984
> >  > https://crt.sh/?id=353754255
> >  >
> > Somebody noticed there
> > https://community.letsencrypt.org/t/acmev2-and-wildcard-launch-delay/53654/62
> > that the certificate of *.api.letsencrypt.org (apparently currently in
> > use), issued by "TrustID Server CA A52" (IdenTrust) seems to have the
> > same problem:
> > https://crt.sh/?id=8373036&opt=cablint,x509lint
>
> I think it's just a coincidence that we got a wildcard cert from IdenTrust
> a long time ago and it happens to have the same encoding issue that we ran
> into. I notified IdenTrust in case they haven't fixed the problem since
> then.


Re: 2018.03.12 Let's Encrypt Wildcard Certificate Encoding Issue

2018-03-13 Thread josh--- via dev-security-policy
On Tuesday, March 13, 2018 at 3:33:50 AM UTC-5, Tom wrote:
> > During final tests for the general availability of wildcard 
> certificate support, the Let's Encrypt operations team issued six test 
> wildcard certificates under our publicly trusted root:
>  >
>  > https://crt.sh/?id=353759994
>  > https://crt.sh/?id=353758875
>  > https://crt.sh/?id=353757861
>  > https://crt.sh/?id=353756805
>  > https://crt.sh/?id=353755984
>  > https://crt.sh/?id=353754255
>  >
> Somebody noticed there 
> https://community.letsencrypt.org/t/acmev2-and-wildcard-launch-delay/53654/62 
> that the certificate of *.api.letsencrypt.org (apparently currently in 
> use), issued by "TrustID Server CA A52" (IdenTrust) seems to have the 
> same problem:
> https://crt.sh/?id=8373036&opt=cablint,x509lint

I think it's just a coincidence that we got a wildcard cert from IdenTrust a 
long time ago and it happens to have the same encoding issue that we ran into. 
I notified IdenTrust in case they haven't fixed the problem since then.


Re: 2018.03.12 Let's Encrypt Wildcard Certificate Encoding Issue

2018-03-13 Thread Tom via dev-security-policy
> During final tests for the general availability of wildcard 
> certificate support, the Let's Encrypt operations team issued six test 
> wildcard certificates under our publicly trusted root:

>
> https://crt.sh/?id=353759994
> https://crt.sh/?id=353758875
> https://crt.sh/?id=353757861
> https://crt.sh/?id=353756805
> https://crt.sh/?id=353755984
> https://crt.sh/?id=353754255
>
Somebody noticed there 
https://community.letsencrypt.org/t/acmev2-and-wildcard-launch-delay/53654/62 
that the certificate of *.api.letsencrypt.org (apparently currently in 
use), issued by "TrustID Server CA A52" (IdenTrust) seems to have the 
same problem:

https://crt.sh/?id=8373036&opt=cablint,x509lint


Re: 2018.03.12 Let's Encrypt Wildcard Certificate Encoding Issue

2018-03-12 Thread Ryan Sleevi via dev-security-policy
On Mon, Mar 12, 2018 at 11:38 PM jacob.hoffmanandrews--- via
dev-security-policy  wrote:

> On Monday, March 12, 2018 at 8:22:46 PM UTC-7, Ryan Sleevi wrote:
> > Given that Let's Encrypt has been operating a Staging Endpoint (
> > https://letsencrypt.org/docs/staging-environment/ ) for issuing
> wildcards,
> > what controls, if any, existed to examine the certificate profiles prior
> to
> > being deployed in production? Normally, that would flush these out -
> > through both manual and automated testing, preferably.
>
> We continuously run our cert-checker tool (
> https://github.com/letsencrypt/boulder/blob/master/cmd/cert-checker/main.go#L196-L261)
> in both staging and production. Unfortunately, it tests mainly the higher
> level semantic aspects of certificates rather than the lower level encoding
> aspects. Clearly we need better coverage on encoding issues. We expect to
> get that from integrating more and better linters into both our CI testing
> framework and our staging and production environments. We will also review
> the existing controls in our cert-checker tool.
>
> > Golang's ASN.1 library is somewhat lax, in large part due to both public
> and
> > enterprise CAs' storied history of misencodings.
>
> Agreed that Go's asn1 package is lax on parsing, but I don't think that it
> aims to be lax on encoding; for instance, the mis-encoding of asterisks in
> PrintableStrings was considered a bug worth fixing.
>
> > What examinations, if any,
> > will Let's Encrypt be doing for other classes of potential encoding
> issues?
> > Has this caused any changes in how Let's Encrypt will construct
> > TBSCertificates, or review of that code, beyond the introduction of
> > additional linting?
>
> We will re-review the code we use to generate TBSCertificates with an eye
> towards encoding issues, thanks for suggesting it. If there are any broad
> classes of encoding issues you think are particularly worth watching out
> for, that could help guide our analysis.


Well, you’ve already run into one of the common ones I’d seen in the past -
more commonly with older OpenSSL-based bespoke/enterprise CAs (due to
long-since fixed defaults, but nobody upgrading)

Encoding of INTEGERS is another frequent source of pain - minimum length
encoding, ensuring positive numbers - but given the Go ASN1 package’s
author’s hatred of that, I would be surprised.

Reordering of SEQUENCES has been an issue for at least two wholly
independent CA software stacks when integrating CT support; at least one I
suspect is due to using a HashMap that has non-guaranteed ordering
semantics / iteration order changing between runs and releases. This seems
relevant to Go, given its map designs.

SET encodings not being sorted according to their values when encoding.
This would manifest in DNs, although I don’t believe Boulder supports
equivalent RDNs/AVAs.

Explicit encoding of DEFAULT values, most commonly basicConstraints. This
issue most commonly crops up when folks convert ASN.1 schemas to
internal templates by hand, rather than using compilers - which is
something applicable to Go.

Not enforcing size constraints - on strings or sequences. Similar to the
above, many folks forget to convert the restrictions when converting by
hand.

Improper encoding of parameter attributes for signature and SPKI algorithms
- especially RSA. This is due to the “ANY DEFINED BY” approach and folks
hand rolling, or not closely reading the specs. This is more high-level,
but derived from the schema flexibility.

Variable encoding of string types between Subject/Issuer or
Issuer/NameConstraints. This is more of a quasi-gray area - there are defined
semantics for this, but few get it right. This is more high-level, but
derived from the schema flexibility.

Not realizing DNSName, URI, and rfc822Name nameConstraints have different
semantic rules - this is more high-level than encoding, but within that set.

certlint/cablint catches many of these, in large part through using an
ASN.1 schema compiler (asn1c) rather than hand-rolling. Yet even it has had
some encoding issues in the past.
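
Taking just the INTEGER item above as an example, the kind of DER check
involved might look like the Go sketch below (applied to, say, a serial
number's content octets); this illustrates the encoding rules themselves and
is not taken from certlint, cablint, or zlint.

    package dercheck

    import "errors"

    // CheckSerialInteger validates the content octets of a DER INTEGER
    // against the rules that most often trip up CA software: minimal-length
    // encoding, and (for certificate serial numbers) a positive, non-zero
    // value per BR 7.1.
    func CheckSerialInteger(content []byte) error {
        if len(content) == 0 {
            return errors.New("INTEGER must have at least one content octet")
        }
        // DER forbids redundant leading octets: 0x00 may only lead in order
        // to keep a value positive, 0xFF only to keep it negative.
        if len(content) > 1 {
            if content[0] == 0x00 && content[1]&0x80 == 0 {
                return errors.New("non-minimal INTEGER: redundant leading 0x00")
            }
            if content[0] == 0xFF && content[1]&0x80 != 0 {
                return errors.New("non-minimal INTEGER: redundant leading 0xFF")
            }
        }
        // Serial numbers must be positive (sign bit clear) and non-zero.
        if content[0]&0x80 != 0 {
            return errors.New("negative INTEGER not allowed for serial numbers")
        }
        if len(content) == 1 && content[0] == 0x00 {
            return errors.New("serial number must be greater than zero")
        }
        return nil
    }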

> Also, is this the correct timestamp? For example, examining
> > https://crt.sh/?id=353754255&opt=ocsp
> > Shows an issuance time of Not Before: Mar 12 22:18:30 2018 GMT and a
> > revocation time of 2018-03-12  23:58:10 UTC , but you stated your
> alerting
> > time was 2018-03-13 00:43:00 UTC. I'm curious if that's a bug in the
> > display of crt.sh, a typo in your timezone computation (considering the
> > recent daylight saving adjustments in the US), a deliberate choice to put
> > revocation somewhere between those dates (which is semantically valid,
> but
> > curious), or perhaps something else.
>
> I believe this was a timezone computation error. By my reading of the
> logs, our alerting time was 2018-03-13 23:43:00 UTC, which agrees with your
> hypothesis about the recent timezone change (DST) leading to a mistake in
> calculating UTC times.
> 

Re: 2018.03.12 Let's Encrypt Wildcard Certificate Encoding Issue

2018-03-12 Thread jsha--- via dev-security-policy
On Monday, March 12, 2018 at 8:27:06 PM UTC-7, Ryan Sleevi wrote:
> Also, is this the correct timestamp? For example, examining
> https://crt.sh/?id=353754255&opt=ocsp
> 
> Shows an issuance time of Not Before: Mar 12 22:18:30 2018 GMT and a
> revocation time of 2018-03-12  23:58:10 UTC , but you stated your alerting
> time was 2018-03-13 00:43:00 UTC. I'm curious if that's a bug in the
> display of crt.sh, a typo in your timezone computation (considering the
> recent daylight saving adjustments in the US), a deliberate choice to put
> revocation somewhere between those dates (which is semantically valid, but
> curious), or perhaps something else.

Adding a little more detail and precision here: Let's Encrypt backdates 
certificates by one hour, so "Not Before: Mar 12 22:18:30 2018 GMT" indicates 
an issuance time of 23:18:30.

Also, you may notice that one of the certificates was actually revoked at 
23:30:33, before we became aware of the problem. This was done as part of our 
regular deployment testing, to ensure that revocation was working properly.


Re: 2018.03.12 Let's Encrypt Wildcard Certificate Encoding Issue

2018-03-12 Thread jacob.hoffmanandrews--- via dev-security-policy
On Monday, March 12, 2018 at 8:22:46 PM UTC-7, Ryan Sleevi wrote:
> Given that Let's Encrypt has been operating a Staging Endpoint (
> https://letsencrypt.org/docs/staging-environment/ ) for issuing wildcards,
> what controls, if any, existed to examine the certificate profiles prior to
> being deployed in production? Normally, that would flush these out -
> through both manual and automated testing, preferably.

We continuously run our cert-checker tool 
(https://github.com/letsencrypt/boulder/blob/master/cmd/cert-checker/main.go#L196-L261)
 in both staging and production. Unfortunately, it tests mainly the higher 
level semantic aspects of certificates rather than the lower level encoding 
aspects. Clearly we need better coverage on encoding issues. We expect to get 
that from integrating more and better linters into both our CI testing 
framework and our staging and production environments. We will also review the 
existing controls in our cert-checker tool.
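
As one concrete example of such a lower-level encoding check, the Go sketch
below walks arbitrary DER (for instance a parsed certificate's RawSubject)
and flags any PrintableString whose contents fall outside the alphabet
RFC 5280 and X.680 permit, which is exactly how a stray '*' would surface. It
is an illustration only, not cert-checker's actual logic.

    package dercheck

    import (
        "encoding/asn1"
        "fmt"
    )

    // isPrintableByte reports whether b is in the PrintableString alphabet:
    // A-Z a-z 0-9 space ' ( ) + , - . / : = ?
    func isPrintableByte(b byte) bool {
        switch {
        case 'a' <= b && b <= 'z', 'A' <= b && b <= 'Z', '0' <= b && b <= '9':
            return true
        }
        switch b {
        case ' ', '\'', '(', ')', '+', ',', '-', '.', '/', ':', '=', '?':
            return true
        }
        return false
    }

    // CheckPrintableStrings recursively walks DER-encoded data (e.g. the
    // RawSubject of a parsed certificate) and returns an error if any
    // PrintableString contains a forbidden character such as '*'.
    func CheckPrintableStrings(der []byte) error {
        for len(der) > 0 {
            var rv asn1.RawValue
            rest, err := asn1.Unmarshal(der, &rv)
            if err != nil {
                return err
            }
            if rv.IsCompound {
                // SEQUENCEs and SETs: descend into their contents.
                if err := CheckPrintableStrings(rv.Bytes); err != nil {
                    return err
                }
            } else if rv.Class == asn1.ClassUniversal && rv.Tag == asn1.TagPrintableString {
                for _, b := range rv.Bytes {
                    if !isPrintableByte(b) {
                        return fmt.Errorf("PrintableString contains forbidden byte %q", b)
                    }
                }
            }
            der = rest
        }
        return nil
    }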

> Golang's ASN.1 library is somewhat lax, in large part due to both public and
> enterprise CAs' storied history of misencodings.

Agreed that Go's asn1 package is lax on parsing, but I don't think that it aims 
to be lax on encoding; for instance, the mis-encoding of asterisks in 
PrintableStrings was considered a bug worth fixing.

> What examinations, if any,
> will Let's Encrypt be doing for other classes of potential encoding issues?
> Has this caused any changes in how Let's Encrypt will construct
> TBSCertificates, or review of that code, beyond the introduction of
> additional linting?

We will re-review the code we use to generate TBSCertificates with an eye 
towards encoding issues, thanks for suggesting it. If there are any broad 
classes of encoding issues you think are particularly worth watching out for, 
that could help guide our analysis.

> Also, is this the correct timestamp? For example, examining 
> https://crt.sh/?id=353754255&opt=ocsp 
> Shows an issuance time of Not Before: Mar 12 22:18:30 2018 GMT and a 
> revocation time of 2018-03-12  23:58:10 UTC , but you stated your alerting 
> time was 2018-03-13 00:43:00 UTC. I'm curious if that's a bug in the 
> display of crt.sh, a typo in your timezone computation (considering the 
> recent daylight saving adjustments in the US), a deliberate choice to put 
> revocation somewhere between those dates (which is semantically valid, but 
> curious), or perhaps something else.

I believe this was a timezone computation error. By my reading of the logs, our 
alerting time was 2018-03-13 23:43:00 UTC, which agrees with your hypothesis 
about the recent timezone change (DST) leading to a mistake in calculating UTC 
times.
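
For what it's worth, the classic way that DST mistake creeps in is converting
a local wall-clock time to UTC with a fixed offset; below is a small Go
illustration of the failure mode and of the zone-aware fix, with the specific
local time made up for illustration.

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        // A hypothetical local alert time (US Pacific), the day after the
        // 2018-03-11 switch to daylight saving time.
        pacific, err := time.LoadLocation("America/Los_Angeles")
        if err != nil {
            panic(err)
        }

        // Wrong: treating Pacific time as a fixed UTC-8 offset. Once DST is
        // in effect the real offset is UTC-7, so the result is an hour late.
        fixed := time.FixedZone("PST-assumed", -8*60*60)
        wrong := time.Date(2018, time.March, 12, 16, 43, 0, 0, fixed).UTC()

        // Right: interpret the wall-clock time in the IANA zone, which knows
        // about the DST transition, then convert to UTC.
        right := time.Date(2018, time.March, 12, 16, 43, 0, 0, pacific).UTC()

        fmt.Println(wrong.Format("2006-01-02 15:04 MST")) // 2018-03-13 00:43 UTC
        fmt.Println(right.Format("2006-01-02 15:04 MST")) // 2018-03-12 23:43 UTC
    }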


Re: 2018.03.12 Let's Encrypt Wildcard Certificate Encoding Issue

2018-03-12 Thread Ryan Sleevi via dev-security-policy
On Mon, Mar 12, 2018 at 11:22 PM, Ryan Sleevi  wrote:

>
>
> On Mon, Mar 12, 2018 at 10:35 PM, josh--- via dev-security-policy <
> dev-security-policy@lists.mozilla.org> wrote:
>
>> During final tests for the general availability of wildcard certificate
>> support, the Let's Encrypt operations team issued six test wildcard
>> certificates under our publicly trusted root:
>>
>> https://crt.sh/?id=353759994
>> https://crt.sh/?id=353758875
>> https://crt.sh/?id=353757861
>> https://crt.sh/?id=353756805
>> https://crt.sh/?id=353755984
>> https://crt.sh/?id=353754255
>>
>> These certificates contain a subject common name that includes a  “*.”
>> label encoded as an ASN.1 PrintableString, which does not allow the
>> asterisk character, violating RFC 5280.
>>
>> We became aware of the problem on 2018-03-13 at 00:43 UTC via the linter
>> flagging in crt.sh [1].
>
>
Also, is this the correct timestamp? For example, examining
https://crt.sh/?id=353754255&opt=ocsp

Shows an issuance time of Not Before: Mar 12 22:18:30 2018 GMT and a
revocation time of 2018-03-12  23:58:10 UTC , but you stated your alerting
time was 2018-03-13 00:43:00 UTC. I'm curious if that's a bug in the
display of crt.sh, a typo in your timezone computation (considering the
recent daylight saving adjustments in the US), a deliberate choice to put
revocation somewhere between those dates (which is semantically valid, but
curious), or perhaps something else.


> All six certificates have been revoked.
>>
>> The root cause of the problem is a Go language bug [2] which has been
>> resolved in Go v1.10 [3], which we were already planning to deploy soon. We
>> will resolve the issue by upgrading to Go v1.10 before proceeding with our
>> wildcard certificate launch plans.
>>
>> We employ a robust testing infrastructure but there is always room for
>> improvement and sometimes bugs slip through our pre-production tests. We’re
>> fortunate that the PKI community has produced some great testing tools that
>> sometimes catch things we don’t. In response to this incident we are
>> planning to integrate additional tools into our testing infrastructure and
>> improve our test coverage of multiple Go versions.
>>
>> [1] https://crt.sh/
>>
>> [2] https://github.com/golang/go/commit/3b186db7b4a5cc510e71f90682732eba3df72fd3
>>
>> [3] https://golang.org/doc/go1.10#encoding/asn1
>>
>>
> Given that Let's Encrypt has been operating a Staging Endpoint (
> https://letsencrypt.org/docs/staging-environment/ ) for issuing
> wildcards, what controls, if any, existed to examine the certificate
> profiles prior to being deployed in production? Normally, that would flush
> these out - through both manual and automated testing, preferably.
>
> Given that Let's Encrypt is running on an open-source CA (Boulder), this
> offers a unique opportunity to highlight where the controls and checks are
> in place, particularly for commonNames. RFC 5280 has other restrictions in
> place that have tripped CAs up, such as exclusively using
> PrintableString/UTF8String for DirectoryString types (except for backwards
> compatibility, which would not apply here), or length restrictions (such as
> 64 characters, per the ASN.1 schema); it would be useful to comprehensively
> review these controls.
>
> Golang's ASN.1 library is somewhat lax, in large part due to both public and
> enterprise CAs' storied history of misencodings. What examinations, if any,
> will Let's Encrypt be doing for other classes of potential encoding issues?
> Has this caused any changes in how Let's Encrypt will construct
> TBSCertificates, or review of that code, beyond the introduction of
> additional linting?
>
>


Re: 2018.03.12 Let's Encrypt Wildcard Certificate Encoding Issue

2018-03-12 Thread Ryan Sleevi via dev-security-policy
On Mon, Mar 12, 2018 at 10:35 PM, josh--- via dev-security-policy <
dev-security-policy@lists.mozilla.org> wrote:

> During final tests for the general availability of wildcard certificate
> support, the Let's Encrypt operations team issued six test wildcard
> certificates under our publicly trusted root:
>
> https://crt.sh/?id=353759994
> https://crt.sh/?id=353758875
> https://crt.sh/?id=353757861
> https://crt.sh/?id=353756805
> https://crt.sh/?id=353755984
> https://crt.sh/?id=353754255
>
> These certificates contain a subject common name that includes a  “*.”
> label encoded as an ASN.1 PrintableString, which does not allow the
> asterisk character, violating RFC 5280.
>
> We became aware of the problem on 2018-03-13 at 00:43 UTC via the linter
> flagging in crt.sh [1]. All six certificates have been revoked.
>
> The root cause of the problem is a Go language bug [2] which has been
> resolved in Go v1.10 [3], which we were already planning to deploy soon. We
> will resolve the issue by upgrading to Go v1.10 before proceeding with our
> wildcard certificate launch plans.
>
> We employ a robust testing infrastructure but there is always room for
> improvement and sometimes bugs slip through our pre-production tests. We’re
> fortunate that the PKI community has produced some great testing tools that
> sometimes catch things we don’t. In response to this incident we are
> planning to integrate additional tools into our testing infrastructure and
> improve our test coverage of multiple Go versions.
>
> [1] https://crt.sh/
>
> [2] https://github.com/golang/go/commit/3b186db7b4a5cc510e71f90682732eba3df72fd3
>
> [3] https://golang.org/doc/go1.10#encoding/asn1
>
>
Given that Let's Encrypt has been operating a Staging Endpoint (
https://letsencrypt.org/docs/staging-environment/ ) for issuing wildcards,
what controls, if any, existed to examine the certificate profiles prior to
being deployed in production? Normally, that would flush these out -
through both manual and automated testing, preferably.

Given that Let's Encrypt is running on an open-source CA (Boulder), this
offers a unique opportunity to highlight where the controls and checks are
in place, particularly for commonNames. RFC 5280 has other restrictions in
place that have tripped CAs up, such as exclusively using
PrintableString/UTF8String for DirectoryString types (except for backwards
compatibility, which would not apply here), or length restrictions (such as
64 characters, per the ASN.1 schema); it would be useful to comprehensively
review these controls.

Golang's ASN.1 library is somewhat lax, in large part due to both public and
enterprise CAs' storied history of misencodings. What examinations, if any,
will Let's Encrypt be doing for other classes of potential encoding issues?
Has this caused any changes in how Let's Encrypt will construct
TBSCertificates, or review of that code, beyond the introduction of
additional linting?


2018.03.12 Let's Encrypt Wildcard Certificate Encoding Issue

2018-03-12 Thread josh--- via dev-security-policy
During final tests for the general availability of wildcard certificate 
support, the Let's Encrypt operations team issued six test wildcard 
certificates under our publicly trusted root:

https://crt.sh/?id=353759994
https://crt.sh/?id=353758875
https://crt.sh/?id=353757861
https://crt.sh/?id=353756805
https://crt.sh/?id=353755984
https://crt.sh/?id=353754255

These certificates contain a subject common name that includes a  “*.” label 
encoded as an ASN.1 PrintableString, which does not allow the asterisk 
character, violating RFC 5280.

We became aware of the problem on 2018-03-13 at 00:43 UTC via the linter 
flagging in crt.sh [1]. All six certificates have been revoked.

The root cause of the problem is a Go language bug [2] which has been resolved 
in Go v1.10 [3], which we were already planning to deploy soon. We will resolve 
the issue by upgrading to Go v1.10 before proceeding with our wildcard 
certificate launch plans.
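
One cheap way to observe the underlying Go behaviour, and to pin it as a
regression test across Go versions, is to marshal a wildcard name with
encoding/asn1 and inspect the tag byte it chose. Assuming the fix is as
described in [2] and [3], Go releases before 1.10 pick PrintableString (tag
0x13) for a string containing '*', while 1.10 and later fall back to
UTF8String (tag 0x0c); a sketch:

    package main

    import (
        "encoding/asn1"
        "fmt"
    )

    func main() {
        // Marshal a wildcard DNS name as an untyped ASN.1 string and look at
        // the identifier octet encoding/asn1 selected for it.
        der, err := asn1.Marshal("*.example.com")
        if err != nil {
            panic(err)
        }
        switch der[0] {
        case 0x13: // PrintableString: the pre-Go-1.10 behaviour at issue here
            fmt.Println("encoded as PrintableString (RFC 5280 forbids '*' here)")
        case 0x0c: // UTF8String: expected on Go 1.10 and later
            fmt.Println("encoded as UTF8String")
        default:
            fmt.Printf("unexpected tag 0x%02x\n", der[0])
        }
    }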

We employ a robust testing infrastructure but there is always room for 
improvement and sometimes bugs slip through our pre-production tests. We’re 
fortunate that the PKI community has produced some great testing tools that 
sometimes catch things we don’t. In response to this incident we are planning 
to integrate additional tools into our testing infrastructure and improve our 
test coverage of multiple Go versions.

[1] https://crt.sh/

[2] https://github.com/golang/go/commit/3b186db7b4a5cc510e71f90682732eba3df72fd3

[3] https://golang.org/doc/go1.10#encoding/asn1