Re: OKD 4 - A Modest Proposal

2019-06-23 Thread mabi
Dear Clayton,

Thank you very much for your insight and many details about the upcoming 
version 4.0 of OKD. Based on your mail it sounds nearly too good to be true.

As an ops person who has played around with OKD 3.10 and 3.11 on CentOS 7, I
would like to emphasize the following pain points I have seen with those
versions of OKD, and I very much hope they can be improved in 4.0:

- Changing a single parameter of the cluster requires re-running the whole
Ansible deployment, which in my case, with a small cluster of 3 nodes, takes
over 20 minutes. This is frustrating and annoying.

- Upgrading from OKD 3.10 to 3.11 was a big pain: it first failed due to
version incompatibilities of Ansible on CentOS 7, then because of other timeout
issues which can only be worked around with ugly hacks, etc. I think it took me
a few days or even weeks, with the help of the mailing list and GitHub issues,
to finally upgrade successfully. This is IMHO unacceptable from a security
standpoint. As you mention in your mail, upgrades should be painless and
straightforward.

- Finally, there is a LOT of documentation available for OKD, which is great,
but for the two main issues I mention above there is no clear documentation or
guide that helps much. At best one can find several different upgrade
scenarios, which is quite confusing. For instance, I still have not understood
or found out what the correct procedure is with Ansible to keep OKD 3.11 (or
3.10) at its latest patch level, especially in terms of security patches.

This is my standpoint and opinion as an ops guy operating OKD. I must also be
honest: I have only been playing with OKD for 1 year now, so I don't have too
much experience.

But again, if I understand correctly, and based on your mail below, these
issues should be addressed in OKD 4.0, so I am really looking forward to trying
it out, as it should make my life as an ops person easier. So thank you again
so much for the effort.

Best regards,
Mabi

‐‐‐ Original Message ‐‐‐
On Thursday, June 20, 2019 11:19 PM, Clayton Coleman wrote:

> TL;DR - I can’t even summarize this, but it’s worth it to read!
>
> First, I’ll start this off with an apology - I intended to draft an OKD 4 
> proposal many months ago, but I kept pushing it back to fix “just one more 
> bug”, and as a result there’s been a real gap in regular summarization across 
> the project.  While I have talked to many community members one-on-one, and 
> many of us interact with each other on GitHub and on Slack and at 
> conferences, I was remiss in highlighting and concentrating the roadmap, 
> design, and iteration proposals for a large chunk of the last 6 months and 
> I’ll do my best to rectify that starting now.
>
> OKD 3.11 has been out since the fall, and is still getting fixes. It should 
> be no surprise to folks on this list that the acquisition of CoreOS last 
> spring triggered a rethink / re-imagining of what OpenShift could / should 
> be.  There was a broad agreement that we’ve all been doing Kubernetes The 
> Hard Way™ (even the cloud providers) since the early days of Kube. Some of 
> these hard things we accepted because Kubernetes was moving so fast.
>
> But Kubernetes is maturing.  The code base is moving from a monorepo to a 
> much larger set of individual services and extensions.  The ecosystem on top 
> of Kubernetes is what is now innovating at a rapid pace. Contributors from 
> both CoreOS and OpenShift asked what a v2 of Tectonic and what a v4 of 
> OpenShift would look like if:
>
> - we built a platform anchored around Kubernetes
> - that allowed us to rapidly include and support the innovation in the
>   broader ecosystem
> - all the way down to the operating system
> - that informed the evolution of operators (the natural way to extend
>   Kubernetes)
>
> That took longer than anticipated.  Many of those pieces were big bets that 
> we weren’t positive could be well integrated, and if you’ve been following 
> along in the almost a hundred repos that make up OKD you know that some of 
> those pieces reached maturity only in the last month or so. Some of the 
> aspects of Tectonic which weren’t open source weren’t immediately replaced, 
> and, as we evolved the initial CoreOS operating system vision, it wasn’t 
> clear whether it would be Fedora, or RHEL, or something in between.  Much of 
> the change happened in the open, but not all of the planning or debate at the 
> high level.
>
> I’m sorry for that. I will make a concerted effort to summarize what is going 
> on and what to expect more regularly, and also do more to move those 
> discussions into the broader forums rather than stay in specific scopes or 
> specific channels.
>
> ---
>
> So - where are we now (June 2019), and where do we go from here?
>
> The first question is philosophy - what sort of shared goals should we define 
> for OKD?
>
> I personally feel strongly that the CoreOS mission - secure the internet with 
> up-to-date 

OKD 4 - A Modest Proposal

2019-06-20 Thread Clayton Coleman
TL;DR - I can’t even summarize this, but it’s worth it to read!

First, I’ll start this off with an apology - I intended to draft an OKD 4
proposal many months ago, but I kept pushing it back to fix “just one more
bug”, and as a result there’s been a real gap in regular summarization
across the project.  While I have talked to many community members
one-on-one, and many of us interact with each other on GitHub and on Slack
and at conferences, I was remiss in highlighting and concentrating the
roadmap, design, and iteration proposals for a large chunk of the last 6
months and I’ll do my best to rectify that starting now.

OKD 3.11 has been out since the fall, and is still getting fixes. It should
be no surprise to folks on this list that the acquisition of CoreOS last
spring triggered a rethink / re-imagining of what OpenShift could / should
be.  There was a broad agreement that we’ve all been doing Kubernetes The
Hard Way™ (even the cloud providers) since the early days of Kube. Some of
these hard things we accepted because Kubernetes was moving so fast.

But Kubernetes is maturing.  The code base is moving from a monorepo to a
much larger set of individual services and extensions.  The ecosystem on
top of Kubernetes is what is now innovating at a rapid pace. Contributors
from both CoreOS and OpenShift asked what a v2 of Tectonic and what a v4 of
OpenShift would look like if:


   - we built a platform anchored around Kubernetes
   - that allowed us to rapidly include and support the innovation in the
     broader ecosystem
   - all the way down to the operating system
   - that informed the evolution of operators (the natural way to extend
     Kubernetes)


That took longer than anticipated.  Many of those pieces were big bets that
we weren’t positive could be well integrated, and if you’ve been following
along in the almost a hundred repos that make up OKD you know that some of
those pieces reached maturity only in the last month or so. Some of the
aspects of Tectonic which weren’t open source weren’t immediately replaced,
and, as we evolved the initial CoreOS operating system vision, it wasn’t
clear whether it would be Fedora, or RHEL, or something in between.  Much
of the change happened in the open, but not all of the planning or debate
at the high level.

I’m sorry for that. I will make a concerted effort to summarize what is
going on and what to expect more regularly, and also do more to move those
discussions into the broader forums rather than stay in specific scopes or
specific channels.

---

So - where are we now (June 2019), and where do we go from here?

The first question is philosophy - what sort of shared goals should we
define for OKD?

I personally feel strongly that the CoreOS mission - secure the internet
with up-to-date open source software - continues to be more relevant with
each passing year. As a contributor to Kubernetes, I know how difficult
keeping an up-to-date version of it can be. As a personal computing user,
I’m horrified at the insecurity of our hardware, our software, and the
services and clouds we use.  And it’s not going to get better unless we
make a concerted effort to fix it.

The OKD mission has always been to create a developer and operations
friendly Kubernetes distribution that isn’t afraid to be opinionated in
order to make running software easier.  That opinionation started with
development tools on top of Kubernetes and security underneath. But I think
we should now take the next step - strengthen our opinions on how we ship
and update (continuously!) and how we run the platform itself.  Not just to
deliver new features, but to deliver fixes and plug security holes.

So I would propose our goal for OKD 4 to be:

***

The perfect Kubernetes distribution for those who want to continuously be
on the latest Kubernetes and ecosystem components. It should combine an
up-to-date OS, the Kubernetes control plane, and a large number of
ecosystem operators to provide an easy-to-extend distribution of Kubernetes
that is always on the latest released version of ecosystem tools.

***

Does that resonate with others?  What other goals do people believe in and
are willing to support with their time and effort?

---

The second step (if we agree with the philosophy) is to articulate the
choices that would describe Kubernetes The Probably Better Than Before Way:


*** First, Kubernetes is the best platform for running distributed apps,
and since we believe that, we should use it to run Kubernetes itself.

You should never have to:


   - Restart your control plane by SSHing to a machine
   - Remember which components are running as a system service vs as a pod
   - Orchestrate pods + node services
   - Take downtime during a control plane reconfiguration


This ensures that the platform benefits from things we add (resiliency,
debuggability, observability), and makes the platform (which HAS to run
successfully 100% of the time) the best possible