Re: OKD 4 - A Modest Proposal
Dear Clayton,

Thank you very much for your insight and the many details about the upcoming version 4.0 of OKD. Based on your mail it sounds almost too good to be true. As an ops person who has played around with OKD 3.10 and 3.11 on CentOS 7, I would like to emphasize the following negative points I have mainly seen with those versions of OKD. I very much hope they can be improved in 4.0:

- Changing a single parameter of the cluster requires re-running the whole Ansible deployment. In my case, with a small cluster of 3 nodes, that takes over 20 minutes. This is frustrating and annoying.

- Upgrading from OKD 3.10 to 3.11 was a big pain. It first failed due to Ansible version incompatibilities on CentOS 7, and then because of timeout issues that could only be worked around with ugly hacks, etc. With the help of the mailing list and GitHub issues, I think it took me a few days or even weeks to finally upgrade successfully. IMHO this is unacceptable from a security standpoint. As you mention in your mail, upgrades should be painless and straightforward.

- Finally, there is a LOT of documentation available for OKD, which is great. However, there is no clear documentation or guide that helps much with the two main issues I mention above. At best one can find different upgrade scenarios, which is quite confusing. For instance, I still have not understood or found out the correct procedure with Ansible to keep OKD 3.11 (or 3.10) at its latest patch level, especially in terms of security patches.

This is my standpoint and opinion as an ops guy operating OKD. I must also be honest: I have only been playing with OKD for a year now, so I don't have much experience. But again, if I understand correctly and based on your mail below, these issues should be addressed in OKD 4.0. I am really looking forward to trying it out, and I hope it will make my life as an ops person easier. So thank you again so much for the effort.
Best regards,
Mabi

‐‐‐ Original Message ‐‐‐
On Thursday, June 20, 2019 11:19 PM, Clayton Coleman wrote:
OKD 4 - A Modest Proposal
TL;DR - I can’t even summarize this, but it’s worth reading!

First, I’ll start this off with an apology - I intended to draft an OKD 4 proposal many months ago, but I kept pushing it back to fix “just one more bug”, and as a result there’s been a real gap in regular summarization across the project. While I have talked to many community members one-on-one, and many of us interact with each other on GitHub and on Slack and at conferences, I was remiss in highlighting and concentrating the roadmap, design, and iteration proposals for a large chunk of the last 6 months and I’ll do my best to rectify that starting now.

OKD 3.11 has been out since the fall, and is still getting fixes. It should be no surprise to folks on this list that the acquisition of CoreOS last spring triggered a rethink / re-imagining of what OpenShift could / should be. There was a broad agreement that we’ve all been doing Kubernetes The Hard Way™ (even the cloud providers) since the early days of Kube. Some of these hard things we accepted because Kubernetes was moving so fast.

But Kubernetes is maturing. The code base is moving from a monorepo to a much larger set of individual services and extensions. The ecosystem on top of Kubernetes is what is now innovating at a rapid pace. Contributors from both CoreOS and OpenShift asked what a v2 of Tectonic and what a v4 of OpenShift would look like if:

- we built a platform anchored around Kubernetes
- that allowed us to rapidly include and support the innovation in the broader ecosystem
- all the way down to the operating system
- that informed the evolution of operators (the natural way to extend Kubernetes)

That took longer than anticipated. Many of those pieces were big bets that we weren’t positive could be well integrated. If you’ve been following along in the almost one hundred repos that make up OKD, you know that some of those pieces reached maturity only in the last month or so.
Some of the aspects of Tectonic which weren’t open source weren’t immediately replaced. As we evolved the initial CoreOS operating system vision, it also wasn’t clear whether it would be Fedora, or RHEL, or something in between. Much of the change happened in the open, but not all of the planning or debate at the high level.

I’m sorry for that. I will make a concerted effort to summarize more regularly what is going on and what to expect. I will also do more to move those discussions into the broader forums rather than keep them in specific scopes or specific channels.

---

So - where are we now (June 2019), and where do we go from here?

The first question is philosophy - what sort of shared goals should we define for OKD?

I personally feel strongly that the CoreOS mission - secure the internet with up-to-date open source software - continues to be more relevant with each passing year. As a contributor to Kubernetes, I know how difficult keeping an up-to-date version of it can be. As a personal computing user, I’m horrified at the insecurity of our hardware, our software, and the services and clouds we use. And it’s not going to get better unless we make a concerted effort to fix it.

The OKD mission has always been to create a developer- and operations-friendly Kubernetes distribution that isn’t afraid to be opinionated in order to make running software easier. That opinionation started with development tools on top of Kubernetes and security underneath. But I think we should now take the next step - strengthen our opinions on how we ship and update (continuously!) and how we run the platform itself. Not just to deliver new features, but to deliver fixes and plug security holes.

So I would propose our goal for OKD 4 to be:

***
The perfect Kubernetes distribution for those who want to continuously be on the latest Kubernetes and ecosystem components.
It should combine an up-to-date OS, the Kubernetes control plane, and a large number of ecosystem operators to provide an easy-to-extend distribution of Kubernetes that is always on the latest released version of ecosystem tools.
***

Does that resonate with others? What other goals do people believe in and are willing to support with their time and effort?

---

The second step (if we agree with the philosophy) is to articulate the choices that would describe Kubernetes The Probably Better Than Before Way:

***
First, Kubernetes is the best platform for running distributed apps, and since we believe that, we should use it to run Kubernetes itself. You should never have to:

- Restart your control plane by SSHing to a machine
- Remember which components are running as a system service vs. as a pod
- Orchestrate pods + node services
- Take downtime during a control plane reconfiguration

This ensures that the platform benefits from things we add (resiliency, debuggability, observability), and makes the platform (which HAS to run successfully 100% of the time) the best possible