Advice installing new custom master API certificates
I have a cluster running with Let's Encrypt/certbot-generated certificates. The corresponding "fullchain.pem" and "privkey.pem" files are in /home/centos and the corresponding section of my inventory.yaml looks like this:

  openshift_master_overwrite_named_certificates: true
  openshift_master_named_certificates: [{'certfile': "/home/centos/fullchain.pem", 'keyfile': "/home/centos/privkey.pem"}]

That's all working fine. Now I have the following set of custom certificate files:

- cert.crt
- ca-bundle.crt
- private.key

What do I need to do to replace the existing set of Let's Encrypt certificates with these new custom files? I'm struggling with the 3.11 documentation on the matter (https://docs.openshift.com/container-platform/3.11/install_config/certificate_customization.html). I think (in the "Retrofit Custom Master Certificates into a Cluster" section) it is telling me to adjust my inventory to look like this:

  openshift_master_overwrite_named_certificates: true
  openshift_master_named_certificates: [{'certfile': "/home/centos/cert.crt", 'cafile': "/home/centos/ca-bundle.crt", 'keyfile': "/home/centos/private.key", 'names': ["okd.xchem.diamond.ac.uk"]}]

...and then run the "redeploy-certificates.yml" playbook. But the documentation then goes on to talk about adjusting master-config.yaml (step 4) without going into any specifics about what actually needs to be done. Is this editing not part of the playbook tasks referred to above (in step 3 of the documentation)?

The guide also talks about concatenating the certificate file. Do I need to concatenate the "cert" and "ca-bundle" files? If so, do I need to specify the 'cafile' in the inventory?

As a short-cut, could I just go to the /etc/origin/master/named_certificates directory, replace the files and then bounce the API and CONTROLLERS processes?

It all gets a bit foggy. Can someone explain the essential steps for me please?
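Not an authoritative answer, but for what it's worth, one common reading of the docs' "concatenate" step is that the server certificate and the CA bundle are combined into a single certfile (server cert first), and that combined file is what the inventory points at. Everything below is an assumption to verify against the 3.11 docs; `combined.crt` is a hypothetical file name:

```yaml
# Sketch only, based on one reading of the 3.11 retrofit docs.
# combined.crt is assumed to be cert.crt followed by ca-bundle.crt,
# e.g. produced with: cat cert.crt ca-bundle.crt > combined.crt
openshift_master_overwrite_named_certificates: true
openshift_master_named_certificates:
  - certfile: /home/centos/combined.crt   # hypothetical combined file
    keyfile: /home/centos/private.key
    cafile: /home/centos/ca-bundle.crt
    names:
      - okd.xchem.diamond.ac.uk
```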
Alan Christie achris...@informaticsmatters.com ___ users mailing list users@lists.openshift.redhat.com http://lists.openshift.redhat.com/openshiftmm/listinfo/users
Re: OKD 3.11 - Volume and Claim Pre-binding - volumes for a namespace
Thanks, I was wondering whether I could create an arbitrary storage class, so (if the application can be adjusted to name that class) this might well be a solution. I'll poke around today, thanks.

Alan Christie
achris...@informaticsmatters.com

> On 18 Nov 2019, at 12:08 pm, Frederic Giloux wrote:
>
> Hi Alan
>
> you can use a storage class for the purpose [1] and pair it with quotas for the defined storage class [2] as proposed by Samuel.
>
> [1] https://docs.okd.io/3.11/install_config/storage_examples/storage_classes_legacy.html#install-config-storage-examples-storage-classes-legacy
> [2] https://docs.okd.io/3.11/dev_guide/compute_resources.html#dev-managed-by-quota
>
> Regards,
>
> Frédéric
>
> On Mon, Nov 18, 2019 at 12:38 PM Samuel Martín Moro wrote:
> Not that I know of.
> The claimRef is not meant to be changed manually. Once set, the PV should have been bound already; you won't be able to set only a namespace.
>
> Have you considered using ResourceQuotas?
>
> To deny users in a Project from requesting persistent storage, you could use the following:
>
>   apiVersion: v1
>   kind: ResourceQuota
>   metadata:
>     name: no-pv
>     namespace: project-with-no-persistent-volumes
>   spec:
>     hard:
>       persistentvolumeclaims: 0
>
> On Mon, Nov 18, 2019 at 12:00 PM Alan Christie wrote:
> On the topic of volume claim pre-binding …
>
> Is there a pattern for creating volumes that can only be bound to a PVC from a known namespace, specifically when the PVC name may not be known in advance?
>
> In my specific case I don't have control over the application's PVC name but I do know its namespace. I need to prevent the pre-allocated volume from being bound to a claim from a namespace other than the one the application's in.
>
> The `PersistentVolume` spec contains a `claimRef` section but I suspect that you can't just fill out the `namespace`; you need to provide both the `name` and `namespace` (because although the former doesn't generate an error, it doesn't work).
>
> Any suggestions?
>
> Alan Christie
> achris...@informaticsmatters.com
>
> --
> Samuel Martín Moro
> {EPITECH.} 2011
>
> "Nobody wants to say how this works.
>  Maybe nobody knows ..."
>   Xorg.conf(5)
>
> --
> Frédéric Giloux
> Senior Technical Account Manager
> Red Hat Germany
> fgil...@redhat.com  M: +49-174-172-4661
> redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
> Red Hat GmbH, http://www.de.redhat.com/, Sitz: Grasbrunn,
> Handelsregister: Amtsgericht München, HRB 153243,
> Geschäftsführer: Charles Cachera, Michael O'Neill, Tom Savage, Eric Shander
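A hedged sketch of what the storage-class-plus-quota suggestion might look like in practice (all names here are hypothetical, and the storage-class-scoped quota key should be checked against your cluster version):

```yaml
# A "static" StorageClass that only pre-created PVs and the intended
# application's PVC reference; no dynamic provisioning happens.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: app-static                  # hypothetical class name
provisioner: kubernetes.io/no-provisioner
---
# In every OTHER namespace, forbid claims against that class.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: deny-app-static
  namespace: some-other-namespace   # hypothetical
spec:
  hard:
    app-static.storageclass.storage.k8s.io/persistentvolumeclaims: "0"
```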
OKD 3.11 - Volume and Claim Pre-binding - volumes for a namespace
On the topic of volume claim pre-binding …

Is there a pattern for creating volumes that can only be bound to a PVC from a known namespace, specifically when the PVC name may not be known in advance?

In my specific case I don't have control over the application's PVC name but I do know its namespace. I need to prevent the pre-allocated volume from being bound to a claim from a namespace other than the one the application's in.

The `PersistentVolume` spec contains a `claimRef` section but I suspect that you can't just fill out the `namespace`; you need to provide both the `name` and `namespace` (because although the former doesn't generate an error, it doesn't work).

Any suggestions?

Alan Christie
achris...@informaticsmatters.com
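For illustration, the `claimRef` shape being discussed - as noted above, pre-binding only works when both `name` and `namespace` are set, which is exactly the problem when the PVC name isn't known in advance (names and backend below are hypothetical):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: app-data                  # hypothetical PV name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  nfs:                            # example backend only
    server: nfs.example.com
    path: /exports/app-data
  claimRef:
    name: app-pvc                 # must be the exact PVC name...
    namespace: app-namespace      # ...as well as its namespace
```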
Re: Prometheus (OKD/3.11) NFS?
Thanks, it turned out to be relatively simple in that I just needed 2 PVs (prometheus-1 and 2) to satisfy Prometheus and 3 (alertmanager-1 to 3) to satisfy the Alertmanager. It wasn't immediately obvious how to solve this in a 'static' system, but it is documented and now working - I just set the two `storage_enabled` variables but leave the `storage_class` ones alone. I understand the side-effects of NFS but my options (at the moment) are extremely limited.

Thanks for your help.

Alan Christie
achris...@informaticsmatters.com

> On 12 Nov 2019, at 1:28 pm, Simon Pasquier wrote:
>
> On Wed, Nov 6, 2019 at 6:11 PM Alan Christie wrote:
>>
>> Hi,
>>
>> My cluster doesn't have dynamic volumes at the moment. For experimentation is it possible to use NFS volumes in 3.11 for Prometheus and the Alertmanager?
>
> In general Prometheus doesn't play well with NFS because most implementations don't fully support POSIX.
> It is called out in the Prometheus documentation:
> https://prometheus.io/docs/prometheus/latest/storage/#operational-aspects
>
>> I notice that the ansible playbook variables for Prometheus in OKD 3.11 are very different to those in 3.7 but...
>
> Yes, the big difference starting with 3.11 is that Prometheus is deployed using the Prometheus Operator via the Cluster Monitoring Operator.
>
>> - Can I use pre-provisioned NFS volumes?
>> - …or… is there an equivalent of 3.7's "openshift_prometheus_alertmanager_storage_kind=nfs"?
>
> See https://docs.okd.io/3.11/install_config/prometheus_cluster_monitoring.html#persistent-storage
>
>> Any advice would be greatly appreciated.
>>
>> Alan Christie
>> achris...@informaticsmatters.com
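For anyone searching later, the two enable variables being referred to are, as named in the 3.11 cluster monitoring docs (treat as an assumption to verify against your openshift-ansible version), the following - with the corresponding `*_storage_class_name` variables left unset so the claims bind to statically provisioned PVs:

```yaml
# 3.11 inventory fragment (variable names per the cluster monitoring
# docs; verify against your playbook release).
openshift_cluster_monitoring_operator_prometheus_storage_enabled: true
openshift_cluster_monitoring_operator_alertmanager_storage_enabled: true
```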
Prometheus (OKD/3.11) NFS?
Hi,

My cluster doesn't have dynamic volumes at the moment. For experimentation, is it possible to use NFS volumes in 3.11 for Prometheus and the Alertmanager?

I notice that the ansible playbook variables for Prometheus in OKD 3.11 are very different to those in 3.7 but...

- Can I use pre-provisioned NFS volumes?
- …or… is there an equivalent of 3.7's "openshift_prometheus_alertmanager_storage_kind=nfs"?

Any advice would be greatly appreciated.

Alan Christie
achris...@informaticsmatters.com
Re: OpenShift 3.7 ansible installer - getting "Error downloading packages"
This may be related to DNS. An installation of 3.9 is perfectly fine and its nodes are able to resolve DNS just fine. When I install 3.7, the nodes reporting this error are unable to resolve the package hosts (or any host). The majority of nodes just fail on DNS with errors like "Could not resolve host: mirror.nsc.liu.se; Unknown error" even though resolv.conf looks sensible - in fact the same as the 3.9 hosts'.

Is anyone else seeing this in an install of 3.7?

Alan Christie
achris...@informaticsmatters.com

> On 31 Jan 2019, at 3:32 pm, Alan Christie wrote:
>
> I'm trying to re-create an old 3.7 cluster from scratch using the ansible installer and an existing inventory (that used to work) but now I encounter the following…
>
>   Error downloading packages:
>     libnetfilter_cttimeout-1.0.0-6.el7.x86_64: [Errno 256] No more mirrors to try.
>     libnetfilter_queue-1.0.2-2.el7_2.x86_64: [Errno 256] No more mirrors to try.
>     libnetfilter_cthelper-1.0.0-9.el7.x86_64: [Errno 256] No more mirrors to try.
>     origin-node-3.7.2-1.el7.git.0.cd74924.x86_64: [Errno 256] No more mirrors to try.
>     socat-1.7.3.2-2.el7.x86_64: [Errno 256] No more mirrors to try.
>     tuned-profiles-origin-node-3.7.2-1.el7.git.0.cd74924.x86_64: [Errno 256] No more mirrors to try.
>     origin-3.7.2-1.el7.git.0.cd74924.x86_64: [Errno 256] No more mirrors to try.
>     conntrack-tools-1.4.4-4.el7.x86_64: [Errno 256] No more mirrors to try.
>     origin-clients-3.7.2-1.el7.git.0.cd74924.x86_64: [Errno 256] No more mirrors to try.
>
> Does someone know what I need to do to my inventory to get the installer to pick up these packages?
>
> Alan Christie
> achris...@informaticsmatters.com
OpenShift 3.7 ansible installer - getting "Error downloading packages"
I'm trying to re-create an old 3.7 cluster from scratch using the ansible installer and an existing inventory (that used to work) but now I encounter the following…

  Error downloading packages:
    libnetfilter_cttimeout-1.0.0-6.el7.x86_64: [Errno 256] No more mirrors to try.
    libnetfilter_queue-1.0.2-2.el7_2.x86_64: [Errno 256] No more mirrors to try.
    libnetfilter_cthelper-1.0.0-9.el7.x86_64: [Errno 256] No more mirrors to try.
    origin-node-3.7.2-1.el7.git.0.cd74924.x86_64: [Errno 256] No more mirrors to try.
    socat-1.7.3.2-2.el7.x86_64: [Errno 256] No more mirrors to try.
    tuned-profiles-origin-node-3.7.2-1.el7.git.0.cd74924.x86_64: [Errno 256] No more mirrors to try.
    origin-3.7.2-1.el7.git.0.cd74924.x86_64: [Errno 256] No more mirrors to try.
    conntrack-tools-1.4.4-4.el7.x86_64: [Errno 256] No more mirrors to try.
    origin-clients-3.7.2-1.el7.git.0.cd74924.x86_64: [Errno 256] No more mirrors to try.

Does someone know what I need to do to my inventory to get the installer to pick up these packages?

Alan Christie
achris...@informaticsmatters.com
OKD 3.11 : TASK [get openshift_current_version] : ERROR! Unexpected Exception
Hi,

I'm trying to install OKD 3.11 using the Atomic System Container approach. It all looks rather simple and I'm running the container with (I think) a valid inventory on a bastion machine in AWS, but I can't get past the following error:

  TASK [get openshift_current_version] ***
  ERROR! Unexpected Exception, this is probably a bug: update expected at most 1 arguments, got 2
  to see the full traceback, use -vvv

It's a fresh installation so there is (obviously) no 'current version'. Has anyone seen this before and, if so, is there a workaround? Ideally a playbook fix would be appreciated, or at least an understanding of what it is about my inventory or cluster that leads to this error. I've taken a quick look at the playbook but I think it's something for the author.

I raised an issue on the openshift-ansible repository (https://github.com/openshift/openshift-ansible/issues/10626) a few weeks ago but that's got no response.

Alan Christie
achris...@informaticsmatters.com
Jenkins slave agent MountVolume failure: "Failed to start transient scope unit: Argument list too long"
OpenShift Master: v3.9.0+ba7faec-1
Kubernetes Master: v1.9.1+a0ce1bc657
OpenShift Web Console: v3.9.0+b600d46-dirty

After working successfully for the past few months, my Jenkins deployment started to fail to launch build agents for jobs. The event error was essentially "Failed to start transient scope unit: Argument list too long". The error was initially confusing because it's just running the same agents it's always been running. The agents are configured to live for a short time (15 minutes), after which they're removed and another is created when necessary. All this has been perfectly functional up until today.

The complete event error was:

  MountVolume.SetUp failed for volume "fs-input" : mount failed: exit status 1
  Mounting command: systemd-run
  Mounting arguments: --description=Kubernetes transient mount for /var/lib/origin/openshift.local.volumes/pods/4da0f883-aaa2-11e8-901a-c81f66c79dfc/volumes/kubernetes.io~nfs/fs-input --scope -- mount -t nfs -o ro bastion.novalocal:/data/fs-input /var/lib/origin/openshift.local.volumes/pods/4da0f883-aaa2-11e8-901a-c81f66c79dfc/volumes/kubernetes.io~nfs/fs-input
  Output: Failed to start transient scope unit: Argument list too long

I suspect it might be related to Kubernetes issue #57345 (https://github.com/kubernetes/kubernetes/issues/57345): the number of "loaded inactive dead" systemd transient mount units continues to grow.

In an attempt to rectify the situation I tried the issue's suggestion, which was to run:

  $ sudo systemctl daemon-reload

...on the affected node(s). It worked on all nodes except the one that was giving me problems. On the "broken" node the command took a few seconds to complete but failed, responding with:

  Failed to execute operation: Connection timed out

I was unable to reboot the node from the command-line (clearly the system was polluted to the point that it was essentially unusable) and I was forced to resort to rebooting the node by other means. When the node returned, Jenkins and its deployments eventually returned to an operational state.

So it looks like the issue may be right: the number of systemd transient mount units continues to grow unchecked on nodes. Although I've recovered the system and now believe I have a work-around for the underlying fault next time I see it, I wonder whether anyone else has seen this in 3.9 and whether there is a long-term solution?

Alan Christie
achris...@informaticsmatters.com
Re: Ansible/Origin 3.9 deployment now fails because "package(s) are available at a version that is higher than requested"
OK, moved to #9675: https://github.com/openshift/openshift-ansible/issues/9675

Thanks for your attention. As I have said, I have a work-around for now, and that's to disable package checks for my OpenStack and AWS deployments, which allows me to avoid the error and orchestrate operational 3.9 clusters.

Alan Christie
achris...@informaticsmatters.com

> On 20 Aug 2018, at 16:08, Daniel Comnea wrote:
>
> Just came across this email, and still not clear why the issue is still taking place.
>
> Can you please move this issue onto https://github.com/openshift/openshift-ansible and provide the following info:
>
> - the openshift-ansible rpm (if you used that) or the tag used
> - the output with the full trace error you get
>
> I'll try and see if I can help you out.
>
> On Mon, Aug 20, 2018 at 3:20 PM, Peter Heitman wrote:
> I agree with you. I've hit this same error when previous versions were released. I'm not sure why defining the version we want to install (and then using that version of the openshift-ansible git) isn't sufficient. As for installing the repo, I do this before I run the prerequisite playbook, i.e.
>
>   ansible all -i -m yum -a "name=centos-release-openshift-origin39 state=present" --become
>
> That seems to resolve the issue.
>
> On Mon, Aug 20, 2018 at 10:10 AM Alan Christie wrote:
> Thanks Peter.
>
> Interestingly it looks like it's Origin's own "prerequisites.yml" playbook that's adding the repo that's causing problems. My instances don't have this repo until I run that playbook.
>
> Why do I have to remove something that's being added by the prerequisite playbook? Especially as my inventory explicitly states "openshift_release=v3.9"?
>
> If the answer is "do not run prerequisites.yml", what's the point of it?
>
> I still wonder why this specific issue is actually an error. Shouldn't it be installing the specific version anyway? Shouldn't the error occur if there is no 3.9 package, not if there's a 3.10 package?
>
> Incidentally, I'm using the ansible code from "openshift-ansible-3.9.40-1".
>
> Alan Christie
> achris...@informaticsmatters.com
>
>> On 18 Aug 2018, at 13:36, Peter Heitman wrote:
>>
>> See the recent thread "How to avoid upgrading to 3.10". The bottom line is to install the 3.9-specific repo. For CentOS that is centos-release-openshift-origin39.
>>
>> On Sat, Aug 18, 2018, 2:44 AM Alan Christie wrote:
>> Hi,
>>
>> I've been deploying new clusters of Origin v3.9 using the official Ansible playbook approach for a few weeks now, using what appear to be perfectly reasonable base images on OpenStack and AWS. Then, this week, with no other changes having been made, the deployment fails with this message:
>>
>>   One or more checks failed
>>   check "package_version":
>>     Some required package(s) are available at a version
>>     that is higher than requested
>>       origin-3.10.0
>>       origin-node-3.10.0
>>       origin-master-3.10.0
>>     This will prevent installing the version you requested.
>>     Please check your enabled repositories or adjust openshift_release.
>>
>> I can avoid the error, and deploy what appears to be a perfectly functional 3.9, if I add package_version to openshift_disable_check in the inventory for the deployment. But this is not the right way to deal with this sort of error.
>>
>> Q1) How does one correctly address this error?
>>
>> Q2) Out of interest … why is this specific issue an error? I've instructed the playbook to install v3.9. I don't care if there is a 3.10 release available - I do care if there is not a 3.9. Shouldn't the error occur if there is no 3.9 package, not if there's a 3.10 package?
>>
>> Alan Christie
>> Informatics Matters Ltd.
Re: origin3.9 deployments
Thanks, I've tried all sorts of things now and need a rest - I've been trying to understand this behaviour for the last 7 hours and the working day's approaching its end for me!

In the meantime I'm raising this as an issue as requested, as I shouldn't need to tinker with repos that are being installed by the OpenShift playbooks. I use tagged releases and am using "openshift-ansible-3.9.40-1". The rest of the details will go in the issue. In the meantime I'm just going to set "package_version" in the "openshift_disable_check" list in the inventory.

Alan Christie
achris...@informaticsmatters.com

> On 20 Aug 2018, at 16:33, Walters, Todd wrote:
>
> I believe having the proper repos enabled is part of the node prerequisites. So we get around this by running the prereq playbook and disabling the origin release for latest (the repo with no release number on the end) and enabling 3.9 only:
>
>   - name: Remove the unversioned Openshift Origin repo
>     yum:
>       name: centos-release-openshift-origin
>       state: absent
>
>   - name: Install the 3.9 Openshift Origin repo
>     yum:
>       name: centos-release-openshift-origin39
>       state: present
>
> Also, the only git branch that's supposed to work is release-3.9, which is what we always pull for playbooks.
>
> Thanks,
>
> Todd
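The work-around mentioned above, expressed as an inventory fragment (illustrative; `openshift_disable_check` takes a comma-separated list, so `package_version` joins any checks you already disable):

```yaml
# Disables only the package version pre-flight check;
# the other health checks still run.
openshift_disable_check: package_version
```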
Re: Ansible/Origin 3.9 deployment now fails because "package(s) are available at a version that is higher than requested"
But first, a quick question… Is it essential to define all of these?

  openshift_release
  openshift_image_tag
  openshift_pkg_version

I have just noticed that only the first is defined.

Alan Christie
achris...@informaticsmatters.com

> On 20 Aug 2018, at 16:08, Daniel Comnea wrote:
>
> Just came across this email, and still not clear why the issue is still taking place.
>
> Can you please move this issue onto https://github.com/openshift/openshift-ansible and provide the following info:
>
> - the openshift-ansible rpm (if you used that) or the tag used
> - the output with the full trace error you get
>
> I'll try and see if I can help you out.
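For reference, a sketch of how those three variables are typically pinned together in an inventory for a 3.9 cluster (values illustrative, not taken from the thread; the leading '-' in openshift_pkg_version is the RPM version-separator convention used by openshift-ansible - verify against your release):

```yaml
openshift_release: v3.9
openshift_image_tag: v3.9.0       # container image tag (assumption)
openshift_pkg_version: -3.9.0     # RPM version suffix (assumption)
```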
Re: Ansible/Origin 3.9 deployment now fails because "package(s) are available at a version that is higher than requested"
Will do.

Alan Christie
achris...@informaticsmatters.com

> On 20 Aug 2018, at 16:08, Daniel Comnea wrote:
>
> Just came across this email, and still not clear why the issue is still taking place.
>
> Can you please move this issue onto https://github.com/openshift/openshift-ansible and provide the following info:
>
> - the openshift-ansible rpm (if you used that) or the tag used
> - the output with the full trace error you get
>
> I'll try and see if I can help you out.
Re: Ansible/Origin 3.9 deployment now fails because "package(s) are available at a version that is higher than requested"
I’m doing pretty much the same thing. Prior to “prerequisites” I run the following play:

- hosts: nodes
  become: yes
  tasks:
  - name: Install origin39 repo
    yum:
      name: centos-release-openshift-origin39
      state: present

The 3.9 repo appears in /etc/yum.repos.d/ but, after the prerequisites, so does "CentOS-OpenShift-Origin.repo" and the main “deploy_cluster.yml” fails again. The only way through this for me is to add “package_version” to “openshift_disable_check”.

Alan Christie
achris...@informaticsmatters.com

> On 20 Aug 2018, at 15:20, Peter Heitman wrote:
>
> I agree with you. I've hit this same error when previous versions were
> released. I'm not sure why defining the version we want to install (and then
> using that version of the openshift-ansible git) isn't sufficient. As for
> installing the repo, I do this before I run the prerequisite playbook, i.e.
> ansible all -i -m yum -a "name=centos-release-openshift-origin39
> state=present" --become. That seems to resolve the issue.
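The workaround Alan describes, disabling just the failing check, as an inventory fragment (a sketch; shown in the common INI inventory form with variables under [OSEv3:vars] — adapt if your inventory is YAML):

```ini
[OSEv3:vars]
openshift_release=v3.9
# Skip only the check that trips over the newer 3.10 packages;
# all other pre-flight checks still run.
openshift_disable_check=package_version
```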
Re: Ansible/Origin 3.9 deployment now fails because "package(s) are available at a version that is higher than requested"
Thanks Peter.

Interestingly, it looks like it’s Origin’s own “prerequisites.yml” playbook that’s adding the repo that’s causing problems. My instances don’t have this repo until I run that playbook.

Why do I have to remove something that’s being added by the prerequisite playbook? Especially as my inventory explicitly states "openshift_release=v3.9".

If the answer is “do not run prerequisites.yml”, what’s the point of it?

I still wonder why this specific issue is actually an error. Shouldn’t it be installing the specific version anyway? Shouldn’t the error occur if there is no 3.9 package, not if there’s a 3.10 package?

Incidentally, I’m using the ansible code from "openshift-ansible-3.9.40-1".

Alan Christie
achris...@informaticsmatters.com

> On 18 Aug 2018, at 13:36, Peter Heitman wrote:
>
> See the recent thread "How to avoid upgrading to 3.10". The bottom line is to
> install the 3.9 specific repo. For CentOS that is
> centos-release-openshift-origin39
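Peter’s suggestion, plus disabling the repo file that prerequisites.yml drops in, as a hedged shell sketch (the repo id inside CentOS-OpenShift-Origin.repo is assumed here to be centos-openshift-origin — check the actual file on your hosts before disabling anything):

```shell
# On every host, before running prerequisites.yml / deploy_cluster.yml:
yum install -y centos-release-openshift-origin39

# If the generic CentOS-OpenShift-Origin.repo (which tracks the newest
# Origin, e.g. 3.10) appears afterwards, disable it so only 3.9 is visible.
yum-config-manager --disable centos-openshift-origin
```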
Ansible/Origin 3.9 deployment now fails because "package(s) are available at a version that is higher than requested"
Hi,

I’ve been deploying new clusters of Origin v3.9 using the official Ansible playbook approach for a few weeks now, using what appear to be perfectly reasonable base images on OpenStack and AWS. Then, this week, with no other changes having been made, the deployment fails with this message:

One or more checks failed
check "package_version":
  Some required package(s) are available at a version
  that is higher than requested
    origin-3.10.0
    origin-node-3.10.0
    origin-master-3.10.0
  This will prevent installing the version you requested.
  Please check your enabled repositories or adjust
  openshift_release.

I can avoid the error, and deploy what appears to be a perfectly functional 3.9, if I add package_version to openshift_disable_check in the inventory for the deployment. But this is not the right way to deal with this sort of error.

Q1) How does one correctly address this error?

Q2) Out of interest … why is this specific issue an error? I’ve instructed the playbook to install v3.9. I don't care if there is a 3.10 release available - I do care if there is not a 3.9. Shouldn’t the error occur if there is no 3.9 package, not if there’s a 3.10 package?

Alan Christie
Informatics Matters Ltd.
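A quick way to confirm where the unwanted 3.10 packages are coming from is to list every available origin version together with the repo offering it:

```shell
# --showduplicates lists all candidate versions; the offering repo
# appears in the right-hand column of the output.
yum --showduplicates list available origin origin-node origin-master
```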
Re: Origin 3.9.0's Jenkins - forgetful agents!
Hi Gabe,

I’m annotating the ImageStream, essentially doing this: `slave-label: buildah-slave`. The Dockerfile and ImageStream YAML template for my agent (a buildah/skopeo agent) based on jenkins-slave-maven-centos can be found in our public repo (https://github.com/InformaticsMatters/openshift-jenkins-buildah-slave).

I can understand the agent being replaced when the external image changes, but I was curious about why it might change (for no apparent reason). But ... I will take a look at the ConfigMap approach because that sounds a lot more useful - especially for a CI/CD process - and would allow me to set the agent up from the command line without having to use the Jenkins management console.

Where might I find a good reference example for the ConfigMap approach?

Alan Christie
achris...@informaticsmatters.com

> On 17 Jul 2018, at 13:18, Gabe Montero wrote:
>
> Hi Alan,
>
> Are you leveraging our feature to inject agents by labelling ImageStreams
> with the label "role" set to a value of "jenkins-slave", or annotating an
> ImageStreamTag with the same k/v pair?
>
> If so, that is going to update the agent definition every time those items
> are updated in OpenShift. And there is currently no merging of the partial
> PodTemplate config we construct from ImageStreams / ImageStreamTags with
> whatever modifications to the PodTemplate were made from within Jenkins
> after the agent was initially created (there are no k8s APIs we can leverage
> to do that).
>
> If the default config we provide for IS/ISTs is not sufficient, I would
> suggest switching to our ConfigMap version of this injection. With that
> form, you can specify the entire PodTemplate definition, including the
> settings you noted below, where the image for the PodTemplate is the docker
> ref for the IS/IST you are currently referencing.
>
> If you are injecting agents in another way, please elaborate and we'll go
> from there.
>
> thanks,
> gabe
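A minimal sketch of the ConfigMap form Gabe describes. Assumptions: the sync plugin watches ConfigMaps carrying the role=jenkins-slave label, and each data value is a kubernetes-plugin PodTemplate XML document; the element names and image ref below should be checked against the OpenShift Jenkins image documentation for your version:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: buildah-slave
  labels:
    role: jenkins-slave          # what the sync plugin watches for
data:
  template: |-
    <org.csanchez.jenkins.plugins.kubernetes.PodTemplate>
      <name>buildah-slave</name>
      <label>buildah-slave</label>
      <idleMinutes>30</idleMinutes>
      <instanceCap>2</instanceCap>
      <containers>
        <org.csanchez.jenkins.plugins.kubernetes.ContainerTemplate>
          <name>jnlp</name>
          <image>docker.io/alanbchristie/jenkins-slave-buildah-centos7:latest</image>
          <privileged>true</privileged>
        </org.csanchez.jenkins.plugins.kubernetes.ContainerTemplate>
      </containers>
    </org.csanchez.jenkins.plugins.kubernetes.PodTemplate>
```

Because the whole PodTemplate is spelled out here, the privileged/instance-cap/idle settings come from source control rather than the Jenkins console, so a resync no longer resets them to defaults.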
Origin 3.9.0's Jenkins - forgetful agents!
Hi,

I’m using Jenkins on an OpenShift Origin 3.9.0 deployment and notice that Jenkins periodically forgets the additional settings for my custom agent.

I’m using the built-in Jenkins from the catalogue (Jenkins 2.89.4) with all the plugins updated. Incidentally, I doubt it has anything to do with the Origin release, as I recall seeing this on earlier (3.7/3.6) releases.

It happens when I deploy a new agent to Docker Hub, so this I can partly understand (i.e. a new ‘latest’ image is available so it’s pulled) - although I do struggle to understand why it creates a *new* Kubernetes pod template in the cloud configuration when one already exists for the same agent (but that’ll probably be the subject of another question). So, each time I push an image I have to fix the cloud configuration for my agent.

This I can live with (for now), but it also happens periodically for no apparent reason. I’m not sure about the frequency, but I’ll notice every week, or every few weeks, that the Kubernetes Pod Template for my agent has forgotten all the _extra_ setup. Things like:

- Run in privileged mode
- Additional volumes
- Max number of instances
- Time in minutes to retain slave when idle

Basically anything adjusted beyond the defaults provided when you first instantiate an agent is lost.

Has anyone reported this behaviour before? Is there a fix, or can anyone suggest an area of investigation?

Alan Christie
achris...@informaticsmatters.com
OpenShift Ansible (3.6) "expects to merge two dicts" error
I’ve just run the Ansible configuration playbook for OpenShift 3.6 ("playbooks/byo/config.yml") using a recent tag ("openshift-ansible-3.6.173.0.120-1") with Ansible 2.6.1 and got the following error:

TASK [openshift_hosted_facts : Set hosted facts]
Sunday 08 July 2018 09:35:31 +0100 (0:00:00.074) 0:03:48.518 ***
fatal: [18.195.236.210]: FAILED! => {"msg": "|failed expects to merge two dicts"}

It’s OK with Ansible 2.5.3. Looks like it might be related to https://github.com/openshift/openshift-ansible/issues/4985?

I appreciate 3.6 is a few versions behind now, and I have not tried the playbooks for 3.7 or 3.9, but I assume we can expect support for Ansible 2.6 in the OpenShift playbooks?

Alan Christie
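Until the playbooks support Ansible 2.6, the usual workaround is to pin the control host to a 2.5.x release (a sketch using pip in a virtualenv; adjust to your package manager):

```shell
pip install 'ansible>=2.5.3,<2.6'
ansible --version    # confirm 2.5.x before re-running config.yml
```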
Re: Resource limits - an "initial delay" or "max duration" would be really handy
Thanks for the quick response Clayton, very helpful. For now I’ll just set higher limits as you suggest. But just to be clear: it’s the “requests” value that is used for scheduling Pods on nodes, not the “limits” value?

Alan

> On 5 May 2018, at 16:05, Clayton Coleman <ccole...@redhat.com> wrote:
>
> Resource limits are fixed because we need to make a good scheduling decision
> for the initial burst you’re describing (the extremely high cpu at the
> beginning). Some applications might also need similar cpu on restart. Your
> workload needs to “burst”, so setting your cpu limit to your startup peak and
> your cpu request to a reasonable percentage of the long-term use is the best
> way to ensure the scheduler can put you on a node that can accommodate you.
> No matter what, if you want the cpu at the peak we have to schedule you
> somewhere you can get the peak cpu.
>
> The longer-term approach that makes this less annoying is the feedback loop
> between actual used resources on a node for running workloads and requested
> workloads, and the eviction and descheduling agents (which attempt to
> rebalance nodes by shuffling workloads around).
>
> For the specific case of an app where you know for sure the processes will
> use a fraction of the initial limit, you can always voluntarily limit your
> own cpu at some time after startup. That could be a side agent that puts a
> more restrictive cgroup limit in place on the container after it has been up
> a few minutes.
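Clayton’s advice expressed as a container resources fragment (the numbers are purely illustrative; requests drive scheduling, limits cap the burst):

```yaml
resources:
  requests:
    cpu: 50m        # typical steady-state use - what the scheduler reserves
    memory: 256Mi
  limits:
    cpu: "1"        # sized for the startup peak - throttled only above this
    memory: 512Mi
```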
Resource limits - an "initial delay" or "max duration" would be really handy
I like the idea of placing resource limits on applications running in the cluster, but I wonder if there’s any advice for defining CPU “limits" that are more tolerant of application start-up behaviour? Something like the initial delay on a readiness or liveness probe, for example? It just seems like a rather obvious property of any limit. The ones available are just too “hard".

One example, and I’m sure this must be common to many applications, is an application that consumes a significant proportion of the CPU during initialisation but then, in its steady state, falls back to a much lower and non-bursty behaviour. I’ve attached a screenshot of one such application. You can see that for a very short period of time, exclusive to initialisation, it consumes many more cycles than in the post-initialisation stage. During initialisation CPU demand tends to fall and memory use tends to rise.

I suspect that what I’m seeing during this time is OpenShift “throttling” my app (understandable given the parameters available); it then fails to pass through initialisation fast enough to satisfy the readiness/liveness probe and gets restarted. Again, and again.

I cannot use any sensible steady-state limit (i.e. one that prevents the normal steady-state behaviour from deviating) without the application constantly being forced to throttle and potentially reboot during initialisation.

In this example I’d like to set a perfectly reasonable CPU limit of something like 5m (because, after the first minute of execution, it should never deviate from the steady-state level). Sadly I cannot set a low level because OpenShift will not let the application start (for reasons already explained), as its initial but very brief CPU load exceeds any “reasonable" level I set.

I can get around this by defining an abnormally large CPU limit but, to me, using an “abnormal” level sort of defeats the purpose of a limit.

Aren’t resource limits missing one vital parameter, “duration" or "initial delay”?

Maybe this is beyond the resources feature and has to be deferred to something like Prometheus? But can Prometheus take actions rather than just monitor and alert? And, even if it could, employing Prometheus may seem to some like "using a sledgehammer to crack a nut”.

Any advice on permitting bursty applications within the cluster while also using limits would be most appreciated.

Alan Christie
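For comparison, the probe-style “initial delay” the post refers to does exist for health checks (a sketch; the path and port are hypothetical):

```yaml
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 60    # tolerate the CPU-hungry initialisation phase
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 120
```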
Re: Can we use 'Run in privileged mode' in the Jenkins Kubernetes Pod Template?
Thanks Clayton. That’s worked. I’m not sure whether I also needed to do an "oc adm policy add-scc-to-user anyuid -z ${SERVICE_ACCOUNT}" (which I have done), but I am now able to build Docker container images in a Jenkins pipeline using a buildah slave-agent! That’s neat.

The Dockerfile/image source that builds the Jenkins slave-agent, and the (rather fat) resultant agent image, are public:

https://github.com/alanbchristie/openshift-jenkins-buildah-slave
https://hub.docker.com/r/alanbchristie/jenkins-slave-buildah-centos7/

> On 17 Apr 2018, at 00:39, Clayton Coleman <ccole...@redhat.com> wrote:
>
> Like any other user, to run privileged an administrator must grant access to
> the Jenkins service account to launch privileged pods. That’s done by
> granting the service account the slave pod runs as the privileged SCC:
>
> oc adm policy add-scc-to-user -z SERVICE_ACCT privileged
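The two grants together, as shell (a sketch; "jenkins" is assumed to be the service account the slave pods run as - substitute your own, and the anyuid grant may be redundant once privileged is allowed):

```shell
# Allow pods run by the service account to request privileged mode
oc adm policy add-scc-to-user privileged -z jenkins

# Optionally also allow running as an arbitrary UID (e.g. root)
oc adm policy add-scc-to-user anyuid -z jenkins
```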
Can we use 'Run in privileged mode' in the Jenkins Kubernetes Pod Template?
I’m trying to get around building Docker containers in a Jenkins slave-agent (because the Docker socket is not available). Along comes `buildah`, claiming to be a lightweight OCI builder, so I’ve built a `buildah` Jenkins slave agent based on the `openshift/jenkins-slave-maven-centos7` image (https://github.com/alanbchristie/openshift-jenkins-buildah-slave.git).

Nice.

Sadly…

…the agent appears useless because buildah needs to be run as root!!!

So I walk from one problem into another.

The wonderfully named option in Jenkins -> Manage Jenkins -> Configure System -> Kubernetes Pod Template -> "Run in privileged mode" was so appealing I just had to click it!

But … sigh ... I still can’t run as root; instead I get the **Privileged containers are not allowed provider restricted** error.

This has probably been asked before, but:

- Is there anything that can be done to run slave-agents as root? (I don't want a BuildConfig; I want to run my existing complex pipelines, which also build docker images, in a Jenkins agent.)
- If not, is someone thinking about supporting this?

Alan Christie
Re: Empty /etc/cni/net.d with Ansible installer on 3.7 and 3.9
Thanks Clayton. The base system’s been tested with two independently authored base images, but I’ll try and ensure I have time to follow your suggestion next week and report back if anything repetitive’s been found. Knowing that this is not a common problem narrows it down. Thanks.

Alan.

> On 14 Apr 2018, at 16:33, Clayton Coleman <ccole...@redhat.com> wrote:
>
> I don’t think we’ve seen it elsewhere (certainly not repeatedly), which
> probably indicates something specific to your environment, inventory, or
> base system.
>
> I suggested restarting because this is all the same debugging info we’d ask
> for in a bug - knowing whether it’s transient and clears on a restart
> narrows the issue down (likely to be a bug in the core code).
>
> On Apr 14, 2018, at 4:30 AM, Alan Christie wrote:
>
>> Thanks Clayton,
>>
>> I’ll take a closer look next week, because the solution seems to be fixing
>> the symptoms, not the cause, and I’d like to get to a stage where we don’t
>> need to patch the installation and restart it.
>>
>> This happens pretty much *every time* I install 3.7 or 3.9 on AWS, and a
>> significant number of times on OpenStack.
>>
>> Has this been reported by others? It’s so common that we can’t be the only
>> ones seeing this.
>>
>> Alan
>>
>>> On 13 Apr 2018, at 21:35, Clayton Coleman wrote:
>>>
>>> "Can not find allocated subnet" usually means the master didn’t hand out a
>>> chunk of SDN IPs to that node. Check the master’s origin-master-controllers
>>> logs and look for anything that relates to the node name mentioned in your
>>> error. If you see a problem, try restarting the origin-master-controllers
>>> processes on all nodes.
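Clayton’s debugging steps as commands (a sketch assuming an RPM-based Origin 3.7/3.9 install with systemd units named as below; unit names can differ between deployments):

```shell
# On the suspect node: confirm the symptom
ls /etc/cni/net.d
journalctl -u origin-node --no-pager | grep -i 'allocated subnet'

# On each master: look for subnet-allocation errors mentioning that node
journalctl -u origin-master-controllers --no-pager | grep <node-name>

# If a problem shows up, restart the controllers on every master
systemctl restart origin-master-controllers
```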
Empty /etc/cni/net.d with Ansible installer on 3.7 and 3.9
What’s wrong with the post-3.6 OpenShift/Origin releases? I build my clusters with Terraform, and OpenShift 3.6 (on AWS) is wonderfully stable: I have no problem creating clusters. But with both 3.7 and 3.9 I just cannot start a cluster without encountering at least one node with an empty /etc/cni/net.d. This applies to 3.7 and 3.9 on AWS and on two OpenStack providers. In all cases the Ansible installer enters the "RUNNING HANDLER [openshift_node : restart node]" task, but this fails for the vast majority of installations on OpenStack and for every single attempt on AWS. I’m worried that I’ve got something clearly very wrong and have had to return to 3.6 to get anything done.

RUNNING HANDLER [openshift_node : restart openvswitch] ***
Friday 13 April 2018 13:19:09 +0100 (0:00:00.062) 0:09:28.744 **
changed: [18.195.236.210]
changed: [18.195.126.190]
changed: [18.184.65.88]

RUNNING HANDLER [openshift_node : restart openvswitch pause] **
Friday 13 April 2018 13:19:09 +0100 (0:00:00.720) 0:09:29.464 **
skipping: [18.195.236.210]

RUNNING HANDLER [openshift_node : restart node] ***
Friday 13 April 2018 13:19:09 +0100 (0:00:00.036) 0:09:29.501 **
FAILED - RETRYING: restart node (3 retries left).
FAILED - RETRYING: restart node (3 retries left).
FAILED - RETRYING: restart node (3 retries left).
FAILED - RETRYING: restart node (2 retries left).
FAILED - RETRYING: restart node (2 retries left).
FAILED - RETRYING: restart node (2 retries left).
FAILED - RETRYING: restart node (1 retries left).
FAILED - RETRYING: restart node (1 retries left).
FAILED - RETRYING: restart node (1 retries left).
fatal: [18.195.236.210]: FAILED! => {"attempts": 3, "changed": false, "msg": "Unable to restart service origin-node: Job for origin-node.service failed because the control process exited with error code. See \"systemctl status origin-node.service\" and \"journalctl -xe\" for details.\n"}
fatal: [18.195.126.190]: FAILED! => {"attempts": 3, "changed": false, "msg": "Unable to restart service origin-node: Job for origin-node.service failed because the control process exited with error code. See \"systemctl status origin-node.service\" and \"journalctl -xe\" for details.\n"}
fatal: [18.184.65.88]: FAILED! => {"attempts": 3, "changed": false, "msg": "Unable to restart service origin-node: Job for origin-node.service failed because the control process exited with error code. See \"systemctl status origin-node.service\" and \"journalctl -xe\" for details.\n"}

When I jump onto a suspect node after the failure I find /etc/cni/net.d is empty and the journal contains the message "No networks found in /etc/cni/net.d"...

-- The start-up result is done.
Apr 13 12:23:44 ip-10-0-0-61.eu-central-1.compute.internal origin-master-controllers[26728]: I0413 12:23:44.850154 26728 leaderelection.go:179] attempting to acquire leader lease...
Apr 13 12:23:44 ip-10-0-0-61.eu-central-1.compute.internal origin-node[26683]: W0413 12:23:44.933963 26683 cni.go:189] Unable to update cni config: No networks found in /etc/cni/net.d
Apr 13 12:23:44 ip-10-0-0-61.eu-central-1.compute.internal origin-node[26683]: E0413 12:23:44.934447 26683 kubelet.go:2112] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Apr 13 12:23:47 ip-10-0-0-61.eu-central-1.compute.internal origin-node[26683]: W0413 12:23:47.947200 26683 sdn_controller.go:48] Could not find an allocated subnet for node: ip-10-0-0-61.eu-central-1.compute.internal, Waiting...

Is anyone else seeing this and, more importantly, is there a clear cause and solution? Despite days of tinkering I cannot start 3.7 on AWS at all, and on OpenStack 3 out of 4 attempts fail. I just tried 3.9 only to find the same failure on AWS, so I have given up and returned to the wonderfully stable 3.6.
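For anyone triaging the same symptom, a minimal check can be sketched as follows. This assumes openshift-sdn, under which the SDN plugin only writes its CNI config file (typically 80-openshift-network.conf) into /etc/cni/net.d after the master controllers have allocated the node a hostsubnet — which matches the "Could not find an allocated subnet" warning above. The function name and directory parameter are purely illustrative:

```shell
#!/bin/sh
# Quick triage for the "No networks found in /etc/cni/net.d" symptom.
# Assumes openshift-sdn; the directory is a parameter so the check can
# be pointed at any path.
check_cni_conf() {
    dir="$1"
    if [ -z "$(ls -A "$dir" 2>/dev/null)" ]; then
        # Nothing there: the SDN plugin never wrote its config, which
        # usually means the node was never allocated a subnet. Next
        # step: on a master, run `oc get hostsubnets` and check the
        # controllers' journal for subnet-allocation errors.
        echo "EMPTY: $dir"
        return 1
    fi
    ls "$dir"   # expect something like 80-openshift-network.conf
}

check_cni_conf "${CNI_CONF_DIR:-/etc/cni/net.d}" || true
```

If the node is missing from `oc get hostsubnets` output, the problem is on the master/controllers side rather than on the node itself.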
Alan Christie

___
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
Re: Help using ImageStreams, DCs and ImagePullSecrets templates with a GitLab private registry (v3.6)
Ah ha! OK, the first approach did not work but your second suggestion worked!! Phew, thanks… although I had to remove the “-w0" argument (it’s not recognised on OSX). So the following allowed me to pull from gitlab:

oc create -f - <

> On 9 Apr 2018, at 21:29, Pavel Gashev <p...@acronis.com> wrote:
>
> Alan,
>
> Just try the following:
>
> # docker login registry.gitlab.com
> # oc create secret generic --from-file=.dockerconfigjson=.docker/config.json --type=kubernetes.io/dockerconfigjson pullsecret
>
> another way:
>
> # GITLAB_USER=user
> # GITLAB_PASSWORD=password
> # oc create -f - <<EOF
> apiVersion: v1
> kind: Secret
> metadata:
>   name: pullsecret
> type: kubernetes.io/dockerconfigjson
> data:
>   .dockerconfigjson: $(echo -n "{\"auths\":{\"registry.gitlab.com\":{\"auth\":\"`echo -n "$GITLAB_USER:$GITLAB_PASSWORD" | base64 -w0`\"}}}"|base64 -w0)
> EOF
>
> From: <users-boun...@lists.openshift.redhat.com> on behalf of Alan Christie <achris...@informaticsmatters.com>
> Date: Monday, 9 April 2018 at 01:57
> To: Gaurav P <gaurav.li...@gmail.com>
> Cc: users <users@lists.openshift.redhat.com>
> Subject: Re: Help using ImageStreams, DCs and ImagePullSecrets templates with a GitLab private registry (v3.6)
>
> Sorry Guys, but I’m getting nowhere here.
>
> A long time has passed and I have been doing other things, but I keep returning to this and trying every single combination of URL that I can, and nothing is working for me with GitLab.
>
> The good news is that I have simplified the problem…
>
> My simple setup, which is perfectly able to pull images from GitLab in v3.6, uses just one secret and does not need the “oc secrets link […]” command.
> This simple setup does not work with OpenShift v3.7. Instead I get image pull errors (shown in attached screenshot). Is there anyone who’s pulled an image from GitLab? And can someone explain why my single-secret setup works in 3.6 but does not in 3.7?
>
> Alan.
>
> On 19 Jan 2018, at 15:56, Gaurav P <gaurav.li...@gmail.com> wrote:
>
> Louis,
>
> In our case, it is Artifactory. Relevant headers:
>
> HTTP/1.1 401 Unauthorized
> Server: Artifactory/5.4.5
> X-Artifactory-Id:
> X-Artifactory-Node-Id:
> WWW-Authenticate: Basic realm="Artifactory Realm"
>
> Note however that in the case of Artifactory, Docker registries have to be fronted by haproxy, so the Basic auth might be coming from there...
>
> - Gaurav
>
> On Fri, Jan 19, 2018 at 3:03 AM, Louis Santillan <lsant...@redhat.com> wrote:
> Gaurav, Alan,
>
> What is the full (redact if necessary for Artifactory) output of `curl -kv https:///v2//`?
>
> I get the following headers when I naively hit `https://registry.gitlab.com/v2/myproject/myimage/manifests/latest`:
>
> Content-Length: 160
> Content-Type: application/json; charset=utf-8
> Date: Fri, 19 Jan 2018 07:58:26 GMT
> Docker-Distribution-Api-Version: registry/2.0
> Www-Authenticate: Bearer realm="https://gitlab.com/jwt/auth",service="container_registry",scope="repository:myproject/myimage:pull"
> X-Content-Type-Options: nosniff
>
> Looks like `https://gitlab.com/jwt/auth` is the auth URL Maciej is speaking of.
>
> The docs also mention having to `link` the secret to the namespace's `:default` service account for pod image pulling [0]. There's a step or two extra there that Maciej had not yet mentioned.
>
> [0] https://docs.openshift.com/container-platform/3.7/dev_guide/managing_images.html#allowing-pods-to-reference-images-from-other-secured-registries
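Since the `-w0` flag that had to be dropped on OSX is GNU-only, the secret-generation trick above can be made portable by piping through `tr -d '\n'` instead. A sketch, using the same placeholder credentials and `pullsecret` name as the thread (pipe the output to `oc create -f -`):

```shell
#!/bin/sh
# Portable variant of the dockerconfigjson Secret generation: `base64 -w0`
# is GNU coreutils only; `base64 | tr -d '\n'` works on Linux and macOS.
GITLAB_USER="user"          # placeholder
GITLAB_PASSWORD="password"  # placeholder

# One-line base64 that works with both GNU and BSD base64.
b64() { base64 | tr -d '\n'; }

# Registry auth entry: base64("user:password"), wrapped in the
# .dockerconfigjson structure, then base64'd again for the Secret.
auth=$(printf '%s' "$GITLAB_USER:$GITLAB_PASSWORD" | b64)
dockercfg=$(printf '{"auths":{"registry.gitlab.com":{"auth":"%s"}}}' "$auth" | b64)

cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: pullsecret
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: $dockercfg
EOF
```

Using `printf '%s'` rather than `echo -n` also sidesteps shells where `echo -n` prints a literal "-n".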
Re: Help using ImageStreams, DCs and ImagePullSecrets templates with a GitLab private registry (v3.6)
Thanks for your guidance so far Maciej, but none of this is working for me. [1] doesn’t really help as I’m past that and, sadly, the 1,500 lines and numerous posts in issue 9584 [2] are exhausting to trawl through and still leave me with an inability to pull from GitLab using an image stream. Again, I have a working DC/IPS solution. I understand secrets, DCs and IPSs, but I still cannot get ImageStreams to work. I just get…

Internal error occurred: Get https://registry.gitlab.com/v2/myproject/myimage/manifests/latest: denied: access forbidden.

I’m just about exhausted. So, if my setup is:

- OpenShift 3.6.1
- An image that's: myproject/myimage:latest
- A registry that’s: registry.gitlab.com
- A pull secret that works for DC/IPS - i.e. I can pull the image from the private repo with my DC and the installed secret.

What...

- would my ImageStream yaml template or json look like?
- would I need to change in my working DC yaml?
- if any, are the crucial roles my OC user needs?

> On 3 Jan 2018, at 11:03, Maciej Szulik <maszu...@redhat.com> wrote:
>
> Have a look at [1] which should explain how to connect the IS with the secret. Additionally, there's [2] which explains problems when auth is delegated to a different uri.
>
> Maciej
>
> [1] https://docs.openshift.org/latest/dev_guide/managing_images.html#private-registries
> [2] https://github.com/openshift/origin/issues/9584
>
> On Wed, Jan 3, 2018 at 10:34 AM, Alan Christie <achris...@informaticsmatters.com> wrote:
> Hi all,
>
> I’m successfully using DeploymentConfig (DC) and ImagePullSecret (IPS) templates with OpenShift Origin v3.6 to spin up my application from a container image hosted on a private GitLab registry.
> But I want the deployment to re-deploy when the GitLab image changes, and to do this I believe I need to employ an ImageStream.
>
> I’m comfortable with each of these objects and have successfully used ImageStreams and DCs with public DockerHub images (that was easy because there are so many examples). But I’m stuck trying to pull an image using an ImageStream from a private GitLab-hosted docker registry.
>
> The IPS seems to belong to the DC, so how do I get my ImageStream to use it? My initial attempts have not been successful. All I get, after a number of attempts at this, is the following error on the ImageStream console...
>
> Internal error occurred: Get https://registry.gitlab.com/v2/myproject/myimage/manifests/latest: denied: access forbidden. Timestamp: 2017-12-28T14:27:12Z Error count: 2.
>
> Where “myproject” and “myimage” are my GitLab project and image names.
>
> My working DC/IPS combo looks something like this…
>
> […]
> imagePullSecrets:
> - name: gitlab-myproject
> containers:
> - image: registry.gitlab.com/myproject/myimage:stable
>   name: myimage
> […]
>
> But what would my DC/IPS/ImageStream objects look like?
>
> Thanks in advance.
>
> Alan Christie.
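For the record, the ImageStream shape that usually goes with this question can be sketched as follows. This is an illustrative sketch, not verified against the poster's cluster: the names reuse the thread's `myproject`/`myimage`, and it assumes the dockercfg pull secret already exists in the same project, since image import looks for a secret in the namespace whose registry entry matches:

```yaml
# Hypothetical ImageStream for the thread's GitLab-hosted image. With
# importPolicy.scheduled: true the tag is re-imported periodically, so
# a DC with an ImageChange trigger on this tag re-deploys when the
# image in GitLab changes.
apiVersion: v1
kind: ImageStream
metadata:
  name: myimage
spec:
  tags:
  - name: stable
    from:
      kind: DockerImage
      name: registry.gitlab.com/myproject/myimage:stable
    importPolicy:
      scheduled: true
```

The DC would then reference the ImageStreamTag `myimage:stable` via an ImageChange trigger instead of the raw registry URL, while `imagePullSecrets` stays in place for the kubelet's actual pull.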
Help using ImageStreams, DCs and ImagePullSecrets templates with a GitLab private registry (v3.6)
Hi all,

I’m successfully using DeploymentConfig (DC) and ImagePullSecret (IPS) templates with OpenShift Origin v3.6 to spin up my application from a container image hosted on a private GitLab registry. But I want the deployment to re-deploy when the GitLab image changes, and to do this I believe I need to employ an ImageStream.

I’m comfortable with each of these objects and have successfully used ImageStreams and DCs with public DockerHub images (that was easy because there are so many examples). But I’m stuck trying to pull an image using an ImageStream from a private GitLab-hosted docker registry.

The IPS seems to belong to the DC, so how do I get my ImageStream to use it? My initial attempts have not been successful. All I get, after a number of attempts at this, is the following error on the ImageStream console...

Internal error occurred: Get https://registry.gitlab.com/v2/myproject/myimage/manifests/latest: denied: access forbidden. Timestamp: 2017-12-28T14:27:12Z Error count: 2.

Where “myproject” and “myimage” are my GitLab project and image names.

My working DC/IPS combo looks something like this…

[…]
imagePullSecrets:
- name: gitlab-myproject
containers:
- image: registry.gitlab.com/myproject/myimage:stable
  name: myimage
[…]

But what would my DC/IPS/ImageStream objects look like?

Thanks in advance.

Alan Christie.