> The TemplateInstance object should have an ownerReference to a
> BrokerTemplateInstance and that reference not being handled properly is the
> bug.  If you remove that ownerRef from the TemplateInstance, you should be
> safe from undesired deletion of the TemplateInstance (and the cascading
> delete of everything else), at least with respect to the bug we are aware of.

Nice, that did the trick.

I did an oc patch, and that fixed it:

$ oc get templateinstance
NAME                                   TEMPLATE
b180d814-2917-4c7e-875f-b91e5d4743e8   jenkins-ephemeral

$ oc patch templateinstance b180d814-2917-4c7e-875f-b91e5d4743e8 --type json \
    -p='[{"op": "remove", "path": "/metadata/ownerReferences"}]'
templateinstance "b180d814-2917-4c7e-875f-b91e5d4743e8" patched
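
For the export-and-recreate workaround mentioned further down the thread, the
same removal can be done offline on exported JSON. Below is a minimal Python
sketch, assuming the input looks like what "oc get <resource> -o json" prints
(either a single object, or a kind "List" with an "items" array). The function
name strip_owner_references is made up for illustration, and nothing here
talks to the cluster:

```python
import copy
import json


def strip_owner_references(obj):
    """Return a deep copy of a resource (or a "List" of resources)
    with metadata.ownerReferences removed from every object."""
    result = copy.deepcopy(obj)
    # Assumption: multi-object exports are wrapped in kind "List" with "items".
    if result.get("kind") == "List":
        targets = result.get("items", [])
    else:
        targets = [result]
    for item in targets:
        metadata = item.get("metadata")
        if isinstance(metadata, dict):
            # pop() with a default is a no-op when the key is absent.
            metadata.pop("ownerReferences", None)
    return result


if __name__ == "__main__":
    # Hypothetical exported object, trimmed to the relevant fields.
    exported = {
        "kind": "Route",
        "metadata": {
            "name": "jenkins",
            "ownerReferences": [
                {"apiVersion": "template.openshift.io/v1",
                 "kind": "TemplateInstance"}
            ],
        },
    }
    print(json.dumps(strip_owner_references(exported), indent=2))
```

Unlike the JSON patch "remove" op, which fails if the path does not exist,
this silently skips objects that have no ownerReferences, so it can be run
over a mixed export.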


Also, after a few rounds of testing I've got another stale serviceinstance.
I cannot for the life of me make it die, which means I can't delete the
project that it is a part of. I've tried a force delete, but it doesn't work:

$ oc delete serviceinstance jenkins-ephemeral-8dmk9 --force --grace-period=0
warning: Immediate deletion does not wait for confirmation that the running
resource has been terminated. The resource may continue to run on the
cluster indefinitely.
serviceinstance "jenkins-ephemeral-8dmk9" deleted

$ oc get serviceinstance
NAME                      AGE
jenkins-ephemeral-8dmk9   7m

What's the magic sauce to make it so that I can delete the serviceinstance?

On 8 January 2018 at 15:29, Ben Parees <bpar...@redhat.com> wrote:

>
>
> On Sun, Jan 7, 2018 at 9:35 PM, Joel Pearson <
> japear...@agiledigital.com.au> wrote:
>
>> Ahh, I looked into all the objects that were getting deleted and they all
>> have an ownerReference, eg:
>>
>> "ownerReferences": [
>>                     {
>>                         "apiVersion": "template.openshift.io/v1",
>>                         "kind": "TemplateInstance",
>>                         "name": "75c0ccd3-642e-4035-a5cf-3c27e54cae40",
>>                         "uid": "a7301596-f41a-11e7-88e5-fa163eb8ca3a",
>>                         "blockOwnerDeletion": true
>>                     }
>>                 ]
>>
>> That looks like what the patch is about. I also found that if I tried to
>> edit an object and remove the ownerReference, it triggered a garbage
>> collect on the spot and all the resources evaporated.
>>
>>
> Sounds worse than the behavior we were aware of, but fundamentally what's
> causing the cascade deletion is this:
>
> Jan 08 00:26:49 master-0.openshift.staging.local dockerd-current[23329]:
> I0108 00:26:49.904249       1 garbagecollector.go:394] delete object [
> template.openshift.io/v1/TemplateInstance, namespace: jenkins-test, name:
> e3639aec-bbbc-4170-b0e4-3b63735af348, uid: 
> 915d585d-f408-11e7-88e5-fa163eb8ca3a]
> with propagation policy Background
>
> The TemplateInstance object should have an ownerReference to a
> BrokerTemplateInstance and that reference not being handled properly is the
> bug.  If you remove that ownerRef from the TemplateInstance, you should be
> safe from undesired deletion of the TemplateInstance (and the cascading
> delete of everything else), at least with respect to the bug we are aware of.
>
> That should be the only ownerRef you need to delete unless there are other
> (to date unknown) bugs in the GC behavior, or in how the TSB is creating the
> ownerRef chain.
>
>
>
>> So I guess my workaround can be, run the template, wait for everything to
>> deploy, export all templated resources to json, strip out ownerReferences,
>> and create all the resources again.
>>
>> On Mon, Jan 8, 2018 at 12:30 PM Joel Pearson <
>> japear...@agiledigital.com.au> wrote:
>>
>>> Hmm, in my case I don't need to restart to cause the problem to happen.
>>> Is there some way to run nightlies of openshift:release-3.7 using
>>> openshift-ansible, so that I can verify it's fixed for me?
>>>
>>> On Mon, Jan 8, 2018 at 12:23 PM Jordan Liggitt <jligg...@redhat.com>
>>> wrote:
>>>
>>>> Garbage collection in particular could be related to
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1525699 (fixed in
>>>> https://github.com/openshift/origin/pull/17818 but not included in a
>>>> point release yet)
>>>>
>>>>
>>>> On Jan 7, 2018, at 8:17 PM, Joel Pearson <japear...@agiledigital.com.au>
>>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Has anyone else noticed that the new OpenShift Origin 3.7 Template
>>>> Broker seems super flaky?
>>>>
>>>> For example, if I deploy a Jenkins (Persistent or Ephemeral), and then
>>>> I modify the route, by adding an annotation for example:
>>>>
>>>> kubernetes.io/tls-acme: 'true'
>>>>
>>>> I have https://github.com/tnozicka/openshift-acme installed in the
>>>> cluster, which then grabs an SSL cert for me and adds it to the route.
>>>> Moments later, all resources from the template are garbage collected for
>>>> no apparent reason.
>>>>
>>>> I also got the same behaviour when I modified the service account the
>>>> Jenkins template uses: I added an additional route, so I added a new
>>>> "serviceaccounts.openshift.io/oauth-redirectreference.jenkins:" entry.
>>>> It took a bit longer (like 12 hours), but it all disappeared again.  I have
>>>> a suspicion that if you modify any object that a template created, then
>>>> eventually the template broker will remove all objects it created.
>>>>
>>>> Is there any way to disable the new template broker and use the old
>>>> template system?
>>>>
>>>> In Origin 3.6 it was flawless and worked with openshift-acme without
>>>> any problems at all.
>>>>
>>>> I should mention that if I create things manually then it works fine, I
>>>> can use openshift-acme, and all my resources don't vanish at whim.
>>>>
>>>> Here is a snippet of the logs. You can see the acme endpoints are removed
>>>> after successfully getting a cert, and then moments later the deleting
>>>> starts:
>>>>
>>>> Jan 08 00:26:47 master-0.openshift.staging.local
>>>> dockerd-current[23329]: I0108 00:26:47.648255       1
>>>> leaderelection.go:199] successfully renewed lease
>>>> kube-service-catalog/service-catalog-controller-manager
>>>> Jan 08 00:26:47 master-0.openshift.staging.local origin-node[26684]:
>>>> I0108 00:26:47.744777   26749 roundrobin.go:338] LoadBalancerRR: Removing
>>>> endpoints for jenkins-test/acme-9cv97q5dn8:
>>>> Jan 08 00:26:47 master-0.openshift.staging.local
>>>> dockerd-current[23329]: I0108 00:26:47.744777   26749 roundrobin.go:338]
>>>> LoadBalancerRR: Removing endpoints for jenkins-test/acme-9cv97q5dn8:
>>>> Jan 08 00:26:47 master-0.openshift.staging.local origin-node[26684]:
>>>> I0108 00:26:47.762005   26749 ovs.go:143] Error executing ovs-ofctl:
>>>> ovs-ofctl: None: invalid IP address
>>>> Jan 08 00:26:47 master-0.openshift.staging.local
>>>> dockerd-current[23329]: I0108 00:26:47.762005   26749 ovs.go:143] Error
>>>> executing ovs-ofctl: ovs-ofctl: None: invalid IP address
>>>> Jan 08 00:26:47 master-0.openshift.staging.local
>>>> dockerd-current[23329]: E0108 00:26:47.765091   26749
>>>> sdn_controller.go:284] Error deleting OVS flows for service &{{ }
>>>> {acme-9cv97q5dn8  jenkins-test 
>>>> /api/v1/namespaces/jenkins-test/services/acme-9cv97q5dn8
>>>> 94c6b3b3-f40a-11e7-88e5-fa163eb8ca3a 622382 0 2018-01-08 00:26:34
>>>> +0000 UTC <nil> <nil> map[] map[] [] nil [] } {ClusterIP [{http TCP 80 {0
>>>> 80 } 0}] map[] None  []  None []  0} {{[]}}}: exit status 1
>>>> Jan 08 00:26:47 master-0.openshift.staging.local origin-node[26684]:
>>>> E0108 00:26:47.765091   26749 sdn_controller.go:284] Error deleting OVS
>>>> flows for service &{{ } {acme-9cv97q5dn8  jenkins-test
>>>> /api/v1/namespaces/jenkins-test/services/acme-9cv97q5dn8
>>>> 94c6b3b3-f40a-11e7-88e5-fa163eb8ca3a 622382 0 2018-01-08 00:26:34
>>>> +0000 UTC <nil> <nil> map[] map[] [] nil [] } {ClusterIP [{http TCP 80 {0
>>>> 80 } 0}] map[] None  []  None []  0} {{[]}}}: exit status 1
>>>> Jan 08 00:26:48 master-0.openshift.staging.local
>>>> dockerd-current[23329]: I0108 00:26:48.139090       1
>>>> rest.go:362] Starting watch for /api/v1/namespaces, rv=622418 labels=
>>>> fields= timeout=8m38s
>>>> Jan 08 00:26:48 master-0.openshift.staging.local
>>>> origin-master-api[23448]: I0108 00:26:48.139090       1
>>>> rest.go:362] Starting watch for /api/v1/namespaces, rv=622418 labels=
>>>> fields= timeout=8m38s
>>>> Jan 08 00:26:49 master-0.openshift.staging.local
>>>> dockerd-current[23329]: I0108 00:26:49.668205       1
>>>> leaderelection.go:199] successfully renewed lease
>>>> kube-service-catalog/service-catalog-controller-manager
>>>> Jan 08 00:26:49 master-0.openshift.staging.local
>>>> dockerd-current[23329]: I0108 00:26:49.885207       1
>>>> garbagecollector.go:291] processing item
>>>> [template.openshift.io/v1/TemplateInstance, namespace: jenkins-test, name:
>>>> e3639aec-bbbc-4170-b0e4-3b63735af348, uid:
>>>> 915d585d-f408-11e7-88e5-fa163eb8ca3a]
>>>> Jan 08 00:26:49 master-0.openshift.staging.local
>>>> origin-master-controllers[73353]: I0108 00:26:49.885207       1
>>>> garbagecollector.go:291] processing item
>>>> [template.openshift.io/v1/TemplateInstance, namespace: jenkins-test, name:
>>>> e3639aec-bbbc-4170-b0e4-3b63735af348, uid:
>>>> 915d585d-f408-11e7-88e5-fa163eb8ca3a]
>>>> Jan 08 00:26:49 master-0.openshift.staging.local
>>>> dockerd-current[23329]: I0108 00:26:49.904249       1
>>>> garbagecollector.go:394] delete object
>>>> [template.openshift.io/v1/TemplateInstance, namespace: jenkins-test, name:
>>>> e3639aec-bbbc-4170-b0e4-3b63735af348, uid:
>>>> 915d585d-f408-11e7-88e5-fa163eb8ca3a] with propagation policy
>>>> Background
>>>> Jan 08 00:26:49 master-0.openshift.staging.local
>>>> origin-master-controllers[73353]: I0108 00:26:49.904249       1
>>>> garbagecollector.go:394] delete object
>>>> [template.openshift.io/v1/TemplateInstance, namespace: jenkins-test, name:
>>>> e3639aec-bbbc-4170-b0e4-3b63735af348, uid:
>>>> 915d585d-f408-11e7-88e5-fa163eb8ca3a] with propagation policy
>>>> Background
>>>> Jan 08 00:26:49 master-0.openshift.staging.local
>>>> dockerd-current[23329]: I0108 00:26:49.910964       1
>>>> garbagecollector.go:291] processing item
>>>> [apps.openshift.io/v1/DeploymentConfig, namespace: jenkins-test, name: jenkins, uid:
>>>> 91759f72-f408-11e7-88e5-fa163eb8ca3a]
>>>>
>>>> Any ideas? Has anyone else seen this?  Considering
>>>> "openshift-ansible-service-broker" is deployed in a broken state by
>>>> openshift-ansible on the release-3.7 branch (for origin, I think enterprise
>>>> would work as the tags exist), it makes me think that not many people are
>>>> using the new service brokers that are talked about here:
>>>> https://blog.openshift.com/whats-new-in-openshift-3-7-service-catalog-and-brokers/
>>>>
>>>> Thanks,
>>>>
>>>> Joel
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> users@lists.openshift.redhat.com
>>>> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>>>>
>>>>
>>
>>
>
>
> --
> Ben Parees | OpenShift
>
>


-- 
Kind Regards,

Joel Pearson
Agile Digital | Senior Software Consultant

Love Your Software™ | ABN 98 106 361 273
p: 1300 858 277 | m: 0405 417 843 | w: agiledigital.com.au