The sdc-backend was always a suspect in my setup (56 cores), and I invariably 
had to restart the backend pods to get the backend health checks to succeed:

curl http://127.0.0.1:30205/sdc2/rest/healthCheck


This returns "Service unavailable" when the backend doesn't come up. If you restart 
the cassandra/es/kibana pods and then restart the backend, it comes up.
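For anyone hitting the same loop, the restart sequence can be scripted. A sketch, assuming the OOM default namespace onap-sdc and pod labels sdc-cs/sdc-es/sdc-kb/sdc-be, which are guesses; check yours with kubectl get pods --show-labels:

```shell
#!/bin/bash
# Bounce SDC's datastore pods, then the backend, then wait for health.
# The namespace and label selectors below are assumptions -- verify with:
#   kubectl get pods -n onap-sdc --show-labels
restart_sdc_backend() {
  local ns=onap-sdc
  for app in sdc-cs sdc-es sdc-kb; do
    # Deleting the pod lets its controller recreate it.
    kubectl delete pod -n "$ns" -l app="$app"
  done
  sleep 120   # let cassandra/es/kibana settle before bouncing the backend
  kubectl delete pod -n "$ns" -l app=sdc-be
  # Poll the health check until the backend answers.
  until curl -sf http://127.0.0.1:30205/sdc2/rest/healthCheck >/dev/null; do
    echo "waiting for sdc-be..."
    sleep 10
  done
}

# Only attempt this when kubectl is actually available:
if command -v kubectl >/dev/null 2>&1; then
  restart_sdc_backend
fi
```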


In my single-node k8s host (k8s directly on the host, as in my earlier runs),

I see the health check component DE (distribution engine) for the backend failing.


Everything else is up.


curl http://127.0.0.1:30205/sdc2/rest/healthCheck


 {
      "healthCheckComponent": "DE",
      "healthCheckStatus": "DOWN",
      "description": "U-EB cluster is not available"
    },
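To pull just the failing components out of that payload, a jq one-liner helps (a sketch, assuming jq is installed; the recursive descent also catches entries nested under componentsInfo):

```shell
# Print every component whose healthCheckStatus is not UP, with its
# description. Works on the full /sdc2/rest/healthCheck JSON, including
# the nested componentsInfo list under ON_BOARDING.
down_components() {
  jq -r '.. | objects
         | select(has("healthCheckComponent") and .healthCheckStatus != "UP")
         | "\(.healthCheckComponent): \(.description // "no description")"'
}

# Usage against a live backend (skipped when nothing answers on the port):
if curl -sf http://127.0.0.1:30205/sdc2/rest/healthCheck -o /tmp/hc.json 2>/dev/null; then
  down_components < /tmp/hc.json
fi
```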


This probably implies that, in my setup, the UebServers list that the backend 
catalog code fetches from the distributionEngine configuration (before running 
the DistributionHealthCheck) is not correct when running without the dcae vm or 
with dcae disabled.
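One way to check what the backend actually has configured. A sketch: the config file path and the pod label selector are guesses based on the Amsterdam-era SDC charts, so verify both in your deployment:

```shell
# Dump the uebServers list the DistributionEngine reads at startup.
# Both the pod label (app=sdc-be) and the config path below are
# assumptions -- locate the real file inside the pod if they differ.
inspect_ueb_servers() {
  local ns=onap-sdc pod
  pod=$(kubectl get pods -n "$ns" -l app=sdc-be \
        -o jsonpath='{.items[0].metadata.name}') || return 1
  kubectl exec -n "$ns" "$pod" -- \
    grep -A 5 uebServers \
    /var/lib/jetty/config/catalog-be/distribution-engine-configuration.yaml
}

if command -v kubectl >/dev/null 2>&1; then
  inspect_ueb_servers
fi
```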


This is probably the reason why service distribution fails with a policy 
exception.

It's perhaps not able to find the ueb server list when dcae is disabled.

Alexis would know best.


Regards,

-Karthick

________________________________
From: FREEMAN, BRIAN D <[email protected]>
Sent: Friday, January 26, 2018 9:52:24 AM
To: Alexis de Talhouët; Ramanarayanan, Karthick
Cc: [email protected]
Subject: RE: [onap-discuss] [**EXTERNAL**] Service distribution error on latest 
ONAP/OOM

Alexis,



I can't get the OOM install to work today (it was working yesterday). It seems 
to fail on sdc; it doesn't pass healthcheck due to sdc-be, as best I can tell.



I use cd.sh; should I use the 4-step process below instead?



Brian





-----Original Message-----

From: [email protected] 
[mailto:[email protected]] On Behalf Of Alexis de Talhouët

Sent: Friday, January 26, 2018 8:50 AM

To: Ramanarayanan, Karthick <[email protected]>

Cc: [email protected]

Subject: Re: [onap-discuss] [**EXTERNAL**] Service distribution error on latest 
ONAP/OOM



Karthick,



I’ve just re-tested on latest Amsterdam, and distribute does work fine.



I don’t know if you have redeployed the whole ONAP or not, but understand that 
the issue you had with distribution not working was an issue impacting SO, AAI, 
and SDC.

The reason is, sdc is configured with the ueb cluster ip address (dmaap, 
basically the message bus), and ueb is configured in sdc using external access 
to dmaap, i.e. the k8s node ip instead of the internal networking of k8s 
(e.g. dmaap.onap-message-router).

This change was done recently to accommodate the DCAEGEN2 
service-change-handler micro-service, which has to connect to dmaap.

sdc has an api to retrieve the ueb cluster ips, /sdc/v1/distributionUebCluster, 
and all consumers of sdc distribution use the sdc-distribution-client 
application, provided by sdc, which retrieves the ueb cluster ips through that 
api.

Hence, when the DCAE micro-service retrieved the ips of the ueb cluster while 
that was configured with k8s networking (dmaap.onap-message-router), the 
micro-service was unable to resolve it; that's why I changed it to the k8s node 
ip, which has to be resolvable by the DCAE VMs.
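For reference, querying that api directly looks roughly like this. A sketch: the NodePort matches the one used earlier in this thread, but the consumer credentials and the exact headers SDC enforces are deployment-specific, so treat them as placeholders:

```shell
# Ask SDC which UEB/DMaaP hosts distribution consumers should use.
# SDC_USER/SDC_PASS must be a registered SDC consumer -- placeholders here.
fetch_ueb_cluster() {
  curl -s http://127.0.0.1:30205/sdc/v1/distributionUebCluster \
    -H 'Accept: application/json' \
    -H 'X-ECOMP-InstanceID: test' \
    -u "${SDC_USER:-consumer}:${SDC_PASS:-password}"
}

# Skipped unless something is listening on the NodePort:
if curl -s --max-time 2 -o /dev/null http://127.0.0.1:30205; then
  fetch_ueb_cluster
fi
```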



Hope that clarifies a little bit what happened, and explains why I recommend 
you re-deploy the whole onap by doing the following:



- In oom/kubernetes/oneclick: ./deleteAll.sh -n onap

- In the k8s nodes, rm -rf /dockerdata-nfs

- In oom/kubernetes/config: ./createConfig.sh -n onap

- In oom/kubernetes/oneclick: ./createAll.sh -n onap



This should take no longer than 15 min, as you already have the docker images 
on your k8s hosts.
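The four steps above, collected into one script (a sketch; it assumes the oom checkout lives in the current directory, and note that /dockerdata-nfs must be wiped on every k8s node, which the script only does locally):

```shell
#!/bin/bash
# Full ONAP redeploy, following the four steps above. Destructive:
# deleteAll.sh removes all onap pods and /dockerdata-nfs is wiped.
redeploy_onap() {
  set -e
  local ns=onap
  (cd oom/kubernetes/oneclick && ./deleteAll.sh -n "$ns")
  # Must be done on EVERY k8s node; shown here for the local node only.
  sudo rm -rf /dockerdata-nfs
  (cd oom/kubernetes/config && ./createConfig.sh -n "$ns")
  (cd oom/kubernetes/oneclick && ./createAll.sh -n "$ns")
}

# Invoke explicitly; not run automatically because it is destructive:
# redeploy_onap
```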



Alexis



> On Jan 25, 2018, at 8:17 PM, Ramanarayanan, Karthick <[email protected]> 
> wrote:

>

> Hi Alexis,

>  I am still getting the Policy Exception error POL5000 with dcae disabled 
> (dcaegen2 app not running, as mentioned earlier).

>  I am on the latest OOM for amsterdam (policy images are 1.1.3 as verified).

>  Service distribution immediately fails.

>  policy pod logs don't indicate anything.

> They do resolve dmaap.onap-message-router fine and connected to dmaap on port 
> 3904.

>

> Regards,

> -Karthick

> From: Ramanarayanan, Karthick

> Sent: Thursday, January 25, 2018 10:33:44 AM

> To: Alexis de Talhouët

> Cc: [email protected]; Bainbridge, David

> Subject: Re: [**EXTERNAL**] [onap-discuss] Service distribution error on 
> latest ONAP/OOM

>

> Thanks Alexis.

> Fix is looking good but I haven't moved up yet.

> Will do later.

> Regards,

> -Karthick

> From: Alexis de Talhouët <[email protected]>

> Sent: Tuesday, January 23, 2018 8:55:06 AM

> To: Ramanarayanan, Karthick

> Cc: [email protected]; Bainbridge, David

> Subject: Re: [**EXTERNAL**] [onap-discuss] Service distribution error on 
> latest ONAP/OOM

>

> Karthick,

>

> The fix is out: https://gerrit.onap.org/r/#/c/28591/ and has been tested.

> Expect this to be merged in a couple of hours.

>

> Please re-test and confirm it does fix your issue when you have time.

>

> Regards,

> Alexis

>

>> On Jan 22, 2018, at 11:57 AM, Ramanarayanan, Karthick <[email protected]> 
>> wrote:

>>

>> That's great Alexis.

>> Thanks.

>> (also don't be surprised if the backend doesn't come up sometimes with no 
>> indicator in the pod logs.

>>  Just restart the cassandra, elastic search and kibana pods before restarting 
>> the backend pod, and it will load the user profiles in the sdc-be logs :)

>>

>> Regards,

>> -Karthick

>> From: Alexis de Talhouët <[email protected]>

>> Sent: Monday, January 22, 2018 5:10:26 AM

>> To: Ramanarayanan, Karthick

>> Cc: [email protected]; Bainbridge, David

>> Subject: Re: [**EXTERNAL**] [onap-discuss] Service distribution error on 
>> latest ONAP/OOM

>>

>> Hi Karthick,

>>

>> Yes, I’m aware of this since you mentioned it last week. I reproduced the 
>> issue.

>> Currently implementing a fix for it. Sorry for the regression introduced.

>>

>> See https://jira.onap.org/browse/OOM-608 for more details.

>>

>> Thanks,

>> Alexis

>>

>>> On Jan 19, 2018, at 4:21 PM, Ramanarayanan, Karthick <[email protected]> 
>>> wrote:

>>>

>>> Hi Alexis,

>>>  I reverted the oom commit from head to:

>>>

>>> git checkout cb02aa241edd97acb6c5ca744de84313f53e8a5a

>>>

>>> Author: yuryn <[email protected]>

>>> Date:   Thu Dec 21 14:31:21 2017 +0200

>>>

>>>     Fix firefox tab crashes in VNC

>>>

>>>     Change-Id: Ie295257d98ddf32693309535e15c6ad9529f10fc

>>>     Issue-ID: OOM-531

>>>

>>>

>>> Everything works with service creation, vnf and vf creates!

>>> Please note that I am running with dcae disabled.

>>> Something is broken with dcae disabled in the latest.

>>> 100% reproducible: the service distribution step through the operator hits 
>>> the policy exception mailed earlier.

>>> Have a nice weekend.

>>>

>>> Regards,

>>> -Karthick

>>>

>>>

>>>

>>>

>>> From: Ramanarayanan, Karthick

>>> Sent: Friday, January 19, 2018 8:48:23 AM

>>> To: Alexis de Talhouët

>>> Cc: [email protected]

>>> Subject: Re: [**EXTERNAL**] Re: [onap-discuss] Service distribution error 
>>> on latest ONAP/OOM

>>>

>>> Hi Alexis,

>>>  I did check the policy pod logs before sending the mail.

>>>  I didn't see anything suspicious.

>>>  I initially suspected the aai-service dns not getting resolved, but you 
>>> seem to have fixed that, and it was accessible from the policy pod.

>>>  Nothing suspicious from any log anywhere.

>>>  I did see that the health check on the sdc pods returned all UP except the 
>>> DE component, whose health check was down.

>>>  Not sure if it's in any way related; could be benign.

>>>

>>> curl http://127.0.0.1:30206/sdc1/rest/healthCheck

>>> {

>>>  "sdcVersion": "1.1.0",

>>>  "siteMode": "unknown",

>>>  "componentsInfo": [

>>>    {

>>>      "healthCheckComponent": "BE",

>>>      "healthCheckStatus": "UP",

>>>      "version": "1.1.0",

>>>      "description": "OK"

>>>    },

>>>    {

>>>      "healthCheckComponent": "TITAN",

>>>      "healthCheckStatus": "UP",

>>>      "description": "OK"

>>>    },

>>>    {

>>>      "healthCheckComponent": "DE",

>>>      "healthCheckStatus": "DOWN",

>>>      "description": "U-EB cluster is not available"

>>>    },

>>>    {

>>>      "healthCheckComponent": "CASSANDRA",

>>>      "healthCheckStatus": "UP",

>>>      "description": "OK"

>>>    },

>>>    {

>>>      "healthCheckComponent": "ON_BOARDING",

>>>      "healthCheckStatus": "UP",

>>>      "version": "1.1.0",

>>>      "description": "OK",

>>>      "componentsInfo": [

>>>        {

>>>          "healthCheckComponent": "ZU",

>>>          "healthCheckStatus": "UP",

>>>          "version": "0.2.0",

>>>          "description": "OK"

>>>        },

>>>        {

>>>          "healthCheckComponent": "BE",

>>>          "healthCheckStatus": "UP",

>>>          "version": "1.1.0",

>>>          "description": "OK"

>>>        },

>>>        {

>>>          "healthCheckComponent": "CAS",

>>>          "healthCheckStatus": "UP",

>>>          "version": "2.1.17",

>>>          "description": "OK"

>>>        },

>>>        {

>>>          "healthCheckComponent": "FE",

>>>          "healthCheckStatus": "UP",

>>>          "version": "1.1.0",

>>>          "description": "OK"

>>>        }

>>>      ]

>>>    },

>>>    {

>>>      "healthCheckComponent": "FE",

>>>      "healthCheckStatus": "UP",

>>>      "version": "1.1.0",

>>>      "description": "OK"

>>>    }

>>>  ]

>>> }

>>>

>>>

>>> On some occasions the backend doesn't come up even though the pods are 
>>> running.

>>> (Seen on other nodes running onap, and it was there even without your 
>>> changes; the logs indicated nothing.

>>> But if I restart the sdc pods for cassandra, elastic search and kibana 
>>> before restarting the backend, the backend starts responding and ends up 
>>> creating the user profile entries for the various onap user roles, as seen 
>>> in the logs.

>>> But this is unrelated to this service distribution error, as the backend is 
>>> up.)

>>>

>>>

>>> Regards,

>>> -Karthick

>>>

>>>

>>>

>>> From: Alexis de Talhouët <[email protected]>

>>> Sent: Friday, January 19, 2018 4:54 AM

>>> To: Ramanarayanan, Karthick

>>> Cc: [email protected]

>>> Subject: [**EXTERNAL**] Re: [onap-discuss] Service distribution error on 
>>> latest ONAP/OOM

>>>

>>> Hi,

>>>

>>> Could you look at the logs of Policy for errors? For that you need to go 
>>> into the pods themselves, under /var/log/onap.

>>> You could do the same for SDC container (backend).

>>> The thing that could have affected Policy is the fact that we removed the 
>>> persisted data of mariadb, because it was bogus 
>>> (https://gerrit.onap.org/r/#/c/27521/). But I doubt that explains your issue.

>>> Beside that, nothing with a potentially disruptive effect happened to policy.

>>> The DCAE work was well tested before it got merged. I'll re-test sometime 
>>> today or early next week to make sure nothing has slipped through the cracks.

>>>

>>> Thanks,

>>> Alexis

>>>

>>>> On Jan 18, 2018, at 11:44 PM, Ramanarayanan, Karthick <[email protected]> 
>>>> wrote:

>>>>

>>>> Hi,

>>>>  While trying to distribute a demo firewall service instance on a 
>>>> kubernetes host running ONAP, I am seeing a new policy exception error on 
>>>> the latest oom on amsterdam.

>>>> (dcae deploy is false and disableDcae is true)

>>>>

>>>> Error code: POL5000

>>>> Status code: 500

>>>> Internal Server Error. Please try again later.

>>>>

>>>> All pods are up. Health check seems to be fine on all pods.

>>>> k8s pod logs don't seem to reveal anything and this happens consistently 
>>>> whenever I try to distribute the service as an operator.

>>>>

>>>> It was working fine last week.

>>>> Even yesterday I didn't get this error, though I got a different one 
>>>> related to a createVnfInfra notify exception in the SO vnf create workflow 
>>>> step; but that was a different failure from this one.

>>>>

>>>> After the dcae config changes got merged, this service distribution error 
>>>> seems to have popped up. (dcae is disabled for my setup)

>>>>

>>>> What am I missing?

>>>>

>>>> Thanks,

>>>> -Karthick

>>>> _______________________________________________

>>>> onap-discuss mailing list

>>>> [email protected]

>>>> https://lists.onap.org/mailman/listinfo/onap-discuss


