Re: [Users] Contrail 3.0 analytics-api errors

Raj Reddy Wed, 20 Apr 2016 06:56:12 -0700

Replies inline.

On Apr 20, 2016, at 3:35 AM, Jakub Pavlik <[email protected]> wrote:


Hi Raj,

I am running the latest 3.0 build, from yesterday. Everything is in the active 
state now, but the following still appears every 15 seconds in 
/var/log/contrail/contrail-analytics-api.log:

04/20/2016 12:15:50 PM [contrail-analytics-api]: Exception TimeoutError in uve 
stream proc. Arguments:
('Timeout reading from socket',) : traceback Traceback (most recent call last):
 File "/usr/lib/python2.7/dist-packages/opserver/partition_handler.py", line 
354, in _run
   for message in pb.listen():
 File "/usr/lib/python2.7/dist-packages/redis/client.py", line 2215, in listen
   response = self.handle_message(self.parse_response(block=True))
 File "/usr/lib/python2.7/dist-packages/redis/client.py", line 2150, in 
parse_response
   return self._execute(connection, connection.read_response)
 File "/usr/lib/python2.7/dist-packages/redis/client.py", line 2132, in _execute
   return command(*args)
 File "/usr/lib/python2.7/dist-packages/redis/connection.py", line 569, in 
read_response
   response = self._parser.read_response()
 File "/usr/lib/python2.7/dist-packages/redis/connection.py", line 224, in 
read_response
   response = self._buffer.readline()
 File "/usr/lib/python2.7/dist-packages/redis/connection.py", line 162, in 
readline
   self._read_from_socket()
 File "/usr/lib/python2.7/dist-packages/redis/connection.py", line 133, in 
_read_from_socket
   raise TimeoutError("Timeout reading from socket")
TimeoutError: Timeout reading from socket

I am not sure what is happening here. Did you check the redis-server logs? Is 
it restarting?
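
A quick way to answer the restart question without reading the logs is to look 
at the uptime Redis itself reports. This is only a sketch: it assumes the 
redis-py client seen in the traceback is importable and that Redis listens on 
port 6379, as in the discovery entries in this thread. The helper names and the 
15-minute window are illustrative and not part of Contrail.

```python
def restarted_recently(info, window_s=900):
    """Return True if the INFO reply shows an uptime shorter than window_s.

    `info` is the dict returned by redis-py's Redis.info(); only the
    `uptime_in_seconds` field is used. The 900 s default is an arbitrary,
    illustrative threshold.
    """
    return int(info["uptime_in_seconds"]) < window_s


def fetch_info(host="127.0.0.1", port=6379):
    """Query a live redis-server (needs redis-py and a reachable server)."""
    import redis  # imported lazily so restarted_recently() works without it
    return redis.StrictRedis(host=host, port=port, socket_timeout=5).info()
```

Calling `restarted_recently(fetch_info("172.16.20.101"))` against each 
controller would show whether any redis-server came up within the window.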


All config, analytics and database nodes are also shown in alert state in the 
GUI, with the message "Analytics Node config missing or incorrect".

The expectation is that each node is added to the config, and the GUI checks 
for that. See 
https://github.com/Juniper/contrail-controller/blob/master/src/config/utils/provision_config_node.py


Each node in the cluster has the following UVE topics in Kafka:

/usr/share/kafka/bin/kafka-topics.sh --zookeeper 172.16.20.101:2181 --list
-uve-0
-uve-1
-uve-10
-uve-11
-uve-12
-uve-13
-uve-14
-uve-2
-uve-3
-uve-4
-uve-5
-uve-6
-uve-7
-uve-8
-uve-9
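
The listing is sorted as text, so a gap is easy to miss by eye. A minimal 
sketch for checking it mechanically follows. The count of 15 topics 
(-uve-0 through -uve-14) is taken from the listing above, and 
`missing_uve_topics` is a hypothetical helper, not a Kafka or Contrail tool.

```python
import re

EXPECTED_PARTITIONS = 15  # assumption: topics -uve-0 .. -uve-14, as listed above


def missing_uve_topics(listing, expected=EXPECTED_PARTITIONS):
    """Return the sorted partition numbers that have no '-uve-<n>' topic line."""
    found = set()
    for line in listing.splitlines():
        match = re.match(r"^-uve-(\d+)$", line.strip())
        if match:
            found.add(int(match.group(1)))
    return sorted(set(range(expected)) - found)
```

Feeding it the captured output of `kafka-topics.sh --list` returns an empty 
list when every expected topic is present.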

Some alarm-gen instances have fewer partitions than others. I have 3 controllers:

       {
           "admin_state": "up",
           "ep_id": "openstack-stg-ctl03",
           "ep_type": "AlarmGenerator",
           "hbcount": 0,
           "heartbeat": 1461148260,
           "in_use": 0,
           "info": {
               "instance-id": "0",
               "ip-address": "172.16.20.103",
               "partitions": "{\"1\": 1461141724586667, \"4\": 
1461141724643808, \"6\": 1461141724644764, \"7\": 1461141724645688, \"9\": 
1461141724646408, \"11\": 1461141724647377, \"12\": 1461141724651648, \"13\": 
1461141724656747, \"14\": 1461141724662476}",
               "redis-port": "6379"
           },
           "oper_state": "up",
           "oper_state_msg": "",
           "prov_state": "up",
           "remote": "172.16.20.103",
           "sequence": "1461148260openstack-stg-ctl02",
           "service_id": "openstack-stg-ctl03:AlarmGenerator",
           "service_type": "AlarmGenerator",
           "status": "up",
           "ts_created": 1461148260,
           "ts_use": 49217,
           "version": "1.0"
       },


Is it correct that there are two entries for the same node, one with the FQDN 
(openstack-stg-ctl01.openstack.tcpcloud.eu) and one with the hostname 
(openstack-stg-ctl01)?

This seems incorrect. A restart of alarm-gen may clean it up; the hostname may 
have been changed after alarm-gen started up.


       {
           "admin_state": "up",
           "ep_id": "openstack-stg-ctl01.openstack.tcpcloud.eu",
           "ep_type": "AlarmGenerator",
           "hbcount": 0,
           "heartbeat": 1460968900,
           "in_use": 0,
           "info": {
               "instance-id": "0",
               "ip-address": "172.16.20.101",
               "partitions": "{\"0\": 1460965891588811, \"10\": 
1460965891603645, \"3\": 1460965891589092, \"5\": 1460965891599416}",
               "redis-port": "6379"
           },


       {
           "admin_state": "up",
           "ep_id": "openstack-stg-ctl01",
           "ep_type": "AlarmGenerator",
           "hbcount": 0,
           "heartbeat": 1461148265,
           "in_use": 0,
           "info": {
               "instance-id": "0",
               "ip-address": "172.16.20.101",
               "partitions": "{\"0\": 1461147595465749, \"10\": 
1461147595474188, \"3\": 1461147595478460, \"5\": 1461147595479857}",
               "redis-port": "6379"
           },
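
In the two ctl01 entries above, the FQDN entry carries the older heartbeat 
(1460968900 against 1461148265, roughly two days apart), which fits the idea 
that it is a leftover. The sketch below flags such leftovers by grouping 
entries on endpoint type and IP address and keeping only the newest heartbeat. 
It assumes the entries have already been parsed into dicts with the fields 
shown in this thread; `stale_publishers` is a hypothetical helper, not a 
Contrail API.

```python
from collections import defaultdict


def stale_publishers(entries):
    """Return ep_ids that share (ep_type, ip-address) with a newer entry."""
    groups = defaultdict(list)
    for entry in entries:
        key = (entry["ep_type"], entry["info"]["ip-address"])
        groups[key].append(entry)
    stale = []
    for group in groups.values():
        newest = max(group, key=lambda e: e["heartbeat"])
        stale.extend(e["ep_id"] for e in group if e is not newest)
    return sorted(stale)


# Values copied from the entries quoted in this thread.
entries = [
    {"ep_id": "openstack-stg-ctl01.openstack.tcpcloud.eu",
     "ep_type": "AlarmGenerator", "heartbeat": 1460968900,
     "info": {"ip-address": "172.16.20.101"}},
    {"ep_id": "openstack-stg-ctl01",
     "ep_type": "AlarmGenerator", "heartbeat": 1461148265,
     "info": {"ip-address": "172.16.20.101"}},
    {"ep_id": "openstack-stg-ctl03",
     "ep_type": "AlarmGenerator", "heartbeat": 1461148260,
     "info": {"ip-address": "172.16.20.103"}},
]
print(stale_publishers(entries))
# ['openstack-stg-ctl01.openstack.tcpcloud.eu']
```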


Thanks,

Jakub

On 19.4.2016 22:30, Raj Reddy wrote:
There were some fixes in this area. What version of the software are you using?
Can you verify that all alarm-gen instances are publishing partition numbers 
from 1 to 15 to discovery at <ip>:5998?
contrail-analytics-api reads this from discovery and complains if it is 
incomplete.
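
The completeness check can be sketched as below. It assumes the discovery 
entries are already parsed into dicts shaped like the AlarmGenerator dumps 
quoted earlier in this thread, where `info["partitions"]` is a JSON-encoded 
string keyed by partition number. The total of 15 partitions numbered from 0 
follows the Kafka topic listing in this thread and is an assumption, as is the 
helper name `missing_partitions`.

```python
import json


def missing_partitions(entries, total=15):
    """Return the sorted partition numbers that no entry claims.

    Partitions are assumed to be numbered 0 .. total-1.
    """
    claimed = set()
    for entry in entries:
        claimed.update(int(p) for p in json.loads(entry["info"]["partitions"]))
    return sorted(set(range(total)) - claimed)


# Partition keys copied from the ctl03 and ctl01 entries in this thread;
# the timestamps are replaced by 0 because only the keys matter here.
ctl03 = {"info": {"partitions": json.dumps(
    {str(p): 0 for p in (1, 4, 6, 7, 9, 11, 12, 13, 14)})}}
ctl01 = {"info": {"partitions": json.dumps(
    {str(p): 0 for p in (0, 10, 3, 5)})}}
print(missing_partitions([ctl03, ctl01]))
# [2, 8]
```

The entry for the third controller is not shown in the thread, so partitions 2 
and 8 may simply be held there.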

-Raj

On Apr 19, 2016, at 4:38 AM, Jakub Pavlik <[email protected]> wrote:

Hello,

Has anybody hit this issue with Contrail 3.0 and analytics-api? It shows up 
even when all other services are active:

== Contrail Analytics ==
supervisor-analytics:         active
contrail-alarm-gen            active
contrail-analytics-api        initializing 
(UvePartitions:UVE-Aggregation[Partitions:13] connection down)
contrail-analytics-nodemgr    active
contrail-collector            active
contrail-query-engine         active
contrail-snmp-collector       active
contrail-topology             active

/var/log/contrail/contrail-analytics-api.log:

04/19/2016 01:35:55 PM [contrail-analytics-api]: Exception TimeoutError in uve 
stream proc. Arguments:
('Timeout reading from socket',) : traceback Traceback (most recent call last):
 File "/usr/lib/python2.7/dist-packages/opserver/partition_handler.py", line 
354, in _run
   for message in pb.listen():
 File "/usr/lib/python2.7/dist-packages/redis/client.py", line 2215, in listen
   response = self.handle_message(self.parse_response(block=True))
 File "/usr/lib/python2.7/dist-packages/redis/client.py", line 2150, in 
parse_response
   return self._execute(connection, connection.read_response)
 File "/usr/lib/python2.7/dist-packages/redis/client.py", line 2132, in _execute
   return command(*args)
 File "/usr/lib/python2.7/dist-packages/redis/connection.py", line 569, in 
read_response
   response = self._parser.read_response()
 File "/usr/lib/python2.7/dist-packages/redis/connection.py", line 224, in 
read_response
   response = self._buffer.readline()
 File "/usr/lib/python2.7/dist-packages/redis/connection.py", line 162, in 
readline
   self._read_from_socket()
 File "/usr/lib/python2.7/dist-packages/redis/connection.py", line 133, in 
_read_from_socket
   raise TimeoutError("Timeout reading from socket")
TimeoutError: Timeout reading from socket

The Redis server seems to be working fine. I see this problem in two 
independent deployments.

Thanks,

jakub


_______________________________________________
Users mailing list
[email protected]
http://lists.opencontrail.org/mailman/listinfo/users_lists.opencontrail.org

--
Jakub Pavlik
CTO

+420 602 177 027
[email protected]

tcp cloud a.s.
Thamova 16
186 00 Praha 8 - Karlin
Czech republic
http://tcpcloud.eu
http://opentcpcloud.org
