Hello,

I upgraded to 2.11.20161023-labs-edition a few months back and switched to lmd 
to get rid of some mod_fcgid errors. That has helped a lot. I am still seeing 
errors, but they are not nearly as bad as before:

[Wed Feb 01 04:45:42.270564 2017] [core:error] [pid 26829] [client 
127.0.0.1:48720] End of script output before headers: fcgid_env.sh
[Wed Feb 01 04:50:41.266928 2017] [fcgid:warn] [pid 26828] [client 
127.0.0.1:55088] mod_fcgid: read data timeout in 40 seconds

We are up to 10 Nagios servers with about 210,000 host and service checks.

Also, we are starting to play with NagVis and tried to point it at the lmd 
socket, but there seems to be a compatibility issue.

Here is what we found.


So, it looks like the PHP change to strip the KeepAlive header isn't going to 
be enough on its own. That was our first stab at fixing the issue, but NagVis 
also requests columns that are present in Livestatus but not in lmd.



Here's the actual request NagVis is making (without edits):
GET hosts
Columns: state plugin_output alias display_name address notes last_check 
next_check state_type current_attempt max_check_attempts last_state_change 
last_hard_state_change perf_data acknowledged scheduled_downtime_depth 
has_been_checked name check_command custom_variable_names 
custom_variable_values staleness
Filter: name = 4457-TX-RTR
OutputFormat: json
KeepAlive: on
ResponseHeader: fixed16

Now the results with different configurations based on that input...

Original query:
OMD[fss]:~$ unixcat tmp/thruk/lmd/live.sock < nagvis_def_query.txt
bad request: unrecognized header KeepAlive: on



Removing KeepAlive header:
OMD[fss]:~$ unixcat tmp/thruk/lmd/live.sock < nagvis_def_query.txt
400          49
bad request: table hosts has no column staleness



Removing staleness from columns:
OMD[fss]:~$ unixcat tmp/thruk/lmd/live.sock < nagvis_def_query.txt
200         621
[[0,"OK - 10.21.118.1: rta 0.558ms, lost 
0%","GC-WatsonWise-TX-4457-RTR","4457-TX-RTR","10.21.118.1","",1.485967184e+09,1.485967784e+09,1,1,10,1.484687044e+09,1.484342519e+09,"rta=0.558ms;3000.000;5000.000;0;
 pl=0%;80;100;; rtmax=0.783ms;;;; 
rtmin=0.492ms;;;;",0,0,1,"4457-TX-RTR","check-host-alive",[],[]]
,
[0,"OK - 10.21.118.1: rta 67.222ms, lost 
0%","GC-WatsonWise-TX-4457-RTR","4457-TX-RTR","10.21.118.1","",1.485966678e+09,1.485967278e+09,1,1,10,1.485547247e+09,1.485547247e+09,"rta=67.222ms;3000.000;5000.000;0;
 pl=0%;80;100;; rtmax=69.266ms;;;; 
rtmin=66.296ms;;;;",0,0,1,"4457-TX-RTR","check-host-alive",[],[]]
]

So, for compatibility with lmd, it looks like the NagVis requests need two 
changes: the KeepAlive header has to be removed, and the staleness column must 
not be requested.
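
To make the two changes concrete, here is a rough sketch in Python of the 
rewrite we have in mind. This is not code from NagVis, Thruk or lmd; the 
function name make_lmd_compatible and its parameters are made up for 
illustration, and it assumes the Columns header arrives on a single line (it 
is only wrapped above by the mail client).

```python
def make_lmd_compatible(query, drop_headers=("KeepAlive",),
                        drop_columns=("staleness",)):
    """Return a copy of a Livestatus query without the given headers/columns.

    Hypothetical helper: the names and defaults are assumptions based on the
    two errors shown above, not part of any existing tool.
    """
    out = []
    for line in query.splitlines():
        # Header lines look like "Name: value"; "GET hosts" has no colon.
        name, sep, value = line.partition(":")
        if sep and name.strip() in drop_headers:
            continue  # drop the whole header, e.g. "KeepAlive: on"
        if sep and name.strip() == "Columns":
            kept = [c for c in value.split() if c not in drop_columns]
            line = "Columns: " + " ".join(kept)
        out.append(line)
    # A Livestatus query is terminated by an empty line.
    return "\n".join(out).rstrip("\n") + "\n\n"


if __name__ == "__main__":
    original = (
        "GET hosts\n"
        "Columns: state name staleness\n"
        "Filter: name = 4457-TX-RTR\n"
        "OutputFormat: json\n"
        "KeepAlive: on\n"
        "ResponseHeader: fixed16\n"
        "\n"
    )
    print(make_lmd_compatible(original), end="")
```

Doing the same thing inside NagVis would of course mean PHP rather than 
Python; the sketch is only meant to show which parts of the query have to 
change.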

Has anyone already changed or worked around this, or should we just continue 
to go to Livestatus on the Nagios servers directly for NagVis? Or are we 
missing something?

We love the product and hope to expand this out to about 2500 Nagios servers to 
feed Thruk in OMD within the next 6 months.

Any insight would be greatly appreciated.

Tom





_______________________________________________
omd-users mailing list
[email protected]
http://lists.mathias-kettner.de/mailman/listinfo/omd-users
