Hey Nikos, I've taken a look at the logs, and unfortunately there
isn't much that I found out. I do see a lot of "ERROR: reading from
fd ##". Is that bad? Stupid question: what does "fd" mean?
2009-07-20 19:00:20 [7205] [1] ERROR: Error reading from fd 28:
2009-07-20 19:00:20 [7205] [1] ERROR: System error 104: Connection
reset by peer
2009-07-20 19:07:34 [7205] [1] ERROR: Error reading from fd 32:
2009-07-20 19:07:34 [7205] [1] ERROR: System error 104: Connection
reset by peer
2009-07-20 19:09:47 [7205] [1] ERROR: Error reading from fd 32:
2009-07-20 19:09:47 [7205] [1] ERROR: System error 104: Connection
reset by peer
2009-07-20 19:18:59 [7205] [1] ERROR: Error reading from fd 31:
2009-07-20 19:18:59 [7205] [1] ERROR: System error 104: Connection
reset by peer
2009-07-20 19:28:05 [7205] [1] ERROR: Error reading from fd 29:
2009-07-20 19:28:05 [7205] [1] ERROR: System error 104: Connection
reset by peer
2009-07-20 20:37:52 [7205] [1] ERROR: Error reading from fd 30:
2009-07-20 20:37:52 [7205] [1] ERROR: System error 104: Connection
reset by peer
2009-07-21 01:55:43 [7205] [8] ERROR: Couldn't fetch <http://localhost:9090/midcgw/UpmobileSMSHandler?sender=9711078748&receiver=33123&text=2256+81365392151204318&binary=2256+81365392151204318&time=2009-07-21+05:51:32&smsc-id=TELCEL33123_MX&SMS-ID=4889c38e-e730-4bb6-acca-fc48f5283c29&DeliveryValue=-1&DeliveryReportReply=2256+81365392151204318&sendsms-user=default&message-coding=0&message-class-bits=-1&mwi=-1&message-charset=UTF8&udh=&billing=&account=&serviceid=%v&sessionid=%w&meta-data=%3Fsmpp%3F
>
2009-07-20 19:59:12 [7210] [8] ERROR: Error reading from fd 30:
2009-07-20 19:59:12 [7210] [8] ERROR: System error 104: Connection
reset by peer
2009-07-20 19:59:12 [7210] [8] ERROR: Couldn't fetch <http://10.10.20.10:9090/midcgw/dlr?uuid=3a7ceed5-4299-4771-a4f6-e4c9e724a46d&dlr-status=8&dlr-errcode=&dlr-tlvs=&registered_delivery=1
>
2009-07-21 01:57:33 [7210] [8] ERROR: Couldn't fetch <http://localhost:9090/midcgw/UpmobileSMSHandler?sender=5527698584&receiver=55202&text=Sexy&binary=Sexy&time=2009-07-21+05:51:36&smsc-id=TELCEL5_MX&SMS-ID=79b4761c-23a4-4ba0-aa65-c2dc913ff35c&DeliveryValue=1&DeliveryReportReply=Sexy&sendsms-user=default&message-coding=0&message-class-bits=-1&mwi=-1&message-charset=UTF-8&udh=&billing=&account=&serviceid=%v&sessionid=%w&meta-data=%3Fsmpp%3F
>
2009-07-21 01:57:33 [7210] [8] ERROR: Couldn't fetch <http://localhost:9090/midcgw/UpmobileSMSHandler?sender=%2B7876280600&receiver=%2B55225&text=Picante&binary=Picante+&time=2009-07-21+05:52:41&smsc-id=Centennial_PR&SMS-ID=1efdffdf-e361-4076-a4e6-8e7589fe9f0b&DeliveryValue=-1&DeliveryReportReply=Picante+&sendsms-user=default&message-coding=0&message-class-bits=-1&mwi=-1&message-charset=UTF-8&udh=&billing=&account=&serviceid=%v&sessionid=%w&meta-data=%3Fsmpp%3F
>
2009-07-21 02:01:46 [7215] [8] ERROR: Couldn't fetch <http://localhost:9090/midcgw/UpmobileSMSHandler?sender=6391107103&receiver=33123&text=Melate+2256&binary=Melate+2256&time=2009-07-21+05:57:44&smsc-id=TELCEL33123_MX&SMS-ID=6c398ecb-5d13-428a-9ec5-f7f2ed06d238&DeliveryValue=-1&DeliveryReportReply=Melate+2256&sendsms-user=default&message-coding=0&message-class-bits=-1&mwi=-1&message-charset=UTF-8&udh=&billing=&account=&serviceid=%v&sessionid=%w&meta-data=%3Fsmpp%3F
>
Hi,
I think that Nagios may be interfering with Kannel. Destroying the HTTP
client is a standard action. When an HTTP request reaches Kannel, it
creates an HTTP client, and when the request finishes it destroys the
client to avoid memory leaks.
You have to figure out what fd 190 is. Possibly your smsbox. Send about
10 lines before and after the error from your smsbox log. There should
be more entries about the failure and the reason. I suspect your server
might be running out of sockets (file descriptors).
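For reference, "fd" is a file descriptor: the small integer the kernel
hands a process for every open file or socket. A rough way to test the
running-out-of-descriptors suspicion on Linux is to compare the number
of entries in /proc/<pid>/fd with the "Max open files" line in
/proc/<pid>/limits. The sketch below assumes Linux and enough
privileges to read the other process's /proc entries; the helper names
are made up for illustration and are not part of Kannel.

```python
import os


def parse_soft_nofile(limits_text):
    """Extract the soft 'Max open files' limit from /proc/<pid>/limits text.

    Returns an int, or None if the line is missing or says 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: "Max open files  <soft>  <hard>  files"
            soft = line.split()[3]
            return int(soft) if soft.isdigit() else None
    return None


def fd_usage(pid="self"):
    """Return (open_fd_count, soft_limit) for a process (Linux only)."""
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as f:
        soft_limit = parse_soft_nofile(f.read())
    return open_fds, soft_limit


if __name__ == "__main__":
    used, limit = fd_usage()  # pass the bearerbox or smsbox PID instead
    print(f"open fds: {used}, soft limit: {limit}")
```

Calling fd_usage() with the bearerbox PID while the errors are being
logged would show whether the count is close to the limit.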
BR,
Nikos
----- Original Message -----
From: Marcelo Olivas
To: Nikos Balkanas
Cc: [email protected] ; Tino Cuesta
Sent: Tuesday, July 21, 2009 11:09 PM
Subject: Re: Error 500 in Kannel's HTTP
Nikos, sorry for the confusion. Nagios is just a Linux monitoring
application. It checks the status of my connections using the
status.xml page from Kannel's admin module. The peer for the
bearerbox is a Java application running on Tomcat on port 9090.
Both Kannel and the Tomcat application are running on the same
server. The weird thing is that I don't see any errors in
Tomcat. At first I thought it was a network hiccup;
however, this has happened more than 3 times now. Below is my
configuration for the BB:
-------------------------------------------------------------------------------------
group = core
admin-port = 13000
smsbox-port = 13001
admin-password = secret
status-password = scret2
log-file = "/opt/kannel/logs/bearerbox.log"
log-level = 0
access-log = "/opt/kannel/logs/access_bearerbox.log"
store-type = spool
store-location = "/opt/kannel/var/spool/bearerbox"
dlr-storage = mysql
black-list = "http://localhost:9090/midcgw/blacklist.txt"
#
# Include the bearerbox DLR storage type.
#
include = "/opt/kannel/etc/module.d/dlr-storage.conf"
#
# The upstream SMSC connection configurations we use.
#
include = "/opt/kannel/etc/smsc.d"
#
# A kludge smsbox group. Bearerbox at least needs to know
# that it should open the smsbox-port by detecting at least
# a smsbox group here.
group = smsbox
-------------------------------------------------------------------------------------
I followed the logs and this is what I noticed:
2009-07-21 02:01:52 [7125] [3] DEBUG: HTTP: Destroying HTTPClient
area 0x8ac08888.
2009-07-21 02:01:52 [7125] [3] DEBUG: HTTP: Destroying HTTPClient
for `xx.xx.xx.16'.
2009-07-21 02:01:52 [7125] [1] DEBUG: HTTP: Destroying HTTPClient
area 0x8ac04680.
2009-07-21 02:01:52 [7125] [1] DEBUG: HTTP: Destroying HTTPClient
for `xx.xx.xx.16'.
2009-07-21 02:01:52 [7125] [1] ERROR: Error writing 418 octets to fd
190:
2009-07-21 02:01:52 [7125] [1] ERROR: System error 32: Broken pipe
2009-07-21 02:01:52 [7125] [76] DEBUG: send_msg: sending msg to box:
<127.0.0.1>
2009-07-21 02:01:52 [7125] [76] DEBUG: boxc_sender: sent message to
<127.0.0.1>
2009-07-21 02:01:52 [7125] [3] DEBUG: HTTP: Destroying HTTPClient
area 0x8ac1b4c8.
2009-07-21 02:01:52 [7125] [3] DEBUG: HTTP: Destroying HTTPClient
for `xx.xx.xx.16'.
2009-07-21 02:01:52 [7125] [3] DEBUG: HTTP: Destroying HTTPClient
area 0x8ac1b5e8.
2009-07-21 02:01:52 [7125] [3] DEBUG: HTTP: Destroying HTTPClient
for `xx.xx.xx.16'.
As I mentioned, Nagios is configured so that every 5 minutes it uses
the admin module to check the status of my connections. The Nagios IP
is xx.xx.xx.16. What does "Destroying HTTPClient for xx.xx.xx.16"
mean, and why is Kannel doing it?
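To tell "Kannel did not answer the status request" apart from "the
SMPP links are really down", a check along these lines could be used.
This is only a sketch: it assumes status.xml lists each connection as
an <smsc> element with <id> and <status> children, and that a healthy
link's status starts with "online". The sample document and its status
values are made up for illustration, and the function names are
hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Made-up sample in the assumed status.xml layout (not real output).
SAMPLE = """<gateway><smscs><count>2</count>
<smsc><id>TELCEL5_MX</id><status>online 120s</status></smsc>
<smsc><id>Centennial_PR</id><status>re-connecting</status></smsc>
</smscs></gateway>"""


def offline_smscs(status_xml):
    """Return (id, status) for every smsc whose status is not 'online ...'."""
    root = ET.fromstring(status_xml)
    down = []
    for smsc in root.iter("smsc"):
        smsc_id = smsc.findtext("id", default="?")
        status = smsc.findtext("status", default="")
        if not status.startswith("online"):
            down.append((smsc_id, status))
    return down


def fetch_status(url, timeout=10):
    """Fetch the status page; network or HTTP errors raise OSError subclasses."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(offline_smscs(SAMPLE))
```

A monitoring wrapper could report an exception from fetch_status() as
"status page unreachable" and only a non-empty offline_smscs() result
as "connections down".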
Thanks again guys!!
On Jul 21, 2009, at 3:01 PM, Nikos Balkanas wrote:
Hi,
I don't know Nagios. I am assuming, from what you say, that fd 190
is your Nagios connection. Broken pipe means that bb's peer
(Nagios?) has dropped the connection without sending a FIN
(abnormally, network issue?).
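System error 32 is EPIPE: a write to a socket whose other end is
already gone. The minimal sketch below reproduces it with a local
socket pair rather than a real TCP connection to Kannel, so it only
illustrates the errno, not the cause in this setup. With TCP, the
first write after the peer closes may still succeed and a later one
fails. The 418-byte payload just mirrors the size in the log line.

```python
import errno
import socket


def write_after_peer_close():
    """Write to a socket whose peer has already closed; return the errno seen."""
    ours, peer = socket.socketpair()
    peer.close()  # the other side goes away first
    try:
        ours.sendall(b"x" * 418)  # arbitrary payload, same size as in the log
    except OSError as exc:
        return exc.errno
    finally:
        ours.close()
    return None


if __name__ == "__main__":
    code = write_after_peer_close()
    print(code, errno.errorcode.get(code))
```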
What is your localhost:9090? It does not seem to be responding either.
First you get the error in the smsbox, and then the bb error follows.
How about some configuration?
BR,
Nikos
----- Original Message -----
From: Marcelo Olivas
To: [email protected]
Cc: Tino Cuesta
Sent: Tuesday, July 21, 2009 7:25 PM
Subject: Error 500 in Kannel's HTTP
Hi gurus!! I'm getting the following error in bearerbox:
2009-07-21 02:01:52 [7125] [1] ERROR: Error writing 418 octets to
fd 190:
2009-07-21 02:01:52 [7125] [1] ERROR: System error 32: Broken pipe
The error gives me an alert in my Nagios saying that all my
connections (SMPP) are down. I'm getting this error while using
the status admin XML page.
With the smsbox I'm getting:
2009-07-21 02:01:35 [7205] [8] ERROR: Couldn't fetch <http://localhost:9090/midcgw/UpmobileSMSHandler?sender=xxxxx&receiver=xxx&text=text&time=2009-07-21+05:56:37&smsc-id=TELCEL&SMS-ID=0498316d-046a-4a4b-bf3c-92edf6b74cce&DeliveryValue=-1&DeliveryReportReply=Mv+se%3For+toca+el+corazon+d+mi+bordito+y+regresa+sus+pasos+a+nuestro+hogar+para+siempre%2C+gracias&sendsms-user=default&message-coding=0&message-class-bits=-1&mwi=-1&message-charset=UTF-8&udh=&billing=&account=&serviceid=%v&sessionid=%w&meta-data=%3Fsmpp%3F>
The error always happens at this time. While it lasts I get
the alert in my Nagios server, and after about 5-10 minutes everything
seems fine again.
Can you help me??
Thanks,
Marcelo