Hi,
I'm doing some tests with DLR's (mysql storage) and I've come across a weird
problem.

I'm using 2 Kannel servers, each with its own SMPP link to a carrier. Messages
may come and go over either link, so an MT may go out from server #1 while its
DLR comes back on server #2. To handle that, I'm using a central DB and
the mysql storage for DLR's.

The problem is that sometimes (about 1 in 5-6 messages) the DLR arrives
*before* the row is inserted, so Kannel ignores it and the record then
remains untouched forever. This usually happens when the MT and the DLR are
processed on different servers, though most of the time it just works (even
when the MT and DLR are processed on different servers, the DLR is found,
processed and deleted).

Here's an example:

*Server #1:*

2009-04-29 16:44:45 [14318] [7] DEBUG: DLR[mysql]: Adding DLR smsc=my-smsc,
ts=5073a07e, src=OOOO, dst=XXXXXXXXXXX, mask=31, boxc=

2009-04-29 16:44:45 [14318] [7] DEBUG: sql: INSERT INTO dlr (smsc, ts,
source, destination, service, url, mask, boxc, status) VALUES ('my-smsc',
'5073a07e', 'OOOO', 'XXXXXXXXXXX', 'kannel', '
http://my-host-name/dlr?id=f59d4249-65d8-4969-a2d9-636c881b9de7&code=%d&scode=%B',
'31', '', '0');


*Server #2:*

2009-04-29 16:44:45 [8395] [9] DEBUG: DLR[mysql]: Looking for DLR
smsc=my-smsc, ts=5073a07e, dst=XXXXXXXXXXX, type=2

2009-04-29 16:44:45 [8395] [9] DEBUG: sql: SELECT mask, service, url,
source, destination, boxc FROM dlr WHERE smsc='my-smsc' AND ts='5073a07e';

2009-04-29 16:44:45 [8395] [9] ERROR: SMPP[my-smsc]: got DLR but could not
find message or was not interested in it id<5073a07e> dst<XXXXXXXXXXX>


I think this is a race condition made worse by SQL latency: the DLR row can
only be inserted after the submit_sm_resp is received, but the SMSC may start
delivering the message on a separate thread right after receiving the
submit_sm. Add some SQL latency and the DLR can arrive before the insert:


1. Kannel sends a submit_sm

2. SMSC starts delivering the message on another thread

3. SMSC starts delivering the submit_sm_resp

4. The SMSC ends delivering the DLR.

5. Kannel receives the DLR and searches for it on the DB. Not found - DLR is
ignored.

6. The SMSC ends delivering the submit_sm_resp

7. Kannel parses the receipted_message_id and inserts the DLR.

8. The DLR row is never looked up again and remains in the table forever.
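
To make the ordering above concrete, here is a small deterministic replay in Python. It is only an illustration, not Kannel code: the dict stands in for the shared mysql dlr table, and the handler names (on_submit_sm_resp, on_deliver_sm) are made up for this sketch.

```python
# Illustrative replay of the ordering described above; not Kannel code.
# dlr_table stands in for the shared MySQL dlr table, keyed by (smsc, ts).
dlr_table = {}

def on_submit_sm_resp(smsc, message_id, url):
    """Server #1: the message id is only known now, so the row is inserted now."""
    dlr_table[(smsc, message_id)] = url

def on_deliver_sm(smsc, message_id):
    """Server #2: a single lookup; the row is removed if it is found."""
    return dlr_table.pop((smsc, message_id), None)

def replay():
    # Steps 4-5: the DLR arrives first and the lookup misses.
    found = on_deliver_sm("my-smsc", "5073a07e")
    # Steps 6-7: the submit_sm_resp arrives and the row is inserted.
    on_submit_sm_resp("my-smsc", "5073a07e", "http://my-host-name/dlr")
    # Step 8: nothing looks the row up again, so it stays in the table.
    return found, dict(dlr_table)
```

Calling replay() returns a missed lookup together with a table that still holds the row, which is the leftover record I see in the DB.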


A possible solution would be to implement a (configurable/disabled by
default) retry mechanism for missing DLR's. For example, retrying one or two
times after a few milliseconds if the dlr is not found.
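
Roughly what I have in mind, sketched in Python rather than Kannel's C. Everything here is hypothetical: find_dlr_with_retry, its parameters and the lookup callable are names I made up, and lookup stands for whatever runs the SELECT and returns the row or None.

```python
import time

def find_dlr_with_retry(lookup, retries=2, delay=0.05, sleep=time.sleep):
    """Call lookup() up to 1 + retries times, pausing `delay` seconds between tries.

    lookup is a hypothetical callable that returns the DLR row, or None if the
    row is not in the table (yet). retries=0 gives today's single-lookup behaviour.
    """
    for attempt in range(retries + 1):
        row = lookup()
        if row is not None:
            return row
        if attempt < retries:
            sleep(delay)  # give the pending INSERT a chance to land
    return None
```

With retries=0 this behaves like the current single lookup, so the feature could stay disabled by default. Note that in this naive form the sleep blocks the calling thread, so a real implementation would probably want to re-queue the DLR instead.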


Opinions? Insights?


Regards,


Alejandro
