Yes, true.
Back to the retry/delay for DLRs: I think that, for this particular problem at
least, adding a configurable sleep() in dlr_find would do the trick.

A more elaborate solution would be to retry the lookup for missing DLRs, but
I'm not sure it's worth it: the code would be more complex and would double
the requests on the missing records (though it would save a few milliseconds
on all the other DLRs).
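
To make the trade-off concrete, here is a rough Python sketch of both options.
It is not Kannel code: DlrStore, find_with_delay and find_with_retry are
made-up names, an in-memory dict stands in for the shared dlr table, and the
default delay and retry count are arbitrary placeholders.

```python
import time


class DlrStore:
    """Hypothetical in-memory stand-in for the shared dlr table."""

    def __init__(self):
        self.rows = {}
        self.lookups = 0  # counts how many queries hit the store

    def add(self, smsc, ts, row):
        self.rows[(smsc, ts)] = row

    def find(self, smsc, ts):
        # One "SELECT"; a found row is removed, a miss returns None.
        self.lookups += 1
        return self.rows.pop((smsc, ts), None)


def find_with_delay(store, smsc, ts, delay=0.05, sleep=time.sleep):
    """Option 1: always wait a configurable delay, then look up once."""
    if delay > 0:
        sleep(delay)
    return store.find(smsc, ts)


def find_with_retry(store, smsc, ts, retries=2, delay=0.05, sleep=time.sleep):
    """Option 2: look up right away, wait and retry only on a miss."""
    row = store.find(smsc, ts)
    attempts = 0
    while row is None and attempts < retries:
        sleep(delay)
        row = store.find(smsc, ts)
        attempts += 1
    return row
```

With the delay, every DLR pays the wait but costs a single query. With the
retry, DLRs whose row is already there are handled immediately, and only the
misses cost extra queries (up to retries + 1 in total).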

Opinions? Stipe? Alex? Would something like this make it into the main tree,
or shall I keep it for my personal list of dirty hacks? ;)

Regards,

Alejandro

2009/4/29 Nikos Balkanas <[email protected]>

>  Maybe a more robust mechanism should be in place for DLRs. On and off,
> various tickets have surfaced about it. Consider the case where the 2nd
> server, the one that receives the DLR, loses its connection to the DB. It
> should store the DLRs somewhere until connectivity is recovered.
>
> BR,
> Nikos
>
> ----- Original Message -----
> *From:* Nikos Balkanas <[email protected]>
> *To:* Alejandro Guerrieri <[email protected]> ;
> [email protected]
> *Sent:* Thursday, April 30, 2009 12:44 AM
> *Subject:* Re: Possible race condition with dlr-mysql
>
> Definitely. DLRs are not synchronous and therefore a little extra delay
> wouldn't hurt. To make it better, I would suggest applying the delay only
> when DLRs are stored in a DB.
>
> Nikos
>
> ----- Original Message -----
> *From:* Alejandro Guerrieri <[email protected]>
> *To:* [email protected]
> *Sent:* Thursday, April 30, 2009 12:28 AM
> *Subject:* Possible race condition with dlr-mysql
>
> Hi,
> I'm doing some tests with DLR's (mysql storage) and I've come across a
> weird problem.
>
> I'm using 2 Kannel servers, each one having an SMPP with a carrier.
> Messages may come and go over either link, so maybe an MT goes from server
> #1 and a DLR comes back on server #2. To solve that issue, I'm using a
> central DB and the mysql storage for DLR's.
>
> The problem is, sometimes (about 1 in 5-6 messages) the DLR arrives
> *before* the row is inserted, so kannel ignores it and the record then
> remains untouched forever. This usually happens when the MT and the DLR are
> processed on different servers, though most of the time it just works (even
> when the MT and DLR are processed on different servers, the DLR is found,
> processed and deleted).
>
> Here's an example:
>
>  *Server #1:*
>
> 2009-04-29 16:44:45 [14318] [7] DEBUG: DLR[mysql]: Adding DLR smsc=my-smsc,
> ts=5073a07e, src=OOOO, dst=XXXXXXXXXXX, mask=31, boxc=
>
> 2009-04-29 16:44:45 [14318] [7] DEBUG: sql: INSERT INTO dlr (smsc, ts,
> source, destination, service, url, mask, boxc, status) VALUES ('my-smsc',
> '5073a07e', 'OOOO', 'XXXXXXXXXXX', 'kannel', '
> http://my-host-name/dlr?id=f59d4249-65d8-4969-a2d9-636c881b9de7&code=%d&scode=%B',
> '31', '', '0');
>
>
> *Server #2:*
>
> 2009-04-29 16:44:45 [8395] [9] DEBUG: DLR[mysql]: Looking for DLR
> smsc=my-smsc, ts=5073a07e, dst=XXXXXXXXXXX, type=2
>
> 2009-04-29 16:44:45 [8395] [9] DEBUG: sql: SELECT mask, service, url,
> source, destination, boxc FROM dlr WHERE smsc='my-smsc' AND ts='5073a07e';
>
> 2009-04-29 16:44:45 [8395] [9] ERROR: SMPP[my-smsc]: got DLR but could not
> find message or was not interested in it id<5073a07e> dst<XXXXXXXXXXX>
>
>
> I think that's because there's a possible race condition inherent in SQL
> latency: the DLR can only be inserted after the submit_sm_resp is
> received, but perhaps the SMSC starts delivering the message on a separate
> thread right after receiving the submit_sm. Add some SQL latency and the
> DLR can arrive before its row exists:
>
>
> 1. Kannel sends a submit_sm
>
> 2. SMSC starts delivering the message on another thread
>
> 3. SMSC starts delivering the submit_sm_resp
>
> 4. The SMSC ends delivering the DLR.
>
> 5. Kannel receives the DLR and searches for it on the DB. Not found - DLR
> is ignored.
>
> 6. The SMSC ends delivering the submit_sm_resp
>
> 7. Kannel parses the receipted_message_id and inserts the DLR.
>
> 8. The DLR row is not searched again and remains forever on the queue.
>
>
> A possible solution would be to implement a (configurable/disabled by
> default) retry mechanism for missing DLR's. For example, retrying one or two
> times after a few milliseconds if the dlr is not found.
>
>
> Opinions? Insights?
>
>
> Regards,
>
>
> Alejandro
>
>
>
