Maybe a more robust mechanism should be in place for DLRs. Various tickets 
about this have surfaced on and off. Consider the case where the 2nd server, 
the one that receives the DLR, loses its connection to the DB: it should store 
the DLRs somewhere until connectivity is recovered.
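
A minimal sketch of that idea, not Kannel code: incoming DLRs are kept in a 
local queue and applied in order once the DB is reachable again. The class 
DlrSpool and the apply_dlr callable are hypothetical names, and the sketch 
assumes a lost DB connection surfaces as a ConnectionError.

```python
from collections import deque


class DlrSpool:
    """Hold DLRs locally while the DB is unreachable (illustrative only)."""

    def __init__(self, apply_dlr):
        # apply_dlr is a hypothetical callable that writes one DLR to the DB
        # and raises ConnectionError when the DB cannot be reached.
        self._apply = apply_dlr
        self._pending = deque()

    def handle(self, dlr):
        """Queue a new DLR, then try to drain the queue in arrival order."""
        self._pending.append(dlr)
        return self.flush()

    def flush(self):
        """Apply queued DLRs until the queue is empty or the DB fails again."""
        applied = 0
        while self._pending:
            try:
                self._apply(self._pending[0])
            except ConnectionError:
                break  # DB still down: keep everything queued for later
            self._pending.popleft()
            applied += 1
        return applied

    def __len__(self):
        return len(self._pending)
```

The queue here lives in memory only, so it would not survive a restart; a 
real implementation would probably need to spool to disk.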

BR,
Nikos
  ----- Original Message ----- 
  From: Nikos Balkanas 
  To: Alejandro Guerrieri ; [email protected] 
  Sent: Thursday, April 30, 2009 12:44 AM
  Subject: Re: Possible race condition with dlr-mysql


  Definitely. DLRs are not synchronous, so a little extra delay wouldn't 
  hurt. Better still, I would suggest applying the delay only when DLRs are 
  stored in a DB.

  Nikos
    ----- Original Message ----- 
    From: Alejandro Guerrieri 
    To: [email protected] 
    Sent: Thursday, April 30, 2009 12:28 AM
    Subject: Possible race condition with dlr-mysql


    Hi, 


    I'm doing some tests with DLR's (mysql storage) and I've come across a 
weird problem.


    I'm using 2 Kannel servers, each one having an SMPP with a carrier. 
Messages may come and go over either link, so maybe an MT goes from server #1 
and a DLR comes back on server #2. To solve that issue, I'm using a central DB 
and the mysql storage for DLR's.


    The problem is, sometimes (about 1 in 5-6 messages) the DLR arrives before 
the row is inserted, so kannel ignores it and the record then remains untouched 
forever. This usually happens when the MT and the DLR are processed on 
different servers, though most of the time it just works (even when the MT and 
DLR are processed on different servers, the DLR is found, processed and 
deleted).


    Here's an example:


    Server #1:

    2009-04-29 16:44:45 [14318] [7] DEBUG: DLR[mysql]: Adding DLR smsc=my-smsc, 
ts=5073a07e, src=OOOO, dst=XXXXXXXXXXX, mask=31, boxc=

    2009-04-29 16:44:45 [14318] [7] DEBUG: sql: INSERT INTO dlr (smsc, ts, 
source, destination, service, url, mask, boxc, status) VALUES ('my-smsc', 
'5073a07e', 'OOOO', 'XXXXXXXXXXX', 'kannel', 
'http://my-host-name/dlr?id=f59d4249-65d8-4969-a2d9-636c881b9de7&code=%d&scode=%B',
 '31', '', '0');




    Server #2:

    2009-04-29 16:44:45 [8395] [9] DEBUG: DLR[mysql]: Looking for DLR 
smsc=my-smsc, ts=5073a07e, dst=XXXXXXXXXXX, type=2

    2009-04-29 16:44:45 [8395] [9] DEBUG: sql: SELECT mask, service, url, 
source, destination, boxc FROM dlr WHERE smsc='my-smsc' AND ts='5073a07e';

    2009-04-29 16:44:45 [8395] [9] ERROR: SMPP[my-smsc]: got DLR but could not 
find message or was not interested in it id<5073a07e> dst<XXXXXXXXXXX>




    I think that's because of a race condition made possible by SQL latency: 
    the DLR can only be inserted after the submit_sm_resp is received, but 
    perhaps the SMSC starts delivering the message on a separate thread right 
    after receiving the submit_sm. Add some SQL latency and the following 
    sequence becomes possible:




    1. Kannel sends a submit_sm

    2. SMSC starts delivering the message on another thread

    3. SMSC starts delivering the submit_sm_resp

    4. The SMSC ends delivering the DLR.

    5. Kannel receives the DLR and searches for it on the DB. Not found - DLR 
is ignored.

    6. The SMSC ends delivering the submit_sm_resp

    7. Kannel parses the receipted_message_id and inserts the DLR.

    8. The DLR row is never looked up again and remains in the table forever.
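
The sequence above can be replayed as a small model to show the outcome. This 
is only an illustration: a dict stands in for the dlr table, and the function 
name replay and the event tuples are made up for the example. The racy order 
has the DLR lookup (step 5) run before the insert (step 7).

```python
def replay(events):
    """Replay (action, ts) events against a dict standing in for the dlr table.

    Returns the rows left in the table and the list of ignored DLRs.
    """
    table = {}
    ignored = []
    for action, ts in events:
        if action == "insert":
            # Row written after the submit_sm_resp has been parsed
            table[ts] = "pending"
        elif action == "dlr":
            if ts in table:
                del table[ts]       # found: processed and deleted
            else:
                ignored.append(ts)  # not found: the DLR is dropped
    return table, ignored


NORMAL = [("insert", "5073a07e"), ("dlr", "5073a07e")]
RACY = [("dlr", "5073a07e"), ("insert", "5073a07e")]

print(replay(NORMAL))  # table empty, nothing ignored
print(replay(RACY))    # row left behind, DLR ignored
```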




    A possible solution would be to implement a (configurable/disabled by 
default) retry mechanism for missing DLR's. For example, retrying one or two 
times after a few milliseconds if the dlr is not found.
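
A rough sketch of what such a retry could look like, written in Python for 
brevity rather than as Kannel code. The names find_dlr_with_retry, lookup, 
retries and delay_ms are all hypothetical; lookup is assumed to return None 
when no row matches. With retries=0 (the default) it behaves like a single 
lookup, i.e. the feature is disabled.

```python
import time


def find_dlr_with_retry(lookup, smsc, ts, retries=0, delay_ms=50,
                        sleep=time.sleep):
    """Look up a DLR row, retrying a few times if it is not there yet.

    lookup(smsc, ts) is a hypothetical function returning the row or None.
    retries=0 means a single attempt, i.e. no retry at all.
    """
    attempts = retries + 1
    for attempt in range(attempts):
        row = lookup(smsc, ts)
        if row is not None:
            return row
        if attempt < attempts - 1:
            # Give the other server's INSERT a moment to land
            sleep(delay_ms / 1000.0)
    return None
```

The sleep argument is only there so the delay can be replaced in tests. In a 
real bearerbox the wait should not block the thread that reads from the SMSC, 
so the retry would more likely be queued than slept on.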




    Opinions? Insights?




    Regards,




    Alejandro


