Re: Inconsistent behavior around encoding and SMPP

spameden Sun, 01 Apr 2012 16:24:06 -0700

Correct, sir. I also had this problem when I was testing Kannel's
handling of non-English characters (i.e. if you don't set charset=utf8
you get bollocks on your phone instead).
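
For illustration, here is a minimal sketch of a sendsms request using the
combination discussed in this thread (charset=utf8 plus coding=2). The host,
port, credentials and recipient are placeholders I made up for the example;
only the parameter names come from the user guide. It also prints the UTF-8
and UCS-2 bytes of the same text, to show why the re-encoding matters.

```python
from urllib.parse import urlencode

# Placeholder endpoint and credentials: assumptions for illustration only.
SENDSMS_URL = "http://localhost:13013/cgi-bin/sendsms"


def build_sendsms_url(text, to, username="tester", password="secret"):
    """Build a sendsms GET URL asking Kannel to convert UTF-8 text to UCS-2."""
    params = {
        "username": username,
        "password": password,
        "to": to,
        "text": text,       # urlencode percent-encodes str values as UTF-8
        "charset": "utf8",  # tells smsbox how the text bytes are encoded
        "coding": 2,        # UCS-2
    }
    return SENDSMS_URL + "?" + urlencode(params)


text = "Привет"
url = build_sendsms_url(text, "+10000000000")
print(url)

# Same text, different bytes:
print(text.encode("utf-8").hex())      # what arrives over HTTP
print(text.encode("utf-16-be").hex())  # UCS-2 form of these BMP characters
```

If the UTF-8 bytes were passed through untouched and labelled as UCS-2, the
phone would decode the first byte sequence as if it were the second, which is
presumably where the bollocks comes from.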


It would be good if this point were covered in Kannel's user guide.

About coding=3: yes, you're right, data_coding=3 stands for Latin-1 in the
SMPP v3.4 documentation. But data_coding=0 is defined as "SMSC Default Alphabet".

So the question is: is Latin-1 the same as the "SMSC Default Alphabet"?
And what is the "SMSC Default Alphabet", actually? (Most likely it's Latin-1,
aka ISO-8859-1.)
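
To keep the numbers straight, here is a small sketch of the values under
discussion. The data_coding table is copied from the SMPP spec excerpt in
Chad's first mail; the coding to data_coding mapping is only what was
observed in this thread, not something I verified in Kannel's source.

```python
# data_coding values as listed from the SMPP v3.4 spec in the first mail.
SMPP_DATA_CODING = {
    0x00: "SMSC Default Alphabet",
    0x01: "IA5 (CCITT T.50)/ASCII (ANSI X3.4)",
    0x02: "Octet unspecified (8-bit binary)",
    0x03: "Latin 1 (ISO-8859-1)",
    0x04: "Octet unspecified (8-bit binary)",
    0x05: "JIS (X 0208-1990)",
    0x06: "Cyrillic (ISO-8859-5)",
    0x07: "Latin/Hebrew (ISO-8859-8)",
    0x08: "UCS2 (ISO/IEC-10646)",
}

# sendsms "coding" -> SMPP data_coding, as observed in this thread.
SENDSMS_CODING_TO_DATA_CODING = {0: 0x00, 1: 0x04, 2: 0x08}


def unreachable_data_codings():
    """List data_coding values that no sendsms coding value maps to."""
    reachable = set(SENDSMS_CODING_TO_DATA_CODING.values())
    return sorted(dc for dc in SMPP_DATA_CODING if dc not in reachable)


for dc in unreachable_data_codings():
    print(dc, SMPP_DATA_CODING[dc])
```

Under that mapping, data_coding=3 (Latin-1) is among the values that cannot be
selected through the coding parameter alone.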

From a Google search:

"Depending on the chosen ESME data coding the short message text data is
sent from the SMSC to the mobile in one of the following ways:

    Transparently
    Mapped to the default GSM alphabet

When text is sent from ESME to SMSC in UCS2 coding the data will be
transparently sent to the mobile. When the text is coded in LATIN-1 or the
SMSC Default Alphabet, mapping will be performed by the SMSC to the GSM
Default Alphabet before sending the text to the mobile. As the GSM Default
Alphabet is 7-bit coded and uses other codes for some characters (and in
some cases does not even provide a certain character), this implies that
during the mapping process not every character can be mapped one-to-one."
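
A rough sketch of what "cannot be mapped one-to-one" means in practice. The
table below holds only a handful of entries from the GSM 03.38 default
alphabet, picked by me for illustration; it is not a complete mapping and has
nothing to do with how any particular SMSC implements the conversion.

```python
# Partial GSM 03.38 default alphabet table: a few sample entries only.
GSM7_SAMPLE = {"@": 0x00, "$": 0x02, "_": 0x11, "A": 0x41, "a": 0x61}


def latin1_vs_gsm(ch):
    """Return (Latin-1 code, GSM 7-bit code or None if not in the sample)."""
    return ch.encode("latin-1")[0], GSM7_SAMPLE.get(ch)


for ch in "A@$_ê":
    latin1, gsm = latin1_vs_gsm(ch)
    gsm_txt = "n/a" if gsm is None else f"0x{gsm:02X}"
    print(f"{ch!r}: Latin-1 0x{latin1:02X}, GSM {gsm_txt}")
```

"A" keeps its code, "@", "$" and "_" move to different codes, and "ê" exists
in Latin-1 but has no code in the GSM default alphabet at all (for other
characters, a missing entry here may just mean the sample table is partial).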

2012/4/2 chad selph <[email protected]>

> That sentence was about the charset parameter (not coding).  The charset
> parameter has no effect on data_coding integer.
>
> Only the coding can affect it, but its effects are quite limited.  SMPP
> has several encodings in the spec (listed in my first email) but coding
> 0,1,2 can only set data_coding to 0, 4, 8 respectively.  I was interested
> in particular in Latin-1 which is data_coding 3 in the SMPP spec.  My
> current understanding is that you can't send data_coding=3 over SMPP: is
> this the case or have I missed something?
>
> I never said that coding=2 didn't set data_coding to 8; I just thought
> that the UTF-8 bytes were sent directly instead of being converted into
> UCS-2.  However, now I realize I need to be passing charset=utf8 explicitly
> (which I mistakenly thought was default; it's only default when coding=0)
> in order for the re-encoding to happen.  Sorry about asking multiple
> related questions in the same message, I should have been more clear about
> their separation.
>
>
> On Sun, Apr 1, 2012 at 3:41 PM, spameden <[email protected]> wrote:
>
>> I think that when Chad wrote "But the important thing is that it has no
>> relevance to the data_coding that gets sent over SMPP" he meant the coding
>> specified in the SMPP message... I've just checked the kannel smpp logs: it
>> sets data_coding: 8 = 0x00000008 when coding=2 is specified.
>>
>>
>> 2012/4/2 Alexander Malysh <[email protected]>
>>
>>> Then I don't understand what should be the issue here :-) ?
>>>
>>> Thanks,
>>> Alex
>>>
>>> Am 01.04.2012 um 23:15 schrieb spameden:
>>>
>>> Exactly what I've said :)
>>>
>>> If your source text is in utf8 you need to specify charset=utf8 and
>>> coding=2.
>>>
>>> 2012/4/2 Alexander Malysh <[email protected]>
>>>
>>>> Hi,
>>>>
>>>> Cyrillic can only be sent as UCS-2, therefore coding=2.
>>>>
>>>> Kannel's behavior for coding=2 and 3 is simple: it doesn't touch the
>>>> data; it's binary and it's up to the user to encode it. BUT if you need
>>>> Kannel to convert some charset to UCS-2 for you, then just use two params:
>>>>  charset=YOUR_CHARSET
>>>>  coding=2
>>>>
>>>> Then kannel will do it for you.
>>>>
>>>> Thanks,
>>>> Alex
>>>>
>>>> Am 31.03.2012 um 00:45 schrieb chad selph:
>>>>
>>>> I understand that coding=2 stands for UCS-2 but the problem I'm
>>>> pointing out is that it doesn't actually re-encode the UTF8 bytes into
>>>> actual UCS-2 bytes.  This is inconsistent because it will convert utf8 to
>>>> GSM, or to Latin-1 (if the alt-charset is set to Latin1).
>>>>
>>>> As for the "charset" parameter: from my understanding of the docs, it's
>>>> actually irrelevant to the SMPP stuff; it is just for you to tell smsbox
>>>> which encoding your percent-encoded text is in (URLs only support ASCII).
>>>> It defaults to UTF-8 in the newer versions and this is what I prefer to
>>>> use. But the important thing is that it has no relevance to the
>>>> data_coding that gets sent over SMPP.
>>>>
>>>>
>>>> On Fri, Mar 30, 2012 at 3:20 PM, spameden <[email protected]> wrote:
>>>>
>>>>> utf8 + coding=0 never worked for me for cyrillic text messages.
>>>>>
>>>>> the only combination is coding=2 & charset=utf8, otherwise I'm getting
>>>>> bollocks on mobile screen.
>>>>>
>>>>> according to the kannel's documentation, coding is:
>>>>>
>>>>> coding number
>>>>> Optional. Sets the coding
>>>>> scheme bits in DCS field.
>>>>> Accepts values 0 to 2, for 7bit,
>>>>> 8bit or UCS-2. If unset, defaults
>>>>> to 7 bits unless a udh is defined,
>>>>> which sets coding to 8bits.
>>>>>
>>>>> so coding=2 stands for UCS-2 message.
>>>>>
>>>>>
>>>>> 2012/3/31 chad selph <[email protected]>
>>>>>
>>>>>> I'm trying to figure out how to send different data encodings from
>>>>>> Kannel 1.5.0 over SMPP.  The SMPP Spec lists the following options for
>>>>>> data_coding field:
>>>>>>
>>>>>> 0 0 0 0 0 0 0 0 SMSC Default Alphabet
>>>>>> 0 0 0 0 0 0 0 1 IA5(CCITT T.50)/ASCII(ANSI X3.4)
>>>>>> 0 0 0 0 0 0 1 0 Octet unspecified (8-bit binary)
>>>>>> 0 0 0 0 0 0 1 1 Latin1(ISO-8859-1)
>>>>>> 0 0 0 0 0 1 0 0 Octet unspecified (8-bit binary)
>>>>>> 0 0 0 0 0 1 0 1 JIS(X0208-1990)
>>>>>> 0 0 0 0 0 1 1 0 Cyrillic(ISO-8859-5)
>>>>>> 0 0 0 0 0 1 1 1 Latin/Hebrew (ISO-8859-8)
>>>>>> 0 0 0 0 1 0 0 0 UCS2(ISO/IEC-10646)
>>>>>> ... and some others.
>>>>>>
>>>>>> To initiate MT messages, we're using the sendsms http interface on
>>>>>> smsbox (the one here:
>>>>>> http://www.kannel.org/download/1.5.0/userguide-1.5.0/userguide.html#AEN4623).
>>>>>>   It looks like the only relevant parameter to sendsms is the
>>>>>> "coding" parameter, which can only be 0, 1, or 2.  "0" causes data_coding
>>>>>> 0, 1 causes 4, and 2 causes 8.  I don't see a way to set data_coding to 3,
>>>>>> for example, in order to do Latin-1.
>>>>>>
>>>>>> Another thing is that only 0 causes the message text to get encoded
>>>>>> from UTF-8 (input encoding from http) into the correct encoding.  For
>>>>>> example, sending UTF-8 data with coding=2 does not re-encode the
>>>>>> message into UCS-2, but just sends your UTF-8 bytes as if they were UCS-2,
>>>>>> whereas sending UTF-8 data with coding=0 does re-encode them into GSM.
>>>>>>
>>>>>> These things seem to me to be incorrect behavior. However, given the
>>>>>> wide use of kannel, I figured I should make sure I'm not missing something
>>>>>> obvious before I draft a patch to attempt to fix them.  Am I missing
>>>>>> something?
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
