That sentence was about the charset parameter (not coding).  The charset
parameter has no effect on the data_coding integer.

Only coding can affect it, and its effect is quite limited.  SMPP has
several encodings in the spec (listed in my first email), but coding 0, 1, 2
can only set data_coding to 0, 4, 8 respectively.  I was interested in
Latin-1 in particular, which is data_coding 3 in the SMPP spec.  My current
understanding is that you can't send data_coding=3 over SMPP through Kannel:
is this the case, or have I missed something?
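
To make the mapping concrete, here is a minimal sketch in Python of the
behavior described above. The table only records what was observed in this
thread (coding 0, 1, 2 giving data_coding 0, 4, 8); it is not taken from
Kannel's source, and the helper names are hypothetical.

```python
# Mapping observed in this thread (NOT taken from Kannel's source code):
# sendsms "coding" value -> SMPP data_coding value.
OBSERVED_DATA_CODING = {0: 0x00, 1: 0x04, 2: 0x08}

# A few data_coding values from the SMPP spec table quoted in the thread.
SMPP_DATA_CODING_NAMES = {
    0x00: "SMSC Default Alphabet",
    0x03: "Latin 1 (ISO-8859-1)",
    0x04: "Octet unspecified (8-bit binary)",
    0x08: "UCS2 (ISO/IEC-10646)",
}


def data_coding_for(coding: int) -> int:
    """Hypothetical helper: data_coding observed for a sendsms coding value."""
    return OBSERVED_DATA_CODING[coding]


def reachable(data_coding: int) -> bool:
    """True if some coding value was observed to produce this data_coding."""
    return data_coding in OBSERVED_DATA_CODING.values()


if __name__ == "__main__":
    for value, name in sorted(SMPP_DATA_CODING_NAMES.items()):
        status = "reachable" if reachable(value) else "not reachable"
        print(f"data_coding {value}: {name} ({status} via coding)")
```

Under these assumptions, data_coding 3 (Latin-1) never appears on the
right-hand side of the mapping, which is the gap I am asking about.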

I never said that coding=2 didn't set data_coding to 8; I just thought that
the UTF-8 bytes were sent directly instead of being converted into UCS-2.
However, I now realize I need to pass charset=utf8 explicitly (which
I mistakenly thought was the default; it's only the default when coding=0) in
order for the re-encoding to happen.  Sorry for asking multiple related
questions in the same message; I should have been clearer about their
separation.
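
For anyone hitting the same confusion, a small Python sketch of why the
re-encoding matters and what the request looks like. The host, port, path,
credentials, and recipient below are placeholders (assumptions, not values
from this thread), and build_sendsms_url is a hypothetical helper; the
charset and coding parameters are the ones discussed above. UTF-16-BE is
used as a stand-in for UCS-2, which is equivalent for characters in the
Basic Multilingual Plane.

```python
from urllib.parse import urlencode

text = "Привет"

# The same text as UTF-8 (what the HTTP client sends) and as UCS-2
# (what the handset expects when data_coding is 8).
utf8_bytes = text.encode("utf-8")
ucs2_bytes = text.encode("utf-16-be")  # same as UCS-2 for BMP characters


def build_sendsms_url(base_url: str, username: str, password: str,
                      to: str, message: str) -> str:
    """Hypothetical helper: build a sendsms URL asking for UTF-8 -> UCS-2."""
    params = {
        "username": username,
        "password": password,
        "to": to,
        "text": message,      # urlencode percent-encodes this as UTF-8
        "charset": "utf8",    # tells smsbox the input charset
        "coding": 2,          # request UCS-2 (data_coding 8)
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print("UTF-8 bytes:", utf8_bytes.hex())
    print("UCS-2 bytes:", ucs2_bytes.hex())
    # Placeholder endpoint and credentials; adjust to your own smsbox setup.
    print(build_sendsms_url("http://localhost:13013/cgi-bin/sendsms",
                            "user", "pass", "15551234567", text))
```

The two byte strings differ even though they encode the same text, so
sending the UTF-8 bytes unconverted with data_coding 8 shows up as garbage
on the handset.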

On Sun, Apr 1, 2012 at 3:41 PM, spameden <[email protected]> wrote:

> I think Chad's "But the important thing is that it has no
> relevance to the data_coding that gets sent over SMPP" was referring to the
> coding specified in the SMPP message... I've just checked the Kannel SMPP
> logs: it sets data_coding: 8 = 0x00000008 when coding=2 is specified.
>
>
> 2012/4/2 Alexander Malysh <[email protected]>
>
>> Then I don't understand what should be the issue here :-) ?
>>
>> Thanks,
>> Alex
>>
>> Am 01.04.2012 um 23:15 schrieb spameden:
>>
>> Exactly what I've said :)
>>
>> If your source text is in utf8 you need to specify charset=utf8 and
>> coding=2.
>>
>> 2012/4/2 Alexander Malysh <[email protected]>
>>
>>> Hi,
>>>
>>> Cyrillic can only be sent with UCS-2, therefore coding=2.
>>>
>>> Kannel's behavior for coding=2 and 3 is simple: don't touch it, it's binary
>>> and up to the user to encode it. BUT
>>> if you need Kannel to convert some charset to UCS-2 for you, then just
>>> use two params:
>>> charset=YOUR_CHARSET
>>> coding=2
>>>
>>> Then kannel will do it for you.
>>>
>>> Thanks,
>>> Alex
>>>
>>> Am 31.03.2012 um 00:45 schrieb chad selph:
>>>
>>> I understand that coding=2 stands for UCS-2 but the problem I'm pointing
>>> out is that it doesn't actually re-encode the UTF8 bytes into actual UCS-2
>>> bytes.  This is inconsistent because it will convert utf8 to GSM, or to
>>> Latin-1 (if the alt-charset is set to Latin1).
>>>
>>> As far as the "charset" parameter: from my understanding of the docs, it's
>>> actually irrelevant to the SMPP stuff; it is just for you to tell smsbox
>>> which charset your percent-encoded text is in (URLs only support ASCII).  It
>>> defaults to UTF-8 in the newer versions and this is what I prefer to use.
>>> But the important thing is that it has no relevance to the data_coding
>>> that gets sent over SMPP.
>>>
>>>
>>> On Fri, Mar 30, 2012 at 3:20 PM, spameden <[email protected]> wrote:
>>>
>>>> utf8 + coding=0 never worked for me for cyrillic text messages.
>>>>
>>>> the only combination is coding=2 & charset=utf8, otherwise I'm getting
>>>> bollocks on mobile screen.
>>>>
>>>> according to the kannel's documentation, coding is:
>>>>
>>>> coding number
>>>> Optional. Sets the coding
>>>> scheme bits in DCS field.
>>>> Accepts values 0 to 2, for 7bit,
>>>> 8bit or UCS-2. If unset, defaults
>>>> to 7 bits unless a udh is defined,
>>>> which sets coding to 8bits.
>>>>
>>>> so coding=2 stands for UCS-2 message.
>>>>
>>>>
>>>> 2012/3/31 chad selph <[email protected]>
>>>>
>>>>> I'm trying to figure out how to send different data encodings from
>>>>> Kannel 1.5.0 over SMPP.  The SMPP Spec lists the following options for
>>>>> data_coding field:
>>>>>
>>>>> 0 0 0 0 0 0 0 0 SMSC Default Alphabet
>>>>> 0 0 0 0 0 0 0 1 IA5 (CCITT T.50)/ASCII (ANSI X3.4)
>>>>> 0 0 0 0 0 0 1 0 Octet unspecified (8-bit binary)
>>>>> 0 0 0 0 0 0 1 1 Latin 1 (ISO-8859-1)
>>>>> 0 0 0 0 0 1 0 0 Octet unspecified (8-bit binary)
>>>>> 0 0 0 0 0 1 0 1 JIS (X 0208-1990)
>>>>> 0 0 0 0 0 1 1 0 Cyrillic (ISO-8859-5)
>>>>> 0 0 0 0 0 1 1 1 Latin/Hebrew (ISO-8859-8)
>>>>> 0 0 0 0 1 0 0 0 UCS2 (ISO/IEC-10646)
>>>>> ... and some others.
>>>>>
>>>>> To initiate MT messages, we're using the sendsms http interface on
>>>>> smsbox (the one here:
>>>>> http://www.kannel.org/download/1.5.0/userguide-1.5.0/userguide.html#AEN4623).
>>>>>   It looks like the only relevant parameter into the sendsms is the
>>>>> "coding" parameter, which can only be 0, 1, or 2.  "0" causes data_coding
>>>>> 0, 1 causes 4, and 2 causes 8.  I don't see a way to set data_coding to 3,
>>>>> for example, in order to do Latin-1.
>>>>>
>>>>> Another thing is that only 0 causes the message text to get re-encoded
>>>>> from UTF-8 (the input encoding from HTTP) into the correct encoding.  For
>>>>> example, sending UTF-8 data with coding=2 does not re-encode the
>>>>> message into UCS-2, but just sends your UTF-8 bytes as if they were UCS-2,
>>>>> whereas sending UTF-8 data with coding=0 does re-encode them into GSM.
>>>>>
>>>>> These things seem to me to be incorrect behavior, however given the
>>>>> wide use of kannel I figured I should make sure I'm not missing something
>>>>> obvious before I draft a patch to attempt to fix them.  Am I missing
>>>>> something?
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
