I understand that coding=2 stands for UCS-2, but the problem I'm pointing out is that Kannel doesn't actually re-encode the UTF-8 bytes into UCS-2 bytes. This is inconsistent, because it does convert UTF-8 to GSM, or to Latin-1 (if the alt-charset is set to Latin1).
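To make the difference concrete, here is a minimal Python sketch. It is not Kannel code, and the sample string is only an assumption for illustration. It compares the UTF-8 bytes that arrive over HTTP with the bytes a UCS-2 message body should carry (for characters in the Basic Multilingual Plane, UCS-2 is the same as big-endian UTF-16), and shows what a receiver gets if the UTF-8 bytes are passed through unchanged but labelled as UCS-2.

```python
# Sample Cyrillic text (hypothetical input, chosen only for illustration).
text = "Привет"

# Bytes as they arrive from the HTTP request after percent-decoding.
utf8_bytes = text.encode("utf-8")

# Bytes a UCS-2 (data_coding=8) body should carry. For BMP characters,
# UCS-2 and UTF-16 big-endian produce identical bytes.
ucs2_bytes = text.encode("utf-16-be")

# What a receiver decodes if the UTF-8 bytes are sent as-is but the
# message is labelled UCS-2: each byte pair is read as one code unit.
misread = utf8_bytes.decode("utf-16-be")

print(utf8_bytes.hex(" "))
print(ucs2_bytes.hex(" "))
print(misread == text)
```

The two byte sequences differ, and decoding the UTF-8 bytes as UCS-2 does not give back the original text, which is the kind of garbage I would expect on the handset when no re-encoding happens.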
As for the "charset" parameter: from my understanding of the docs, it's actually irrelevant to the SMPP side. It only tells smsbox which character encoding the percent-encoded text in your request is in (URLs only support ASCII). It defaults to UTF-8 in the newer versions, and this is what I prefer to use. But the important thing is that it has no relevance to the data_coding that gets sent over SMPP.

On Fri, Mar 30, 2012 at 3:20 PM, spameden <[email protected]> wrote:

> utf8 + coding=0 never worked for me for cyrillic text messages.
>
> the only combination is coding=2 & charset=utf8, otherwise I'm getting
> bollocks on mobile screen.
>
> according to the kannel's documentation, coding is:
>
> coding    number    Optional. Sets the coding scheme bits in DCS field.
>                     Accepts values 0 to 2, for 7bit, 8bit or UCS-2.
>                     If unset, defaults to 7 bits unless a udh is defined,
>                     which sets coding to 8bits.
>
> so coding=2 stands for UCS-2 message.
>
>
> 2012/3/31 chad selph <[email protected]>
>
>> I'm trying to figure out how to send different data encodings from Kannel
>> 1.5.0 over SMPP. The SMPP Spec lists the following options for the
>> data_coding field:
>>
>> 0 0 0 0 0 0 0 0   SMSC Default Alphabet
>> 0 0 0 0 0 0 0 1   IA5 (CCITT T.50)/ASCII (ANSI X3.4)
>> 0 0 0 0 0 0 1 0   Octet unspecified (8-bit binary)
>> 0 0 0 0 0 0 1 1   Latin 1 (ISO-8859-1)
>> 0 0 0 0 0 1 0 0   Octet unspecified (8-bit binary)
>> 0 0 0 0 0 1 0 1   JIS (X 0208-1990)
>> 0 0 0 0 0 1 1 0   Cyrillic (ISO-8859-5)
>> 0 0 0 0 0 1 1 1   Latin/Hebrew (ISO-8859-8)
>> 0 0 0 0 1 0 0 0   UCS2 (ISO/IEC-10646)
>> ... and some others.
>>
>> To initiate MT messages, we're using the sendsms http interface on smsbox
>> (the one here:
>> http://www.kannel.org/download/1.5.0/userguide-1.5.0/userguide.html#AEN4623).
>> It looks like the only relevant parameter to sendsms is the "coding"
>> parameter, which can only be 0, 1, or 2. 0 causes data_coding 0, 1 causes
>> 4, and 2 causes 8. I don't see a way to set data_coding to 3, for
>> example, in order to do Latin-1.
>>
>> Another thing is that only 0 causes the message text to get re-encoded
>> from UTF-8 (the input encoding from http) into the correct encoding. For
>> example, sending UTF-8 data with coding=2 does not re-encode the message
>> into UCS-2, but just sends your UTF-8 bytes as if they were UCS-2, whereas
>> sending UTF-8 data with coding=0 does re-encode them into GSM.
>>
>> These things seem to me to be incorrect behavior. However, given the wide
>> use of kannel, I figured I should make sure I'm not missing something
>> obvious before I draft a patch to attempt to fix them. Am I missing
>> something?
