Re: [twsocket] pop3, buffer and character encoding
Zvone wrote:

TBytes was the datatype to be used. However, that would break backwards compatibility, since historically string was used everywhere. That all was no problem; however, the rule is: DO NOT BREAK BACKWARDS COMPATIBILITY, and that is where the problems begin.

Well, you've already broken this rule by using UnicodeString as the return type

The ICS components have used string since the beginning; it just happened that in RDS 2009 its mapping changed from AnsiString to UnicodeString. Only the C++Builder packages always used the mapped type explicitly.

so I don't see a big deal here. Also, ICS 7 is used by people who need to recompile their code in D2009/2010, so they have to adapt their code anyway; it won't work properly with D2009, and not just because of ICS.

No, in Delphi there are no interface changes. Properties and the signatures of methods and event handlers did not change. That is different in C++Builder, where everything changed from AnsiString to UnicodeString.

For compatibility purposes you have ICS 6 and 5. And it's not a big problem. I have 2 ideas to satisfy everybody and improve code page compatibility: Idea 1: Why not introduce TPop3Cli.LastResponseTb, which would be identical to LastResponse except for its type (TBytes)?

Yes, we should probably introduce a new property, LastRawResponse: TBytes, as a workaround.

-- Arno Garrels
-- To unsubscribe or change your settings for TWSocket mailing list please goto http://lists.elists.org/cgi-bin/mailman/listinfo/twsocket Visit our website at http://www.overbyte.be
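A minimal sketch of the LastRawResponse idea, written in portable C++ rather than Delphi. Pop3ClientSketch, on_line_received, last_raw_response and last_response are hypothetical names, not part of ICS. The point is that the client keeps each received line as untouched bytes and derives the string view from them, so callers who need the raw bytes never go through a lossy conversion. Latin-1 decoding (byte N becomes U+00NN) is an assumption standing in for the system ANSI code page conversion.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a POP3 client; NOT the ICS TPop3Cli API.
class Pop3ClientSketch {
public:
    // Called with one response line exactly as received (CRLF stripped).
    void on_line_received(const std::vector<std::uint8_t>& line) {
        last_raw_response_ = line;  // keep the bytes untouched
    }

    // Analogue of the proposed LastRawResponse: TBytes.
    const std::vector<std::uint8_t>& last_raw_response() const {
        return last_raw_response_;
    }

    // Analogue of the existing string-typed LastResponse. Assumption: Latin-1
    // (byte N -> U+00NN) stands in for the system ANSI code page conversion.
    std::u32string last_response() const {
        return std::u32string(last_raw_response_.begin(), last_raw_response_.end());
    }

private:
    std::vector<std::uint8_t> last_raw_response_;
};
```

Existing callers keep using the string view; only code that needs the exact bytes switches to last_raw_response().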
Re: [twsocket] pop3, buffer and character encoding
Hello Zvone,

Yes, the Unicode implementation of the POP3 client is weak. It converts the received bytes to Unicode using the current ANSI system code page. This is a direct result of ICS rule #1: do not break backwards compatibility. However, as long as this code page is one of the single-byte windows-xyz character sets, converting back to ANSI with the same code page should work without data loss and give you back the raw bytes (hopefully). This won't work, for example, with Japanese locale settings, whose ANSI code page (932) is multi-byte.

Note that 8-bit characters are not allowed in MIME headers! Email clients must encode 8-bit characters in header lines properly (as RFC 2047 encoded-words). MIME text parts may include 8-bit characters with a charset specified in the Content-Type header; in those cases the (AnsiString) text content has to be converted to Unicode using the specified charset.

-- Arno Garrels
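The round-trip argument can be sketched in C++, assuming ISO-8859-1 (code page 28591) as the single-byte code page because its table is trivial: byte value N maps to code point U+00NN. Every byte decodes to exactly one code point and encodes back to the same byte, so nothing is lost. A code point the target code page cannot represent (here anything above U+00FF) has to be replaced, and that replacement is where data disappears. The function names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// ISO-8859-1 (code page 28591): byte N maps to code point U+00NN.
std::u32string decode_latin1(const std::vector<std::uint8_t>& bytes) {
    return std::u32string(bytes.begin(), bytes.end());
}

// Inverse mapping. Code points above U+00FF have no Latin-1 byte and are
// replaced with '?'; this replacement is where a round trip loses data.
std::vector<std::uint8_t> encode_latin1(const std::u32string& text) {
    std::vector<std::uint8_t> bytes;
    bytes.reserve(text.size());
    for (char32_t cp : text) {
        bytes.push_back(cp <= 0xFF ? static_cast<std::uint8_t>(cp)
                                   : static_cast<std::uint8_t>('?'));
    }
    return bytes;
}
```

Swapping in a real windows-125x table would keep the same structure; only the byte-to-code-point lookup changes.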
Re: [twsocket] pop3, buffer and character encoding
Zvone wrote:

However, as long as this code page is one of the single-byte windows-xyz character sets, converting back to ANSI with the same code page should work without data loss and give you back the raw bytes (hopefully). This won't work, for example, with Japanese locale settings.

But there is a string type for that purpose. It is called RawByteString. It is defined as AnsiString($FFFF), which in effect means it is an AnsiString with no encoding attached to it, so you can use it to transfer data from functions and avoid code page conversions.

The purpose of the RawByteString type is to avoid implicit string casts. As you know, since 2009 the compiler has been aware of AnsiString code pages. Using RawByteString makes sense as a parameter type, especially to avoid writing plenty of overloads, nothing else.

Yes, by default it uses the default system code page for conversions to Unicode. RawByteString is a single-byte character type, but unlike AnsiString it does not have a specific encoding attached to it. That means it can be used to pass values to and from functions that perform Unicode conversions. It is not intended for storing data, mostly just for function input/output, as the official documentation specifies.

RawByteStrings are not implicitly converted by compiler magic, that's all, and this type is not documented well.

So my best bet is that it would be best to receive a raw byte buffer (unsigned char or BYTE type), place it into a RawByteString and return that value. This should avoid conversions.

TBytes was the datatype to be used. However, that would break backwards compatibility, since historically string was used everywhere.

In your own functions you can use RawByteString as the input type and use conversion functions to convert from RawByteString to any code page you like (or store it as binary data). There are some functions that do this; I think the ones you need are SetCodePage() and StringCodePage().
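The difference between re-tagging bytes and converting them can be sketched in C++. TaggedBytes, set_code_page, string_code_page and to_unicode are hypothetical names loosely modeled on RawByteString and the SetCodePage()/StringCodePage() functions mentioned above; they do not reproduce the RTL's signatures. In this sketch set_code_page only changes the tag, and conversion to Unicode happens only on an explicit call. Only US-ASCII (20127) and ISO-8859-1 (28591) are implemented.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical analogue of RawByteString: raw bytes plus a code-page tag.
// 0xFFFF means "no code page attached", mirroring RawByteString.
struct TaggedBytes {
    std::vector<std::uint8_t> bytes;
    std::uint16_t code_page = 0xFFFF;
};

// Re-tags only: the bytes stay exactly as received.
void set_code_page(TaggedBytes& s, std::uint16_t cp) { s.code_page = cp; }

std::uint16_t string_code_page(const TaggedBytes& s) { return s.code_page; }

// Explicit conversion to Unicode code points.
std::u32string to_unicode(const TaggedBytes& s) {
    if (s.code_page != 28591 && s.code_page != 20127)
        throw std::invalid_argument("code page not supported by this sketch");
    std::u32string out;
    for (std::uint8_t b : s.bytes) {
        if (s.code_page == 28591) {
            out.push_back(static_cast<char32_t>(b));  // byte N -> U+00NN
        } else {
            // US-ASCII: bytes above 0x7F are invalid, substitute U+FFFD.
            out.push_back(b < 0x80 ? static_cast<char32_t>(b) : U'\uFFFD');
        }
    }
    return out;
}
```

Because re-tagging never touches the bytes, a wrong guess can be corrected later by tagging again, which is exactly what a lossy early conversion rules out.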
That all was no problem; however, the rule is: DO NOT BREAK BACKWARDS COMPATIBILITY, and that is where the problems begin.

Other than that, AnsiString can be declared with various code pages. For example, you can declare typedef AnsiStringT<28591> Latin1String; (28591 is ISO-8859-1) and store data in the Latin1String type. This ensures that code page conversions always use the same code page and do not depend on the system code page.

I think the ICS components need a lot more changes (basic design changes).

-- Arno Garrels
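A rough standard-C++ analogue of a fixed-code-page string type such as AnsiStringT<28591>. CodePageString is a hypothetical template, not the C++Builder class. Because the code page is a template argument, the conversion is fixed at compile time and never consults the system code page. Only the Latin-1 specialization of to_unicode() is defined, so calling it for any other code page fails at link time; that is a deliberate choice in this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical analogue of C++Builder's AnsiStringT<CP>: the code page is a
// template argument, so conversions never depend on the system code page.
template <std::uint16_t CP>
class CodePageString {
public:
    static constexpr std::uint16_t code_page = CP;

    explicit CodePageString(std::vector<std::uint8_t> bytes) : bytes_(std::move(bytes)) {}

    const std::vector<std::uint8_t>& bytes() const { return bytes_; }

    // Declared for every CP, defined only for the code pages specialized below.
    std::u32string to_unicode() const;

private:
    std::vector<std::uint8_t> bytes_;
};

// ISO-8859-1: byte N maps to U+00NN.
template <>
inline std::u32string CodePageString<28591>::to_unicode() const {
    return std::u32string(bytes_.begin(), bytes_.end());
}

using Latin1String = CodePageString<28591>;
```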
Re: [twsocket] pop3, buffer and character encoding
TBytes was the datatype to be used. However, that would break backwards compatibility, since historically string was used everywhere. That all was no problem; however, the rule is: DO NOT BREAK BACKWARDS COMPATIBILITY, and that is where the problems begin.

Well, you've already broken this rule by using UnicodeString as the return type, so I don't see a big deal here. Also, ICS 7 is used by people who need to recompile their code in D2009/2010, so they have to adapt their code anyway; it won't work properly with D2009, and not just because of ICS. For compatibility purposes you have ICS 6 and 5.

And it's not a big problem. I have 2 ideas to satisfy everybody and improve code page compatibility:

Idea 1: Why not introduce TPop3Cli.LastResponseTb, which would be identical to LastResponse except for its type (TBytes)?

Idea 2 (probably the better one): Why not introduce a TPop3Cli.ResponseType enum with which you could choose whether you want TBytes or String? If you don't set it, the default would be String for backward compatibility.

Both changes should be relatively easy to implement, just a few lines of code, and you could apply them to other components that may have issues with Unicode as well.
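Idea 2 might look roughly like this in C++. ResponseType and Pop3ResponseSketch are hypothetical names; ICS has no such members. String stays the default, so existing code keeps its behaviour, while callers who opt into Bytes get each line exactly as received. Latin-1 decoding is again an assumption standing in for the ANSI code page conversion.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch of "Idea 2"; none of these names exist in ICS.
enum class ResponseType { String, Bytes };

class Pop3ResponseSketch {
public:
    // String is the default, so existing code keeps its old behaviour.
    ResponseType response_type = ResponseType::String;

    void on_line_received(const std::vector<std::uint8_t>& line) {
        if (response_type == ResponseType::Bytes) {
            raw_ = line;  // untouched; the caller decodes with the message's charset
            text_.clear();
        } else {
            raw_.clear();
            // Assumption: Latin-1 stands in for the ANSI code page conversion.
            text_.assign(line.begin(), line.end());
        }
    }

    const std::vector<std::uint8_t>& raw() const { return raw_; }
    const std::u32string& text() const { return text_; }

private:
    std::vector<std::uint8_t> raw_;
    std::u32string text_;
};
```

Compared with Idea 1, this avoids keeping two copies of every line, at the cost of callers having to check which member is populated.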
[twsocket] pop3, buffer and character encoding
OK, I have an issue which I don't entirely understand. The way to check a received POP3 message is to create a Pop3CliMessageLine event handler, append Pop3Cli.LastResponse to your own buffer (UnicodeString), and store that as the email.

However, this creates a bit of a problem: LastResponse is a UnicodeString. Normally that's fine, but the sender may send the message in some single-byte encoding. To handle this properly I'd have to store the message as bytes (the raw buffer received from the server) and later, when reading it, get the encoding from the message header and convert the bytes to UnicodeString using that encoding. But LastResponse is already converted to Unicode, so I don't get the raw bytes but some converted output, produced by built-in logic I don't understand. If I send, for example, 0x80, 0x81, 0x82, I may not get the identical result in LastResponse. In fact, all characters above 0x7F may be messed up. It's OK for ASCII messages but not for others.

So how exactly is this handled, and is there a way to receive the raw bytes from the socket so I can convert them to the proper encoding later, after storing them?
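The first half of the plan described above (store the raw message, then read its charset from the header) can be sketched in C++. find_charset is a hypothetical helper: it takes the raw message bytes in a std::string, locates the top-level Content-Type field (honouring header folding), and returns its charset parameter in lower case. It deliberately ignores RFC 2231 parameter encoding and does not walk multipart bodies, where each part carries its own Content-Type.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the charset parameter of the top-level
// Content-Type field in lower case, or "" if there is none. `raw` holds the
// message bytes exactly as received. Simplified: no RFC 2231 parameters and
// no walk into multipart bodies.
std::string find_charset(const std::string& raw) {
    // The header block ends at the first empty line.
    std::string lower = raw.substr(0, raw.find("\r\n\r\n"));
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    // Locate the Content-Type field at the start of a line.
    std::size_t start = lower.compare(0, 13, "content-type:") == 0
                            ? 0
                            : lower.find("\ncontent-type:");
    if (start == std::string::npos) return "";

    // The field ends at a CRLF that is not followed by a space or tab (folding).
    std::size_t end = start + 1;
    for (;;) {
        end = lower.find("\r\n", end);
        if (end == std::string::npos) { end = lower.size(); break; }
        if (end + 2 < lower.size() && (lower[end + 2] == ' ' || lower[end + 2] == '\t')) {
            end += 2;
            continue;
        }
        break;
    }
    std::string field = lower.substr(start, end - start);

    std::size_t pos = field.find("charset=");
    if (pos == std::string::npos) return "";
    pos += 8;  // length of "charset="
    if (pos < field.size() && field[pos] == '"') ++pos;
    std::size_t stop = pos;
    while (stop < field.size() && field[stop] != '"' && field[stop] != ';' &&
           !std::isspace(static_cast<unsigned char>(field[stop])))
        ++stop;
    return field.substr(pos, stop - pos);
}
```

The body bytes stay untouched until the charset is known; decoding them with that charset is the second half of the plan.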