Re: [twsocket] pop3, buffer and character encoding

2010-07-03 Thread Arno Garrels
Zvone wrote:
>> TBytes was the datatype to be used. However that would break
>> backwards compatibility since historically string was used
>> everywhere. That all was no problem, however the rule is: DO NOT
>> BREAK BACKWARDS COMPATIBILITY and that is where the problems begin.
>
> Well you've already broken this rule by using UnicodeString as return
> type

The ICS components have used string since the beginning; it just happened
in RAD Studio 2009 that its mapping changed from AnsiString to UnicodeString.
Only the C++Builder packages always used the mapped type explicitly.

> so I don't see a big deal here. Also ICS 7 is used by users that
> need to recompile their code in D2009/2010 so they do have to adapt
> their code anyway as it won't work properly with D2009 not just with
> ICS.

No, in Delphi there are no interface changes. Properties, signatures
of methods and event handlers did not change. That is different in
C++Builder where everything changed from AnsiString to UnicodeString. 

> For compatibility purposes you have ICS 6 and 5.
>
> And it's not a big problem - I have 2 ideas then to satisfy everybody
> and improve codepages compatibility:
>
> Idea 1: Why not introduce TPop3Cli.LastResponseTb which would be
> identical to LastResponse except different type (TBytes)?

Yes, probably we should introduce a new property LastRawResponse: TBytes
as a workaround.

--
Arno Garrels


--
To unsubscribe or change your settings for TWSocket mailing list
please goto http://lists.elists.org/cgi-bin/mailman/listinfo/twsocket
Visit our website at http://www.overbyte.be


Re: [twsocket] pop3, buffer and character encoding

2010-07-02 Thread Arno Garrels
Hello Zvone,

Yes, the Unicode implementation of the POP3 client is weak.
It converts the bytes received to Unicode with the current ANSI
system codepage. This is a direct result of ICS rule #1: do not
break backwards compatibility.
However, as long as this codepage is one of the windows-xyz
single-byte character sets, converting back to ANSI with the
same codepage should work without data loss and give you back
the raw bytes (hopefully). This won't work, for example, with
Japanese locale settings, where the ANSI codepage is multi-byte.
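
The round trip described above can be illustrated at the byte level. This is a minimal sketch in Python rather than Delphi, assuming Python's `cp1252` and `cp932` codecs as stand-ins for a Western and a Japanese ANSI system codepage. The helper name `survives_round_trip` is hypothetical. Python raises an error on bytes that are invalid in a codepage, whereas the Windows conversion routines may substitute characters instead, so the failure mode differs in detail even though the data loss is the same.

```python
def survives_round_trip(raw: bytes, codepage: str) -> bool:
    """Return True if bytes -> Unicode -> bytes gives back the same bytes."""
    try:
        text = raw.decode(codepage)           # conversion done on receive
        return text.encode(codepage) == raw   # attempt to undo it afterwards
    except UnicodeError:
        # The bytes are not valid in this codepage, so they cannot be recovered.
        return False


# "café" as a server might send it in a single-byte Western encoding (0xE9 = é).
line = b"caf\xe9"

print(survives_round_trip(line, "cp1252"))  # Western single-byte codepage
print(survives_round_trip(line, "cp932"))   # Japanese (Shift-JIS) codepage
```

In the Japanese codepage the byte 0xE9 is the first half of a two-byte character, so a line ending in it cannot be converted and restored.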

Note that in MIME headers 8-bit characters are not allowed! 
Email clients must encode 8-bit characters in header lines 
properly. MIME text parts might include 8-bit characters with 
a charset specified in the content-type header, in those cases
the (AnsiString) text content has to be converted to Unicode 
with the charset specified.  
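
As a hedged illustration of both cases (Python standard library, not ICS code), the sketch below parses a message that was kept as raw bytes. The sample message is invented for this example. The header carries its non-ASCII text as an encoded-word, which is pure ASCII on the wire, while the body holds 8-bit text that is converted with the charset named in the Content-Type header.

```python
import email
from email.header import decode_header, make_header

# A raw message kept as bytes. The subject is an encoded-word; the body
# contains 8-bit bytes (0xF8, 0xED) that only make sense in ISO-8859-2.
raw = (
    b"Subject: =?iso-8859-2?Q?P=F8=EDklad?=\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=iso-8859-2\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"P\xf8\xedklad\r\n"
)

msg = email.message_from_bytes(raw)

# Header: decode the encoded-word into Unicode.
subject = str(make_header(decode_header(msg["Subject"])))

# Body: fetch the payload as bytes, then convert with the declared charset.
charset = msg.get_content_charset() or "us-ascii"
body = msg.get_payload(decode=True).decode(charset)

print(subject)
print(body.strip())
```

The important point is that the body conversion happens only after the charset is known, which requires the bytes to have been stored unconverted.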

--
Arno Garrels 


Re: [twsocket] pop3, buffer and character encoding

2010-07-02 Thread Arno Garrels
Zvone wrote:
>> However, as long as this codepage was one of the windows-xyz,
>> single byte character sets converting back to Ansi with the
>> same codepage should work without data loss and give you back
>> the raw bytes (hopefully). This won't work, for example, with
>> Japanese locale settings.
>
> But there is a string type for that purpose. It is called
> RawByteString. It is defined as AnsiString($ffff) which in effect
> means it is an ansistring with no encoding attached to it so you can
> use it to transfer data from functions and avoid codepage conversions.

The purpose of type RawByteString is to avoid implicit string casts.
As you know, the compiler has been codepage-aware for AnsiStrings since
Delphi 2009. Use of RawByteString makes sense as a parameter type,
especially to avoid writing plenty of overloads, nothing else.

 
> Yes, by default it uses default system code page for conversions to
> Unicode.
>
> RawByteString is a single-byte character type but unlike AnsiString it
> does not have a specific encoding attached to it. So that means it can
> be used to pass values to and from functions that will do
> Unicode conversions. It is not intended to be used for storing data,
> just mostly for input/output of functions as the official
> documentation specifies.

RawByteStrings are not implicitly converted by compiler magic,
that's all, and this type is not documented well.

 
> So my best bet is that it would be the best to receive raw byte buffer
> (unsigned char or BYTE type) and then place it into RawByteString and
> return that value. This should avoid conversions.

TBytes was the datatype to be used. However that would break 
backwards compatibility since historically string was used everywhere.

 
> In your own functions you can cast RawByteString as input type and use
> conversion functions to convert from RawByteString to any codepage you
> like (or store it as binary data). There are some functions that do
> this; I think the ones you need are SetCodePage() and StringCodePage().

That all was no problem, however the rule is: DO NOT BREAK BACKWARDS 
COMPATIBILITY and that is where the problems begin.

 
> Other than that, AnsiString can be defined in various codepages; for
> example you can declare a
> typedef AnsiStringT<28591> Latin1String; and store data in
> Latin1String type - this will ensure that the codepage conversions are
> always in identical codepage and not dependent on the system code
> page.

I think the ICS components need a lot more changes (basic design changes).
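
The fixed-codepage idea in the quoted paragraph rests on a property of ISO-8859-1 (codepage 28591): each byte value 0..255 maps to the Unicode code point with the same number, so the conversion can be reversed regardless of the system locale. A small sketch of that property, written in Python only to show the byte-level behaviour and assuming Python's `latin-1` codec matches codepage 28591:

```python
# All 256 possible byte values, as a stand-in for arbitrary data from a socket.
raw = bytes(range(256))

# ISO-8859-1 ("latin-1" in Python) maps each byte to the code point
# with the same numeric value.
text = raw.decode("latin-1")
same_numbers = all(ord(ch) == b for ch, b in zip(text, raw))

# Because the mapping is one-to-one, encoding gives back the original bytes.
restored = text.encode("latin-1")

print(same_numbers, restored == raw)
```

Text carried this way is only a container for the bytes; it still has to be re-decoded with the message's real charset before it is displayed.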
  

--
Arno Garrels


Re: [twsocket] pop3, buffer and character encoding

2010-07-02 Thread Zvone
> TBytes was the datatype to be used. However that would break
> backwards compatibility since historically string was used everywhere.
> That all was no problem, however the rule is: DO NOT BREAK BACKWARDS
> COMPATIBILITY and that is where the problems begin.

Well you've already broken this rule by using UnicodeString as return
type so I don't see a big deal here. Also ICS 7 is used by users that
need to recompile their code in D2009/2010 so they do have to adapt
their code anyway as it won't work properly with D2009 not just with
ICS. For compatibility purposes you have ICS 6 and 5.

And it's not a big problem - I have 2 ideas then to satisfy everybody
and improve codepages compatibility:

Idea 1: Why not introduce TPop3Cli.LastResponseTb which would be
identical to LastResponse except different type (TBytes)?

Idea 2 (probably better one): Why not introduce TPop3Cli.ResponseType
enum which you could choose if you want TBytes or String and the
default if you don't use it would be String for backward
compatibility?

Both changes should be relatively easy to implement like a few lines
of code... and you could apply them for other components as well that
may have issues with Unicode.


[twsocket] pop3, buffer and character encoding

2010-07-01 Thread Zvone
OK, I have an issue which I don't entirely understand.

The way to read a received POP3 message is to create a Pop3CliMessageLine
event handler, append Pop3Cli.LastResponse to your own buffer (UnicodeString)
and store that as the email.

However, this creates a bit of a problem - LastResponse is UnicodeString.
Normally, that's fine but the sender may send message in some single-byte
encoding.
So in order to handle this properly - I'd have to store message as bytes
(raw buffer received from server) and later when reading it read the
encoding from message header and convert bytes to UnicodeString using the
encoding of the message.

But LastResponse is already encoded in Unicode, so I don't get the raw bytes
but some converted output produced by built-in logic which I don't
understand. If I send for example 0x80, 0x81, 0x82 I may not get the identical
result in LastResponse. In fact all the characters above 0x7f may be messed
up. It's OK for ASCII messages but not for others.

So how is this exactly handled and is there a way to receive raw bytes
from socket to convert them to proper encoding later after storing them?