Re: [twsocket] Found a bug and made a fix in function UrlDecode
Arno,

> ... try the following code and let me know how it works for you,

The code works for me. But shouldn't the first overload directive be inside the conditional define?

Regards
Bjørnar

--
To unsubscribe or change your settings for TWSocket mailing list please goto http://lists.elists.org/cgi-bin/mailman/listinfo/twsocket
Visit our website at http://www.overbyte.be
Re: [twsocket] Found a bug and made a fix in function UrlDecode
Bjørnar Nielsen wrote:
> Arno,
>
>> ... try the following code and let me know how it works for you,
>
> The code works for me. But shouldn't the first overload directive be inside the conditional define?

Yes, it's better inside, corrected. Just updated the svn repository.

--
Arno Garrels
Re: [twsocket] Found a bug and made a fix in function UrlDecode
Arno,

> Would you expect a correct result as well if you base64-decoded a quoted-printable encoded string?

No, I agree. But the problem is not the decoding itself but the way Unicode chars and ANSI chars are treated. This is the line where the problem lies:

    MyAnsichar := AnsiChar(UnicodeUrl[I]);

If UnicodeUrl is switched to AnsiString, the problem disappears.

> Can't you use your own custom function then?

Yes I can, but I think other users could benefit from my proposed change. I think this is a problem that was introduced with porting to Builder 2010 and using UnicodeString. This problem was not there before, and maybe other users also have this problem now without knowing it. Why not make the changes I proposed, when all they do is restore the function to its old behavior from when only AnsiString was used?

Regards
Bjørnar
Re: [twsocket] Found a bug and made a fix in function UrlDecode
Bjørnar Nielsen wrote:
> Arno,
>
>> Would you expect a correct result as well if you base64-decoded a quoted-printable encoded string?
>
> No, I agree. But the problem is not the decoding itself but the way Unicode chars and ANSI chars are treated. This is the line where the problem lies:
>
>     MyAnsichar := AnsiChar(UnicodeUrl[I]);

Yes, it expects a Char containing a 7-bit printable ASCII character.

> If UnicodeUrl is switched to AnsiString, the problem disappears.

This would introduce plenty of implicit string casts in existing Delphi code, because in Delphi an ICS URL is of type string; only the mapping of string changed in D2009+, from AnsiString to UnicodeString. This is different in C++ Builder, where the generated .hpp files export the mapped types explicitly (AnsiString and UnicodeString). Note that such introduced string casts would corrupt invalid URLs *as well*.

>> Can't you use your own custom function then?
>
> Yes I can, but I think other users could benefit from my proposed change. I think this is a problem that was introduced with porting to Builder 2010 and using UnicodeString. This problem was not there before, and maybe other users also have this problem now without knowing it. Why not make the changes I proposed, when all they do is restore the function to its old behavior from when only AnsiString was used?

The only workaround that comes to my mind is another overload that takes a RawByteString instead of string. I won't use AnsiString because implicit ANSI string casts must be avoided too.

--
Arno Garrels
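For readers who want to see the effect discussed above, here is a minimal console sketch for Delphi 2009 or later. It is an illustration only and not part of ICS. `Win1252String` is a hypothetical helper type that stands in for the system ANSI code page; Windows-1252 is an assumption, and in that code page the byte $85 maps to U+2026. The program prints each UTF-16 code unit next to the value produced by the `AnsiChar` cast, so the behavior for code units above $FF can be inspected rather than assumed.

```pascal
program AnsiCharCastDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  { Hypothetical helper type: stands in for the system ANSI code page }
  Win1252String = type AnsiString(1252);

var
  Raw  : Win1252String;
  Wide : UnicodeString;
  I    : Integer;
begin
  { The UTF-8 bytes of "Åge" (C3 85 67 65), held in an ANSI string }
  SetLength(Raw, 4);
  Raw[1] := AnsiChar($C3);
  Raw[2] := AnsiChar($85);
  Raw[3] := 'g';
  Raw[4] := 'e';

  { Converting to UnicodeString interprets the bytes through code page 1252,
    which is what an implicit string cast would do as well }
  Wide := UnicodeString(Raw);

  { Print every UTF-16 code unit and what the AnsiChar cast makes of it }
  for I := 1 to Length(Wide) do
    Writeln(IntToHex(Ord(Wide[I]), 4), ' -> ',
            IntToHex(Ord(AnsiChar(Wide[I])), 2));
end.
```

Any line where the two hex values differ in their low byte, or where the first value is above 00FF, marks a byte that did not survive the round trip.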
Re: [twsocket] Found a bug and made a fix in function UrlDecode
Arno,

> The only workaround that comes to my mind is another overload that takes a RawByteString instead of string. I won't use AnsiString because implicit ANSI string casts must be avoided too.

That would work for me. I'm not very familiar with the use of RawByteString, but I made a version of the function that works for me. Do you think this version would work for others too? (I have only tested it in C++ Builder 2010.)

Regards
Bjørnar

Code follows (the only change from the previous version I sent is the type of the first parameter):

    function UrlDecode(const S     : RawByteString;
                       SrcCodePage : Cardinal = CP_ACP;
                       DetectUtf8  : Boolean  = TRUE) : UnicodeString;
    var
        I, J, L : Integer;
        U8Str   : AnsiString;
        Ch      : AnsiChar;
    begin
        L := Length(S);
        SetLength(U8Str, L);
        I := 1;
        J := 0;
        { Copy byte by byte, decoding %XX escapes and '+' }
        while (I <= L) and (S[I] <> '&') do begin
            Ch := AnsiChar(S[I]);
            if Ch = '%' then begin
                Ch := AnsiChar(htoi2(PAnsiChar(@S[I + 1])));
                Inc(I, 2);
            end
            else if Ch = '+' then
                Ch := ' ';
            Inc(J);
            U8Str[J] := Ch;
            Inc(I);
        end;
        SetLength(U8Str, J);
        { Convert the decoded bytes to the result string type }
        if (SrcCodePage = CP_UTF8) or (DetectUtf8 and IsUtf8Valid(U8Str)) then
    {$IFDEF COMPILER12_UP}
            Result := Utf8ToStringW(U8Str)
        else
            Result := AnsiToUnicode(U8Str, SrcCodePage);
    {$ELSE}
            Result := Utf8ToStringA(U8Str)
        else
            Result := U8Str;
    {$ENDIF}
    end;
Re: [twsocket] Found a bug and made a fix in function UrlDecode
Bjørnar,

> Arno,
>
>> The only workaround that comes to my mind is another overload that takes a RawByteString instead of string. I won't use AnsiString because implicit ANSI string casts must be avoided too.
>
> That would work for me. I'm not very familiar with the use of RawByteString, but I made a version of the function that works for me. Do you think this version would work for others too? (I have only tested it in C++ Builder 2010.)

The new overload is only required in RDS 2009+. It has to be conditionally compiled for other reasons too. Please try the following code and let me know how it works for you; the first declaration just got the overload directive:

    function UrlDecode(const S     : String;
                       SrcCodePage : LongWord = CP_ACP;
                       DetectUtf8  : Boolean  = TRUE) : String; overload;
    {$IFDEF COMPILER12_UP}
    function UrlDecode(const S     : RawByteString;
                       SrcCodePage : LongWord = CP_ACP;
                       DetectUtf8  : Boolean  = TRUE) : UnicodeString; overload;
    {$ENDIF}

    {$IFDEF COMPILER12_UP}
    function UrlDecode(const S     : RawByteString;
                       SrcCodePage : LongWord = CP_ACP;
                       DetectUtf8  : Boolean  = TRUE) : UnicodeString;
    var
        I, J, L : Integer;
        U8Str   : AnsiString;
        Ch      : AnsiChar;
    begin
        L := Length(S);
        SetLength(U8Str, L);
        I := 1;
        J := 0;
        { Copy byte by byte, decoding %XX escapes and '+' }
        while (I <= L) and (S[I] <> '&') do begin
            Ch := AnsiChar(S[I]);
            if Ch = '%' then begin
                Ch := AnsiChar(htoi2(PAnsiChar(@S[I + 1])));
                Inc(I, 2);
            end
            else if Ch = '+' then
                Ch := ' ';
            Inc(J);
            U8Str[J] := Ch;
            Inc(I);
        end;
        SetLength(U8Str, J);
        { Convert the decoded bytes to UnicodeString }
        if (SrcCodePage = CP_UTF8) or (DetectUtf8 and IsUtf8Valid(U8Str)) then
            Result := Utf8ToStringW(U8Str)
        else
            Result := AnsiToUnicode(U8Str, SrcCodePage);
    end;
    {$ENDIF}

--
Arno Garrels
Re: [twsocket] Found a bug and made a fix in function UrlDecode
A small change inside the function must also be made to make it work: the line with htoi2 needs a small change. The complete code is this:

    function UrlDecode(const S     : AnsiString;
                       SrcCodePage : Cardinal = CP_ACP;
                       DetectUtf8  : Boolean  = TRUE) : String;
    var
        I, J, L : Integer;
        U8Str   : AnsiString;
        Ch      : AnsiChar;
    begin
        L := Length(S);
        SetLength(U8Str, L);
        I := 1;
        J := 0;
        { Copy byte by byte, decoding %XX escapes and '+' }
        while (I <= L) and (S[I] <> '&') do begin
            Ch := AnsiChar(S[I]);
            if Ch = '%' then begin
                Ch := AnsiChar(htoi2(PAnsiChar(@S[I + 1])));
                Inc(I, 2);
            end
            else if Ch = '+' then
                Ch := ' ';
            Inc(J);
            U8Str[J] := Ch;
            Inc(I);
        end;
        SetLength(U8Str, J);
        { Convert the decoded bytes to the result string type }
        if (SrcCodePage = CP_UTF8) or (DetectUtf8 and IsUtf8Valid(U8Str)) then
    {$IFDEF COMPILER12_UP}
            Result := Utf8ToStringW(U8Str)
        else
            Result := AnsiToUnicode(U8Str, SrcCodePage);
    {$ELSE}
            Result := Utf8ToStringA(U8Str)
        else
            Result := U8Str;
    {$ENDIF}
    end;

Regards
Bjørnar

-----Original Message-----
From: twsocket-boun...@elists.org [mailto:twsocket-boun...@elists.org] On Behalf Of Bjørnar Nielsen
Sent: 5 August 2010 12:52
To: ICS support mailing (twsocket@elists.org)
Subject: [twsocket] Found a bug and made a fix in function UrlDecode

Proposal for a fix to a bug in UrlDecode in OverbyteIcsUrl.pas and OverbyteIcsHttpSrv.pas.

When calling the function like this:

    Memo2->Text = UrlDecode("Ã…ge", CP_ACP, false); // "Ã…ge" is the UTF-8 encoding of "Åge"

the resulting text in Memo2 is "Ãge", which is impossible to UTF-8-decode back to the original text.

The fix is to change this:

    function UrlDecode(const S : String; SrcCodePage: Cardinal = CP_ACP; DetectUtf8: Boolean = TRUE) : String;

To this:

    function UrlDecode(const S : AnsiString; SrcCodePage: Cardinal = CP_ACP; DetectUtf8: Boolean = TRUE) : String;

Does anyone have any comments on this fix?

Regards
Bjørnar
Re: [twsocket] Found a bug and made a fix in function UrlDecode
Bjørnar,

> When calling the function like this:
>
>     Memo2->Text = UrlDecode("Ã…ge", CP_ACP, false); // "Ã…ge" is the UTF-8 encoding of "Åge"

"Ã…ge" is not a valid URL-encoded string. "Åge" URL-encoded would be:

    %C3%85ge    // UTF-8
    %C5ge       // Windows-1252

Try this:

{code}
var
    Str : string;
begin
    Str := 'Åge';
    Str := UrlEncode(Str, CP_UTF8);
    Caption := UrlDecode(Str, CP_UTF8, False);
end;
{code}

--
Arno Garrels
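The two encodings listed above can be reproduced without ICS. The following is a minimal console sketch for Delphi 2009 or later, not the ICS `UrlEncode` implementation. `PercentEncodeBytes` and `Win1252String` are hypothetical names introduced only for this illustration. The helper percent-encodes every byte that is not an unreserved ASCII character, so the output depends only on which bytes the chosen code page produces for "Åge".

```pascal
program PercentEncodeDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  { Hypothetical helper type: an AnsiString tagged with code page 1252 }
  Win1252String = type AnsiString(1252);

{ Percent-encodes every byte that is not an unreserved ASCII character.
  Illustration only; this is not the ICS UrlEncode implementation. }
function PercentEncodeBytes(const Bytes: RawByteString): string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(Bytes) do
    if Bytes[I] in ['A'..'Z', 'a'..'z', '0'..'9', '-', '_', '.', '~'] then
      Result := Result + Char(Bytes[I])
    else
      Result := Result + '%' + IntToHex(Ord(Bytes[I]), 2);
end;

var
  Original: UnicodeString;
begin
  { "Åge", built from the code point so the source file encoding does not matter }
  Original := Char($00C5) + 'ge';

  { Same text, two byte representations }
  Writeln(PercentEncodeBytes(UTF8Encode(Original)));      { UTF-8 bytes }
  Writeln(PercentEncodeBytes(Win1252String(Original)));   { Windows-1252 bytes }
end.
```

The point of the sketch is that a URL-encoded string carries only ASCII characters, so it passes unchanged through any AnsiString or UnicodeString conversion; the raw bytes it stands for do not.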
Re: [twsocket] Found a bug and made a fix in function UrlDecode
> "Ã…ge" is not a valid URL-encoded string.

I know, but it is valid UTF-8. I think trying to URL-decode it should not break the string.

I have a web server that works with different clients, and not all of the clients URL-encode the data in the URL. But all of the clients UTF-8-encode the data. That means that if I try to URL-decode UTF-8 data that is not URL-encoded, the data gets messed up, and I had a problem until I made this fix.

Regards
Bjørnar
Re: [twsocket] Found a bug and made a fix in function UrlDecode
Bjørnar,

>> "Ã…ge" is not a valid URL-encoded string.
>
> I know, but it is valid UTF-8. I think trying to URL-decode it should not break the string.

I do not think so. Would you expect a correct result as well if you base64-decoded a quoted-printable encoded string? A URL containing anything other than characters from the printable 7-bit ASCII range is invalid. Just as Base64Decode requires properly encoded data to return correct results, UrlDecode requires a valid URL to work correctly. This requirement has the advantage that the function works with string both when string maps to UnicodeString and when it maps to AnsiString, because no implicit string cast will corrupt the string if an AnsiString is passed while string maps to UnicodeString. However, I must admit that it is something of a breaking change in behavior when you port your apps to Unicode.

> I have a web server that works with different clients, and not all of the clients URL-encode the data in the URL.

Those clients definitely violate the RFC.

> But all of the clients UTF-8-encode the data. That means that if I try to URL-decode UTF-8 data that is not URL-encoded, the data gets messed up, and I had a problem until I made this fix.

Can't you use your own custom function then?

--
Arno Garrels