Re: [twsocket] help to convert from utf8 to ansi (locale)
Xavier Mor-Mur wrote:

> String usStr; (or UnicodeString usStr;)
> AnsiString asStr;
> usStr = UrlDecode("Sin%20t%C3%ADtulo%201_html_m5b7e3440.jpg");
> asStr = usStr; // the compiler introduces the required conversion code
>
> using asStr = UTF8Decode(usStr);
> or asStr = UTF8Decode(UrlDecode("Sin%20t%C3%ADtulo%201_html_m5b7e3440.jpg"));

Conversion from Unicode to AnsiString might lead to data loss, whether the source is UTF-8 or UTF-16. If you actually need an AnsiString with code page CP_UTF8, you should use the type UTF8String instead of AnsiString. Since 2009 the compiler is code page aware and implicitly converts between UTF8String and (Unicode)String without data loss and without warnings. In this case one could also write a slightly different version of UrlDecode() that returns a UTF8String to save a few conversions.

-- Arno Garrels
-- To unsubscribe or change your settings for TWSocket mailing list please goto http://lists.elists.org/cgi-bin/mailman/listinfo/twsocket Visit our website at http://www.overbyte.be
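To illustrate the data-loss point, here is a minimal sketch in standard C++ (not VCL code). It assumes ISO-8859-1 (Latin-1) as a stand-in for the local ANSI code page; the real Windows code page may differ. The names NarrowResult and to_latin1 are hypothetical. Any code point the single-byte target cannot hold is replaced with '?', which is where information is lost.

```cpp
#include <string>

// Hypothetical result type: the narrowed bytes plus a flag that records
// whether any character had to be replaced.
struct NarrowResult {
    std::string bytes;
    bool lossy;
};

// Narrow Unicode code points to Latin-1 (assumed stand-in for the ANSI
// code page). Code points above U+00FF cannot be represented in one byte
// and become '?'.
NarrowResult to_latin1(const std::u32string& text) {
    NarrowResult r{std::string(), false};
    for (char32_t cp : text) {
        if (cp <= 0xFF) {
            r.bytes.push_back(static_cast<char>(static_cast<unsigned char>(cp)));
        } else {
            r.bytes.push_back('?');
            r.lossy = true;
        }
    }
    return r;
}
```

A name such as "título" survives this narrowing because U+00ED fits in one byte; a name containing characters outside the target code page would not.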
Re: [twsocket] help to convert from utf8 to ansi (locale)
Hi RTT,

Yes, I use UrlDecode from OverbyteIcsUrl.hpp with the default SrcCodePage and DetectUtf8. The only alternative I found was to include an Indy component, but since I am already using ICS I think that is unnecessary. I will check using non-default parameters.

Thanks again,
Xavi

(In reply to RTT's message of 29/04/2010 03:16.)
Re: [twsocket] help to convert from utf8 to ansi (locale)
Hello Arno,

Thanks for your comments. I need to update my head from ANSI to Unicode :-) Converting Unicode-encoded files to ANSI should not be a problem, or so I think, as all files will be created on the local network. I will read a bit more about the differences between UTF8String and UnicodeString; the BC2009 help is not clear.

Xavi

(In reply to Arno Garrels's message of 29/04/2010 08:54.)
[twsocket] help to convert from utf8 to ansi (locale)
Hi to all,

I need to parse text that will be sent via email to find out whether any files are declared in it. If the text is HTML saved from a word processor or an HTML editor, all characters outside the first 127 ASCII characters are converted using the UTF-8 convention. What I need is to recover the original text so that I can check whether the declared files exist.

At first I was working with BCB5, but it has limited support for Unicode strings. Now I'm using BC2009, but I'm still stuck at the same point.

Example of what I get as a parameter and what I need to test:
from the file I get Sin%20t%C3%ADtulo%201_html_m5b7e3440.jpg
and I need Sin título1_html_m5b7e3440.jpg

I have played with ansi, wide, unicode and utf8 variables and functions with no success. Thanks in advance for your help.

-- Xavier Mor-Mur
Re: [twsocket] help to convert from utf8 to ansi (locale)
The %## codes are representations of characters. It looks like a URL encoding scheme to me. For example, %20 is the space character. The %C3%AD looks to be the letter i with an accent. Run it through a URL decoder and see if that doesn't get you what you are looking for.

Matt

(In reply to Xavier Mor-Mur's message of Wednesday, April 28, 2010 14:19.)
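As a rough illustration of what a URL decoder does with those %## codes, here is a minimal sketch in standard C++. It is not the ICS UrlDecode; the names hex_value and percent_decode are hypothetical. Each %XX escape becomes exactly one byte, so %C3%AD yields the two bytes C3 AD, which still have to be interpreted as UTF-8 afterwards.

```cpp
#include <string>

// Hypothetical helper: numeric value of one hex digit, or -1 if invalid.
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Replace every well-formed %XX escape with the single byte it names.
// Malformed or truncated escapes are copied through unchanged.
std::string percent_decode(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            int hi = hex_value(in[i + 1]);
            int lo = hex_value(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out.push_back(static_cast<char>(hi * 16 + lo));
                i += 2;  // skip the two hex digits just consumed
                continue;
            }
        }
        out.push_back(in[i]);
    }
    return out;
}
```

The result is a byte string, not yet text: the decoder has no way of knowing which character encoding the bytes were written in.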
Re: [twsocket] help to convert from utf8 to ansi (locale)
Thanks for the tip. I tried it with no success. URLEncode and URLDecode apparently work byte by byte, so %C3%AD is converted into two individual characters. I can't get UTF8Encode and UTF8Decode to work; I am certainly doing something wrong.

Regards,
Xavi

(In reply to Matt Minnis's message of 28/04/2010 22:36.)
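The "two individual characters" effect can be reproduced with a small sketch in standard C++. This is an illustration only, not the VCL UTF8Decode; both function names are hypothetical. Reading the bytes C3 AD one at a time gives two characters (U+00C3 and U+00AD), while reading them as one UTF-8 sequence gives the single character U+00ED (í). The decoder below is deliberately simplified: it does not reject overlong forms or surrogates.

```cpp
#include <string>

// What a byte-wise reading does: every byte becomes its own character.
std::u32string bytes_as_single_chars(const std::string& bytes) {
    std::u32string out;
    for (unsigned char b : bytes) out.push_back(b);
    return out;
}

// Simplified UTF-8 decoder: handles 1- to 4-byte sequences and emits
// U+FFFD for a bad lead byte, a bad continuation byte, or truncation.
std::u32string utf8_to_code_points(const std::string& bytes) {
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;  // number of continuation bytes expected
        char32_t cp = 0;
        if (b0 < 0x80)                { extra = 0; cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { extra = 1; cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; }
        else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; }
        else { out.push_back(0xFFFD); ++i; continue; }

        if (extra > bytes.size() - i - 1) {  // sequence cut off at the end
            out.push_back(0xFFFD);
            break;
        }
        bool ok = true;
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (c & 0x3F);  // append 6 payload bits
        }
        if (!ok) { out.push_back(0xFFFD); ++i; continue; }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

For C3 AD the lead byte contributes the bits 00011 and the continuation byte the bits 101101, which combine to 0xED.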
Re: [twsocket] help to convert from utf8 to ansi (locale)
Curiosity struck, so I googled it... Apparently you aren't the only one with this issue. If you have the string before it gets encoded in the first place, you can convert it to UTF-8 first and then URL-encode it, so that you can decode it properly. If you don't have access to the string before encoding, I think you have a problem and may have to handle it on your own. You might be able to detect the first byte as a high ASCII value and use that as a flag to combine both bytes into one character.

If you do find or create a solution, let us know. I'm sure we'll run across needing something like that at some point...

Matt

(In reply to Xavier Mor-Mur's message of Wednesday, April 28, 2010 15:53.)
Re: [twsocket] help to convert from utf8 to ansi (locale)
I will work on it, as I need to solve it. The situation I described comes from a test I made after the program was reported not to be working properly. From what I have seen, MS Word encodes partially and OpenOffice encodes fully.

Regards

(In reply to Matt Minnis's message of 28/04/2010 23:41.)
Re: [twsocket] help to convert from utf8 to ansi (locale)
This seems to work fine:

    UTF8Decode(URLDecode('Sin título1_html_m5b7e3440.jpg'));
Re: [twsocket] help to convert from utf8 to ansi (locale)
Obviously, what works fine is the sequence of decode transformations applied to your encoded file name, not to the already decoded string as I posted. Sorry. This is the correct example:

    UTF8Decode(URLDecode('Sin%20t%C3%ADtulo%201_html_m5b7e3440.jpg'));
Re: [twsocket] help to convert from utf8 to ansi (locale)
Hi RTT,

Thanks for your tips. I finally got it to work, but I had to write the code a bit differently:

    String usStr;       // (or UnicodeString usStr;)
    AnsiString asStr;
    usStr = UrlDecode("Sin%20t%C3%ADtulo%201_html_m5b7e3440.jpg");
    asStr = usStr;      // the compiler introduces the required conversion code

Using

    asStr = UTF8Decode(usStr);

or

    asStr = UTF8Decode(UrlDecode("Sin%20t%C3%ADtulo%201_html_m5b7e3440.jpg"));

gives asStr = NULL. I think D2009 and BC2009 work differently when doing inline automatic conversions.

Many thanks for everything,
Xavi

(In reply to RTT's message of 29/04/2010 00:37.)
Re: [twsocket] help to convert from utf8 to ansi (locale)
Hi Xavier,

The UTF8Decode is probably being performed automatically because you are using the UrlDecode function from OverbyteIcsHttpSrv. If you check its definition,

    function UrlDecode(const Url: string; SrcCodePage: LongWord = CP_ACP; DetectUtf8: Boolean = TRUE): string;

there is a parameter, DetectUtf8, which defaults to TRUE and controls whether the function automatically detects and decodes UTF-8 encoded strings after the URL-decode conversion.
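How ICS implements DetectUtf8 is not shown in this thread. As a guess at the general idea only, a detector of this kind can check whether the decoded bytes form well-formed UTF-8 and contain at least one multi-byte sequence. The sketch below is standard C++ under that assumption; looks_like_utf8 is a hypothetical name, and the check is simplified (it does not apply the stricter second-byte ranges that rule out every overlong form and surrogate).

```cpp
#include <string>

// Heuristic: true when `bytes` is structurally valid UTF-8 and contains at
// least one multi-byte sequence. Pure ASCII returns false because there is
// nothing to convert in that case.
bool looks_like_utf8(const std::string& bytes) {
    bool saw_multibyte = false;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;  // continuation bytes required by the lead byte
        if (b0 < 0x80)                      extra = 0;
        else if (b0 >= 0xC2 && b0 <= 0xDF)  extra = 1;
        else if (b0 >= 0xE0 && b0 <= 0xEF)  extra = 2;
        else if (b0 >= 0xF0 && b0 <= 0xF4)  extra = 3;
        else return false;                  // stray continuation or invalid lead

        if (extra > bytes.size() - i - 1) return false;  // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return false;        // not 10xxxxxx
        }
        if (extra > 0) saw_multibyte = true;
        i += extra + 1;
    }
    return saw_multibyte;
}
```

With a check like this, the bytes for "título" in UTF-8 (t, C3 AD, t, u, l, o) are detected, whereas the same word in a single-byte code page (t, ED, t, u, l, o) is not, because ED would have to be followed by two continuation bytes.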