Re: [twsocket] help to convert from utf8 to ansi (locale)

2010-04-29 Thread Arno Garrels
Xavier Mor-Mur wrote:

 String usStr;      // or UnicodeString usStr;
 AnsiString asStr;
 
 usStr = UrlDecode("Sin%20t%C3%ADtulo%201_html_m5b7e3440.jpg");
 asStr = usStr;     // the compiler inserts the required conversion code
 
 using
   asStr = UTF8Decode(usStr);
 or
   asStr = UTF8Decode(UrlDecode("Sin%20t%C3%ADtulo%201_html_m5b7e3440.jpg"));

Conversion from Unicode to AnsiString can lose data, whether
the source is UTF-8 or UTF-16. If you actually need an
AnsiString with code page CP_UTF8, use the type UTF8String
instead of AnsiString.
Since the 2009 release the compiler is code page aware and
implicitly converts between UTF8String and (Unicode)String
without data loss and without warnings. In this case one could
also write a slightly different version of UrlDecode() that
returns a UTF8String to save a few conversions.
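
The data-loss point can be illustrated with a minimal sketch in standard C++ (no VCL or ICS types). It assumes ISO-8859-1 as a stand-in for the local ANSI code page, and the helper name toLatin1 is hypothetical. Code points that have no single-byte equivalent are replaced by '?' and reported through a flag.

```cpp
#include <string>

// Hypothetical helper (not part of VCL or ICS): narrows Unicode code points
// to ISO-8859-1 bytes, used here as a stand-in for an ANSI code page.
// Code points above U+00FF have no Latin-1 byte and are replaced by '?'.
std::string toLatin1(const std::u32string& text, bool& lossy) {
    std::string out;
    lossy = false;
    for (char32_t cp : text) {
        if (cp <= 0xFF) {
            out.push_back(static_cast<char>(static_cast<unsigned char>(cp)));
        } else {
            out.push_back('?');   // information is lost at this point
            lossy = true;
        }
    }
    return out;
}
```

With this sketch, "título" survives the conversion because í is U+00ED, while a character such as the euro sign (U+20AC) would set the lossy flag.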

--
Arno Garrels

--
To unsubscribe or change your settings for TWSocket mailing list
please goto http://lists.elists.org/cgi-bin/mailman/listinfo/twsocket
Visit our website at http://www.overbyte.be


Re: [twsocket] help to convert from utf8 to ansi (locale)

2010-04-29 Thread Xavier Mor-Mur

Hi RTT
Yes, I use UrlDecode from OverbyteIcsUrl.hpp with the default SrcCodePage 
and DetectUtf8 values. The only alternative I found was to include an Indy 
component, but since I am already using ICS I think that is unnecessary.

I will try it with non-default parameters.

Thanks again
Xavi



Re: [twsocket] help to convert from utf8 to ansi (locale)

2010-04-29 Thread Xavier Mor-Mur

Hello Arno
Thanks for your comments. I need to update my head from ANSI to Unicode :-)
Converting Unicode-encoded files to ANSI should not be a problem, or so I 
think, as all files will be created on the local network.
I will read a bit more about the differences between UTF8String and 
UnicodeString; the BC2009 help is not clear.


Xavi


[twsocket] help to convert from utf8 to ansi (locale)

2010-04-28 Thread Xavier Mor-Mur

Hi to all

I need to parse text that will be sent by email to find out whether it 
declares any files.
If the text is HTML saved from a word processor or an HTML editor, all 
characters outside the first 127 ASCII characters are converted using the 
UTF-8 convention.

What I need is to recover the original text so I can check whether the 
declared files exist.

At first I was working with BCB5, but it has limited support for Unicode 
strings.

Now I'm using BC2009, but I'm still stuck at the same point.

Example of what I get as a parameter and what I need to test:
from the file I get Sin%20t%C3%ADtulo%201_html_m5b7e3440.jpg
and I need Sin título 1_html_m5b7e3440.jpg

I have played with ansi, wide, unicode and utf8 variables and functions 
with no success.


Thanks in advance for your help.

--
Xavier Mor-Mur



Re: [twsocket] help to convert from utf8 to ansi (locale)

2010-04-28 Thread Matt Minnis
The %## codes are representations of characters.  It looks like a URL
encoding scheme to me.
For example, %20 is the space character. The %C3%AD looks to be the letter
i with the accent.
Run it through a URL decoder and see if that gets you what you are
looking for.
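
As a rough illustration of what a URL decoder does with the %## codes, here is a minimal sketch in standard C++. The function names percentDecode and hexValue are hypothetical and this is not the ICS UrlDecode implementation. It only turns each %XX escape into the byte it names, so the result is still a sequence of raw bytes (UTF-8 in the example from this thread), not yet text in any particular code page.

```cpp
#include <cstddef>
#include <string>

// Returns the numeric value of one hex digit, or -1 if c is not a hex digit.
static int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: replaces each %XX escape with the byte it names.
// Malformed or truncated escapes are copied through unchanged.
std::string percentDecode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            int hi = hexValue(in[i + 1]);
            int lo = hexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out.push_back(static_cast<char>(hi * 16 + lo));
                i += 2;            // skip the two hex digits just consumed
                continue;
            }
        }
        out.push_back(in[i]);
    }
    return out;
}
```

Applied to Sin%20t%C3%ADtulo%201_html_m5b7e3440.jpg, %20 becomes one space byte and %C3%AD becomes the two bytes 0xC3 0xAD, which still have to be interpreted as UTF-8 to obtain the single character í.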

Matt



Re: [twsocket] help to convert from utf8 to ansi (locale)

2010-04-28 Thread Xavier Mor-Mur

Thanks for the tip.

I tried it with no success.
URLEncode and URLDecode apparently work byte by byte, and %C3 and %AD are 
converted as individual chars.

I can't get UTF8Encode and UTF8Decode to work.
I'm certainly doing something wrong.

Regards
Xavi



Re: [twsocket] help to convert from utf8 to ansi (locale)

2010-04-28 Thread Matt Minnis
Curiosity struck, so I googled it...
Apparently you aren't the only one with this issue.
If you have the string before it gets encoded in the first place, you can
convert it to UTF-8 first and then URL-encode it, so you can decode it properly.
If you don't have access to the string before encoding, I think you have a
problem and may have to handle it on your own.  You might be able to detect the
first byte as a high-ASCII value and use that as a flag to combine both bytes
into one character.
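
The idea of spotting a high first byte and combining it with the following byte is essentially UTF-8 decoding. A minimal sketch in standard C++ follows; the name utf8ToCodePoints is hypothetical, and the code is deliberately simplified: invalid or truncated sequences become U+FFFD, and overlong forms are not rejected.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decodes UTF-8 bytes into Unicode code points.
// The lead byte tells how many continuation bytes (10xxxxxx) follow.
// Invalid or truncated sequences yield U+FFFD; overlong forms are not rejected.
std::u32string utf8ToCodePoints(const std::string& bytes) {
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        std::size_t extra = 0;                    // continuation bytes expected
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else { out.push_back(0xFFFD); ++i; continue; }   // stray or invalid byte

        bool ok = (i + extra < bytes.size());     // false if the sequence is cut off
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;   // not a continuation byte
            else cp = (cp << 6) | (c & 0x3F);     // append its 6 payload bits
        }
        if (ok) { out.push_back(cp); i += extra + 1; }
        else    { out.push_back(0xFFFD); ++i; }
    }
    return out;
}
```

For the bytes 0xC3 0xAD from this thread, the lead byte contributes the bits 00011 and the continuation byte contributes 101101, giving U+00ED (í).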

If you do find/create a solution, let us know.  I'm sure we'll run across
needing something like that at some point...

Matt



Re: [twsocket] help to convert from utf8 to ansi (locale)

2010-04-28 Thread Xavier Mor-Mur

I will work on it, as I need to solve this.
The situation I described is a test I set up after the report program was not
working properly.
From what I have detected, MS-Word encodes partially and OpenOffice encodes fully.

Regards



--
Xavier Mor-Mur



Re: [twsocket] help to convert from utf8 to ansi (locale)

2010-04-28 Thread RTT

This seems to work fine

UTF8Decode(URLDecode('Sin título1_html_m5b7e3440.jpg'));



Re: [twsocket] help to convert from utf8 to ansi (locale)

2010-04-28 Thread RTT
Obviously, what works fine is the sequence of decode transformations applied 
to your encoded filename, not to the already decoded string as I posted. Sorry.

This is the correct example:
UTF8Decode(URLDEcode('Sin%20t%C3%ADtulo%201_html_m5b7e3440.jpg'));




Re: [twsocket] help to convert from utf8 to ansi (locale)

2010-04-28 Thread Xavier Mor-Mur

Hi RTT

Thanks for your tips. I finally got it to work, but I had to write the code 
a bit differently:


String usStr;      // or UnicodeString usStr;
AnsiString asStr;

usStr = UrlDecode("Sin%20t%C3%ADtulo%201_html_m5b7e3440.jpg");
asStr = usStr;     // the compiler inserts the required conversion code

Using
  asStr = UTF8Decode(usStr);
or
  asStr = UTF8Decode(UrlDecode("Sin%20t%C3%ADtulo%201_html_m5b7e3440.jpg"));


gives asStr = NULL.
I think D2009 and BC2009 work differently when doing inline automatic 
conversions.


Many thanks for all

Xavi




Re: [twsocket] help to convert from utf8 to ansi (locale)

2010-04-28 Thread RTT

Hi Xavier,
The UTF8Decode is being performed automatically, probably because you are 
using the UrlDecode function from OverbyteIcsHttpSrv. If you check its 
definition,


function UrlDecode(const Url: string; SrcCodePage: LongWord = CP_ACP;
                   DetectUtf8: Boolean = TRUE): string;


there is a parameter, DetectUtf8, set to True by default, that controls 
whether the function automatically detects and decodes UTF-8 encoded 
strings after the URL decoding.
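
To give an idea of what such an auto-detection step might look like, here is a minimal sketch in standard C++. It is an assumption for illustration only, not the ICS implementation behind DetectUtf8, and the name looksLikeUtf8 is hypothetical. It reports true when the bytes are structurally valid UTF-8 and contain at least one multi-byte sequence.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if the bytes are structurally valid UTF-8 and
// contain at least one multi-byte sequence. Overlong forms and surrogates
// are not checked, so this is a heuristic rather than a full validator.
bool looksLikeUtf8(const std::string& bytes) {
    bool sawMultiByte = false;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;                   // continuation bytes expected
        if (b < 0x80)                extra = 0;
        else if ((b & 0xE0) == 0xC0) extra = 1;
        else if ((b & 0xF0) == 0xE0) extra = 2;
        else if ((b & 0xF8) == 0xF0) extra = 3;
        else return false;                       // stray continuation or invalid lead byte

        if (i + extra >= bytes.size()) return false;   // sequence is cut off
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return false;      // not a continuation byte
        }
        if (extra > 0) sawMultiByte = true;
        i += extra + 1;
    }
    return sawMultiByte;
}
```

A caller could run a check like this on the percent-decoded bytes and only then decode them as UTF-8; otherwise the bytes would be interpreted in the source code page. Text that is pure ASCII returns false here because there is nothing to decode.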


