Re: [fpc-devel] utf8 in 2.6.0
On 5 January 2013 13:39, Mattias Gaertner nc-gaert...@netcologne.de wrote:
> On Sat, 5 Jan 2013 13:06:42 + Frank Church vfcli...@gmail.com wrote:
>> [...] It is obvious that Unicode is not a simple topic and among FPC/Lazarus developers/contributors, I suspect that few if any at all, have a detailed grasp of how it all hangs together in the current state of implementation. It brings to mind the parable of the 12 blind men and the elephant.
>
> The FPC and Lazarus UTF details are not that difficult. The complexity comes from adding Delphi *, third party libraries and old FPC, Lazarus versions.
>
>> I think a diagram or graph of Unicode rules and their current state of implementation in FPC/Lazarus would go a long way to helping both developers and end users in this area. It is a topic which comes up regularly and it doesn't show signs of ever going to be properly resolved.
>
> For Lazarus:
> - works with fpc 2.6.x and 2.7.1
> - LCL and most code expect ansistrings to hold UTF-8.
> - pascal sources, lfm, po files are stored in UTF-8 without BOM. Special care has to be taken, when using widestrings/unicodestring.
> - there are UTF-8 functions and classes (most in package lazutils).
> - the IDE supports many encodings
> - all this is documented via wiki and fpdoc
> - no support for UTF-16 has been started
> [...]
> Mattias
> ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel

Glad to hear this.

-- Frank Church === http://devblog.brahmancreations.com
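The point that LCL code expects ansistrings to hold UTF-8 can be made concrete with a small sketch. It assumes the unit LazUTF8 from the lazutils package and its UTF8Length function, and a source file saved as UTF-8 without BOM and without a {$codepage} directive. The program name and the variable are made up for illustration.

```pascal
program utf8len;
{$mode objfpc}{$H+}
uses
  LazUTF8; // from package lazutils (assumed to be on the unit path)
var
  s: string;
begin
  // Source saved as UTF-8 without BOM and without {$codepage}:
  // the bytes of the literal are taken over as they appear in the file.
  s := 'äöü';
  WriteLn('bytes:      ', Length(s));      // Length counts bytes
  WriteLn('codepoints: ', UTF8Length(s));  // UTF8Length counts UTF-8 codepoints
end.
```

Since each of these umlauts takes two bytes in UTF-8, the two counts should differ, which is why byte-based routines such as Length, Copy or Pos need care with such strings.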
Re: [fpc-devel] utf8 in 2.6.0
Martin Schreiber wrote:
> but I fear we can not use that information for development with Free Pascal because:
>> The string is represented internally as a Unicode string encoded as UTF-16. Characters in the Basic Multilingual Plane (BMP) take 2 bytes, and characters not in the BMP require 4 bytes.
> and
>> A control string is a sequence of one or more control characters, each of which consists of the # symbol followed by an unsigned integer constant from 0 to 65,535 (decimal) or from $0 to $FFFF (hexadecimal) in UTF-16 encoding, and denotes the character corresponding to a specified code value. Each integer is represented internally by 2 bytes in the string. This is useful for representing control characters and multibyte characters.
> which seems to be different from Free Pascal.

Correction: You're right, Delphi treats control characters as UTF-16 codes, whereas FPC treats them as byte values (if less than 256).

I had already noticed the possible problem that the FPC interpretation of control characters is context sensitive. This leads to write-only code, because a change of the $codepage would require changing all control characters in that unit accordingly. This comes in addition to the removal or addition of control characters > 255, which also leads to a different interpretation of the remaining control characters *and* to a different internal representation.

DoDi
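The byte-value interpretation mentioned above can be checked with a short sketch. It deliberately uses no {$codepage} directive, since that is the case in which #nnn values below 256 are described as plain byte values. The program and variable names are made up for illustration.

```pascal
program ctrlchars;
{$mode objfpc}{$H+}
// No {$codepage} directive on purpose.
var
  s: AnsiString;
  i: Integer;
begin
  s := 'abcd'#228#246#252;
  // Print the numeric value of every byte stored in the string.
  for i := 1 to Length(s) do
    Write(Ord(s[i]), ' ');
  WriteLn;
end.
```

Under that assumption the three control characters would be expected to show up as the single bytes 228, 246 and 252. Adding a {$codepage} directive or a control character above 255 to the same constant is exactly the change that, according to the message above, alters the result.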
Re: [fpc-devel] utf8 in 2.6.0
On 05 Jan 2013, at 10:29, Martin Schreiber wrote:
> Are these stupid questions?

No, but I seem to be unable to explain how it works, since you keep asking about things I already tried to explain before but clearly failed to explain properly. I can keep repeating myself, but I'm not sure whether that will help anyone.

For example, I said that basically nothing changed in 2.7.x compared to 2.6.x, except that certain string constants are no longer automatically converted to utf-16 at compile time, and then you ask "Or should we not touch the theme strings and FPC anymore?". Since basically nothing changed except for a few less blind auto-conversions at compile time, why should you no longer be able to touch anything?

Let me repeat: your string constants will be parsed by the compiler into character sequences with exactly the same content in both 2.6.x and 2.7.x (and with content I mean that if they were converted to the same code page in 2.6.x and in 2.7.x, you would end up with exactly the same binary data). Whether they contain character literals whose value is > #127 in the source code's code page, or explicit #xx, #xxx etc. expressions has no influence; nothing changed in the compiler in that respect.

The *only* difference is that the compiler can now internally represent ansistrings with arbitrary code pages, and as a result the aforementioned character sequences may now be stored internally in the compiler in a different format, and also stored in the program in a different format if that can avoid conversions at run time. In particular, character sequences are no longer all converted immediately/by default/under all circumstances to UTF-16 in case characters > #127 need to be interpreted according to a particular code page (i.e., if a {$codepage xxx} directive is present).

The compiler will now only convert such a character sequence to UTF-16, still at compile time (just like it was able to do in 2.6.x), if it is actually assigned to a UTF-16-encoded string, passed to a UTF-16 parameter etc. And the compiler will also convert it to another ansistring code page in case the character sequence appeared in e.g. a file with {$codepage utf-8} and is then assigned to a variable whose type is declared as type ansistring(850).

Jonas
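The 2.7.x behaviour described here can be sketched as follows. This is only an illustration under assumptions: it needs a compiler with code-page-aware ansistrings (2.7.x), the source file must really be saved as UTF-8, and the type name TCP850String is made up.

```pascal
program cpconst;
{$mode objfpc}{$H+}
{$codepage utf8}  // the source file is assumed to be saved as UTF-8
type
  TCP850String = type AnsiString(850); // hypothetical name
var
  u: UnicodeString;
  s: TCP850String;
begin
  u := 'äöü';  // constant is assigned to a UTF-16 string
  s := 'äöü';  // same constant is assigned to a code page 850 string
  WriteLn('UTF-16 code units: ', Length(u));
  WriteLn('cp850 bytes:       ', Length(s));
  WriteLn('code page of s:    ', StringCodePage(s));
end.
```

The same literal ends up in two different representations depending only on the type of the destination, and since both right-hand sides are constants, the conversions are the kind that the message says happen at compile time.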
Re: [fpc-devel] utf8 in 2.6.0
On 05.01.2013 11:30, Jonas Maebe wrote:
> For example, I said that basically nothing changed in 2.7.x compared to 2.6.x, except that certain string constants are no longer automatically converted to utf-16 at compile time, and then you ask "Or should we not touch the theme strings and FPC anymore?". Since basically nothing changed except for a few less blind auto-conversions at compile time, why should you no longer be able to touch anything?

I think it was meant more in the context of the mailing list than in a technical context. As in "we had this topic a thousand times and maybe it's better we shut up about it now before we get moderated". Though I could not see where you, Jonas, seemed upset about Martin's questions... (and in my opinion the answers cleared up many things - at least for me :) )

Regards, Sven
Re: [fpc-devel] utf8 in 2.6.0
On Saturday 05 January 2013 11:42:29 Sven Barth wrote:
> On 05.01.2013 11:30, Jonas Maebe wrote:
>> For example, I said that basically nothing changed in 2.7.x compared to 2.6.x, except that certain string constants are no longer automatically converted to utf-16 at compile time, and then you ask "Or should we not touch the theme strings and FPC anymore?". Since basically nothing changed except for a few less blind auto-conversions at compile time, why should you no longer be able to touch anything?
>
> I think it was meant more in the context of the mailing list than in a technical context. As in "we had this topic a thousand times and maybe it's better we shut up about it now before we get moderated".

Correct. :-)

Martin
Re: [fpc-devel] utf8 in 2.6.0
On 05 Jan 2013, at 12:16, Martin Schreiber wrote:
> On Saturday 05 January 2013 11:42:29 Sven Barth wrote:
>> I think it was meant more in the context of the mailing list than in a technical context. As in "we had this topic a thousand times and maybe it's better we shut up about it now before we get moderated".
>
> Correct. :-)

Then maybe I should just stop answering any questions about this completely, because apparently not answering completely enough to your liking gets interpreted as telling you to shut up or threatening moderation. Just like Sven I don't understand where this interpretation comes from, and I strongly resent it. I didn't answer because I thought the information was all in my previous answers already, and if someone else felt they could clarify it better than I did, they were free to do so.

My time is also finite, and trying to get me to elaborate further by getting me on my high horse because I feel I'm being misrepresented is something that will not work very well in the long term. It will much more likely result in silence than in more help.

Jonas
Re: [fpc-devel] utf8 in 2.6.0
On Saturday 05 January 2013 12:39:21 Jonas Maebe wrote:
> Then maybe I should just stop answering any questions about this completely, because apparently not answering completely enough to your liking gets interpreted as telling you to shut up or threatening moderation. Just like Sven I don't understand where this interpretation comes from, and I strongly resent it. I didn't answer because I thought the information was all in my previous answers already, and if someone else felt they could clarify it better than I did, they were free to do so. My time is also finite, and trying to get me to elaborate further by getting me on my high horse because I feel I'm being misrepresented is something that will not work very well in the long term. It will much more likely result in silence than in more help.

No, no, you misunderstand me. I am merely being cautious not to annoy the FPC team. Please accept my apology, but I need to decide if FPC is still the right tool for my purposes.

Martin
Re: [fpc-devel] utf8 in 2.6.0
On Sat, 5 Jan 2013, Martin Schreiber wrote:
> On Saturday 05 January 2013 12:39:21 Jonas Maebe wrote:
>> Then maybe I should just stop answering any questions about this completely, because apparently not answering completely enough to your liking gets interpreted as telling you to shut up or threatening moderation. Just like Sven I don't understand where this interpretation comes from, and I strongly resent it. I didn't answer because I thought the information was all in my previous answers already, and if someone else felt they could clarify it better than I did, they were free to do so. My time is also finite, and trying to get me to elaborate further by getting me on my high horse because I feel I'm being misrepresented is something that will not work very well in the long term. It will much more likely result in silence than in more help.
>
> No, no, you misunderstand me. I am merely being cautious not to annoy the FPC team. Please accept my apology, but I need to decide if FPC is still the right tool for my purposes.

Seeing that you have already invested lots of time in FPC, you could also ask yourself 'How can I help improve fpc so it remains the right tool for my purposes'? Or have you decided that cooperation with the FPC team is an impossibility?

Michael.
Re: [fpc-devel] utf8 in 2.6.0
On Saturday 05 January 2013 13:01:44 Michael Van Canneyt wrote:
> Seeing that you have already invested lots of time in FPC, you could also ask yourself 'How can I help improve fpc so it remains the right tool for my purposes'? Or have you decided that cooperation with the FPC team is an impossibility?

It is not easy, mainly because the mission goal is so broad. A division of work would probably be the better solution: my job is to build a highly productive open source development environment for Free Pascal, and the FPC team makes a compiler which allows building such a tool. But now we are off topic.

Martin
Re: [fpc-devel] utf8 in 2.6.0
Martin Schreiber wrote:
> but I fear we can not use that information for development with Free Pascal because:
>> The string is represented internally as a Unicode string encoded as UTF-16. Characters in the Basic Multilingual Plane (BMP) take 2 bytes, and characters not in the BMP require 4 bytes.
> and
>> A control string is a sequence of one or more control characters, each of which consists of the # symbol followed by an unsigned integer constant from 0 to 65,535 (decimal) or from $0 to $FFFF (hexadecimal) in UTF-16 encoding, and denotes the character corresponding to a specified code value. Each integer is represented internally by 2 bytes in the string. This is useful for representing control characters and multibyte characters.
> which seems to be different from Free Pascal.

Where do you see a difference? The strings are stored in UTF-16, which is the same in every implementation, regardless of (possibly) different, more verbose descriptions.

The new AnsiStrings are safe against misinterpretation, because they contain their encoding (codepage). Every char in an AnsiString can now be converted to one and only one Unicode char when needed. This is not true for single AnsiChars, which still have no codepage information stored with them (in both Delphi and FPC). I strongly discourage the use of Char variables in all flavours (Char, AnsiChar, WideChar), because these are incapable of holding all possible Unicode characters. Only UnicodeChar or UCS4Char (if these exist) can hold all possible character codes without possible codepage misinterpretation.

The discussion mostly covers the compilation of string *literals*, like 'äöü' or #123, for which every compiler tries to find the best interpretation and internal representation. FPC has a $codepage directive, which tells the compiler that *all* literals in this unit shall be treated as strings of that codepage. This is essential for files stored as Ansi, which have no information about the codepage of the contained single-byte characters. Files stored with UTF-8 encoding and a UTF-8 BOM at their beginning are safe against misinterpretation.

When the compiler translates the source code string literals, it can store them either as Unicode (UTF-16) or as AnsiString of the given $codepage, depending on the *use* of the literal (type of the string variable in an assignment). This will reduce the number of implicit string conversions at runtime.

[Please correct me if I'm wrong]

DoDi
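The remark about single character types can be illustrated by simply printing their sizes. This is a minimal sketch. AnsiChar, WideChar and UCS4Char are taken to be the types declared by the FPC system unit, and the program name is made up.

```pascal
program charsizes;
{$mode objfpc}{$H+}
begin
  // A character type can only hold what fits into its storage size.
  WriteLn('AnsiChar: ', SizeOf(AnsiChar), ' byte(s)');
  WriteLn('WideChar: ', SizeOf(WideChar), ' byte(s)');
  WriteLn('UCS4Char: ', SizeOf(UCS4Char), ' byte(s)');
end.
```

A character outside the BMP needs two UTF-16 code units, so it cannot be stored in a single 16-bit character variable. Only a type wide enough for the full code point range avoids that limitation.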
Re: [fpc-devel] utf8 in 2.6.0
> For Lazarus:
> - works with fpc 2.6.x and 2.7.1
> - LCL and most code expect ansistrings to hold UTF-8.
> - pascal sources, lfm, po files are stored in UTF-8 without BOM. Special care has to be taken, when using widestrings/unicodestring.
> - there are UTF-8 functions and classes (most in package lazutils).
> - the IDE supports many encodings
> - all this is documented via wiki and fpdoc
> - no support for UTF-16 has been started

Your summary sounds clear to me, and the strategy selected looks to be well designed. I will bookmark this summary as the reference.
Re: [fpc-devel] utf8 in 2.6.0
On 15 Dec 2012, at 19:35, Martin wrote:
> I am trying to figure out how to do that, or what I do wrong. I found a page about $codepage, but it did not help: http://wiki.freepascal.org/LCL_Unicode_Support
> I didn't find the FPC-specific page, if it exists (I suspect it does).
>
> I am calling a function
> function Foo(A:string)
> {$mode objfpc}{$H+}
> I call it with a constant that contains a German umlaut. Checked with a hex editor, the constant in the source file is utf8.
> - If I save the source (in utf8) without a utf8 BOM, then it works fine on Windows.
> - If I add a BOM, then the string received by the function appears to be ascii (checked memory dump in debugger: oe becomes d6).
> - If I add {$codepage utf8} it also becomes ascii.
>
> If I do *not* add that, it seems something goes wrong with the encoding on a PowerPC Mac. Unfortunately this is someone else's PC, and I have no more info. If I add it, things also go wrong, only differently. Again no more info.
> ---
> I know the provided info is very little.

There is not enough information to be able to give an answer. Source code, source code, source code.

Jonas
Re: [fpc-devel] utf8 in 2.6.0
On 01/01/2013 13:18, Jonas Maebe wrote:
> On 15 Dec 2012, at 19:35, Martin wrote:
>> I am trying to figure out how to do that, or what I do wrong. I found a page about $codepage, but it did not help: http://wiki.freepascal.org/LCL_Unicode_Support
>> I didn't find the FPC-specific page, if it exists (I suspect it does).
>>
>> I am calling a function
>> function Foo(A:string)
>> {$mode objfpc}{$H+}
>> I call it with a constant that contains a German umlaut. Checked with a hex editor, the constant in the source file is utf8.
>> - If I save the source (in utf8) without a utf8 BOM, then it works fine on Windows.
>> - If I add a BOM, then the string received by the function appears to be ascii (checked memory dump in debugger: oe becomes d6).
>> - If I add {$codepage utf8} it also becomes ascii.
>>
>> If I do *not* add that, it seems something goes wrong with the encoding on a PowerPC Mac. Unfortunately this is someone else's PC, and I have no more info. If I add it, things also go wrong, only differently. Again no more info.
>> ---
>> I know the provided info is very little.
>
> There is not enough information to be able to give an answer. Source code, source code, source code.

The problem is that the original issue does not happen on my hardware. It is about an issue with the test case in components/EditorMacroScript, but it only happens on PowerPC hardware. I only have/had an extract of the results. From the looks of what went wrong, and what the output was (calculating char positions), utf8 coding was/is a strong suspect (not confirmed though).

On my hardware it is normally all fine, but it fails if I add the $codepage. I could spend a lot of work boiling that down to a sample. But given that I couldn't even find the docs on what I really should expect, and therefore might be doing something wrong, I thought I would first look for what should happen.

Add {$codepage utf8} on top of lazarus\components\macroscript\test\testscriptprocs.pas and the behaviour changes so that the test will fail. (There are utf8 constants in the source, and it appears that with the $codepage the called code does NOT get that utf8 string, but something else instead.)
Re: [fpc-devel] utf8 in 2.6.0
On 01 Jan 2013, at 15:14, Martin wrote:
> On my hardware it is normally all fine, but it fails if I add the $codepage. I could spend a lot of work boiling that down to a sample. But given that I couldn't even find the docs on what I really should expect,

Without a {$codepage xxx} directive, string constants containing characters > #127 remain exactly as they appear in the source code.

With a {$codepage xxx} directive, string constants containing characters > #127 are converted into unicodestrings during the parsing (according to the specified code page), and converted back into ansistrings (using the ansi code page of that particular program run) at run time if they are assigned to ansistrings/shortstrings or passed to routines expecting such parameters.

Note that the above is for 2.6.x (as the subject mentions).

Jonas
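The two cases described for 2.6.x can be compared with a byte dump of a string constant. The following is only a sketch: the helper DumpBytes and the program name are made up, and the result depends on how the source file is saved and on the ansi code page of the machine running the program.

```pascal
program dumpbytes;
{$mode objfpc}{$H+}
// Enable the next line to compare against the behaviour with a directive:
// {$codepage utf8}

// Hypothetical helper: prints every byte of s as two hex digits.
procedure DumpBytes(const s: AnsiString);
var
  i: Integer;
begin
  for i := 1 to Length(s) do
    Write(HexStr(Ord(s[i]), 2), ' ');
  WriteLn;
end;

begin
  // The literal contains a character > #127.
  DumpBytes('ö');
end.
```

If the file is saved as UTF-8 without BOM and the directive stays commented out, the dump should show the two UTF-8 bytes of the literal unchanged. With the directive enabled, the description above implies that the bytes depend on the ansi code page at run time.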
Re: [fpc-devel] utf8 in 2.6.0
On 01/01/2013 14:24, Jonas Maebe wrote:
> On 01 Jan 2013, at 15:14, Martin wrote:
>> On my hardware it is normally all fine, but it fails if I add the $codepage. I could spend a lot of work boiling that down to a sample. But given that I couldn't even find the docs on what I really should expect,
>
> Without a {$codepage xxx} directive, string constants containing characters > #127 remain exactly as they appear in the source code.
>
> With a {$codepage xxx} directive, string constants containing characters > #127 are converted into unicodestrings during the parsing (according to the specified code page), and converted back into ansistrings (using the ansi code page of that particular program run) at run time if they are assigned to ansistrings/shortstrings or passed to routines expecting such parameters.
>
> Note that the above is for 2.6.x (as the subject mentions).

OK, that leaves me with my original problem. On said PPC, the original file is used (no codepage directive). The file should be identical (svn checkout), yet on second thought I can't be sure that it wasn't opened in an editor and saved with a utf8 BOM. So on that PPC something goes wrong. From the feedback I had, it looks exactly as if the encoding of the constant was changed.

So what I was looking for was a way to 100% prevent that. Something that tells the compiler: whatever encoding you find or expect, or whatever encoding the output should be, do not touch strings, just take them byte by byte. Does that exist?
Re: [fpc-devel] utf8 in 2.6.0
On 01 Jan 2013, at 15:40, Martin wrote:
> OK, that leaves me with my original problem. On said PPC, the original file is used (no codepage directive). The file should be identical (svn checkout), yet on second thought I can't be sure that it wasn't opened in an editor and saved with a utf8 BOM. So on that PPC something goes wrong. From the feedback I had, it looks exactly as if the encoding of the constant was changed.
>
> So what I was looking for was a way to 100% prevent that. Something that tells the compiler: whatever encoding you find or expect, or whatever encoding the output should be, do not touch strings, just take them byte by byte. Does that exist?

As mentioned in my previous reply: if you don't use the codepage directive, then the compiler won't change anything. If you assign the string constant to a unicodestring or pass it as such a parameter, it will of course still be converted from ansi to utf-16 at run time.

Jonas
Re: [fpc-devel] utf8 in 2.6.0
On Tuesday 01 January 2013 15:24:05 Jonas Maebe wrote:
> On 01 Jan 2013, at 15:14, Martin wrote:
>> On my hardware it is normally all fine, but it fails if I add the $codepage. I could spend a lot of work boiling that down to a sample. But given that I couldn't even find the docs on what I really should expect,
>
> Without a {$codepage xxx} directive, string constants containing characters > #127 remain exactly as they appear in the source code.
>
> With a {$codepage xxx} directive, string constants containing characters > #127 are converted into unicodestrings during the parsing (according to the specified code page), and converted back into ansistrings (using the ansi code page of that particular program run) at run time if they are assigned to ansistrings/shortstrings or passed to routines expecting such parameters.
>
> Note that the above is for 2.6.x (as the subject mentions).

How does it work in trunk?

Martin
Re: [fpc-devel] utf8 in 2.6.0
On 01 Jan 2013, at 16:31, Martin Schreiber wrote:
> On Tuesday 01 January 2013 15:24:05 Jonas Maebe wrote:
>> Without a {$codepage xxx} directive, string constants containing characters > #127 remain exactly as they appear in the source code.
>>
>> With a {$codepage xxx} directive, string constants containing characters > #127 are converted into unicodestrings during the parsing (according to the specified code page), and converted back into ansistrings (using the ansi code page of that particular program run) at run time if they are assigned to ansistrings/shortstrings or passed to routines expecting such parameters.
>>
>> Note that the above is for 2.6.x (as the subject mentions).
>
> How does it work in trunk?

The strings are stored as ansistrings with the appropriate code page.

Jonas
Re: [fpc-devel] utf8 in 2.6.0
On Tuesday 01 January 2013 16:44:28 Jonas Maebe wrote:
> On 01 Jan 2013, at 16:31, Martin Schreiber wrote:
>> On Tuesday 01 January 2013 15:24:05 Jonas Maebe wrote:
>>> Without a {$codepage xxx} directive, string constants containing characters > #127 remain exactly as they appear in the source code.
>>>
>>> With a {$codepage xxx} directive, string constants containing characters > #127 are converted into unicodestrings during the parsing (according to the specified code page), and converted back into ansistrings (using the ansi code page of that particular program run) at run time if they are assigned to ansistrings/shortstrings or passed to routines expecting such parameters.
>>>
>>> Note that the above is for 2.6.x (as the subject mentions).
>>
>> How does it work in trunk?
>
> The strings are stored as ansistrings with the appropriate code page.

So
UnicodeStringVariable:= 'abcdäüö';
will always call a conversion function?

Martin
Re: [fpc-devel] utf8 in 2.6.0
On Tuesday 01 January 2013 16:54:21 Martin Schreiber wrote:
> So
> UnicodeStringVariable:= 'abcdäüö';
> will always call a conversion function?

And how does
{$codepage 8859-1}
...
UnicodeStringVar:= 'abcd'#228#246#252#1092#1080#1089#1074;
work?

Martin
Re: [fpc-devel] utf8 in 2.6.0
On 01 Jan 2013, at 16:54, Martin Schreiber wrote:
> On Tuesday 01 January 2013 16:44:28 Jonas Maebe wrote:
>> The strings are stored as ansistrings with the appropriate code page.
>
> So
> UnicodeStringVariable:= 'abcdäüö';
> will always call a conversion function?

The assignment node will insert a type conversion of the right hand side to unicodestring. In 2.6.x, the right hand side will already be a unicodestring and nothing will happen. In 2.7.x, the type conversion node will be simplified into a unicodestring constant because it is a type conversion of a constant (just like int64(1) is also handled at compile time).

> And how does
> {$codepage 8859-1}
> ...
> UnicodeStringVar:= 'abcd'#228#246#252#1092#1080#1089#1074;
> work?

That string contains codepoints > #255 and hence is a unicodestring rather than a single byte string. No conversion at either compile or run time happens, and the codepage directive has no influence.

Jonas
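The statement about constants containing codepoints > #255 can be tried out with a reduced variant of the example. This sketch leaves out both the {$codepage} directive and the #228#246#252 part, so that only ASCII characters and values above 255 remain. The program and variable names are made up.

```pascal
program widelit;
{$mode objfpc}{$H+}
var
  u: UnicodeString;
  i: Integer;
begin
  // The constant contains control characters > #255,
  // so it is a unicodestring constant from the start.
  u := 'abcd'#1092#1080#1089#1074;
  // Print the numeric value of every UTF-16 code unit.
  for i := 1 to Length(u) do
    Write(Ord(u[i]), ' ');
  WriteLn;
end.
```

Each #nnnn value should come out as exactly one UTF-16 code unit with the same numeric value, which matches the description that no conversion is involved.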
Re: [fpc-devel] utf8 in 2.6.0
Thanks, another question, or is the behavior already documented?

>> UnicodeStringVar:= 'abcd'#228#246#252#1092#1080#1089#1074;
>> ?
>
> That string contains codepoints > #255 and hence is a unicodestring rather than a single byte string. No conversion at either compile or run time happens, and the codepage directive has no influence.

{$codepage utf8}
...
UnicodeStringVar:= 'abcd'#228#252#246;

Does it store 'abcdäüö' in trunk?

Martin
Re: [fpc-devel] utf8 in 2.6.0
On 01 Jan 2013, at 17:51, Martin Schreiber wrote:
> Thanks, another question, or is the behavior already documented?

What you are asking about has always been the same. I don't know to what extent it is documented.

> {$codepage utf8}
> ...
> UnicodeStringVar:= 'abcd'#228#252#246;
>
> Does it store 'abcdäüö' in trunk?

I have no idea how anything I wrote suggests that it wouldn't. As mentioned, the only difference is that string constants containing characters > #127 are no longer always converted to unicodestring constants at compile time. They are ansistring constants with the appropriate code page by default, and hence are only converted (at compile time, since they are constants) to a different string type/code page when required.

Jonas
Re: [fpc-devel] utf8 in 2.6.0
On Tuesday 01 January 2013 18:00:59 Jonas Maebe wrote:
> I have no idea how anything I wrote suggests that it wouldn't. As mentioned, the only difference is that string constants containing characters > #127 are no longer always converted to unicodestring constants at compile time. They are ansistring constants with the appropriate code page by default, and hence are only converted (at compile time, since they are constants) to a different string type/code page when required.

So #n or #nn or #nnn or #nnnn or #nnnnn always means a Unicode codepoint, and will be converted at compile time to an 8-bit character sequence depending on {$codepage} and stored in a cpstrnew with the codepage of {$codepage} if assigned to a cpstrnew variable? And if the constant is assigned to a UnicodeString variable, the Unicode codepoints are converted and stored as a utf-16 16-bit character sequence at compile time, regardless of whether they contain codepoints > 255?

Does somebody have a link to Embarcadero documentation about the matter? I assume FPC trunk does exactly the same as Delphi XE3 with strings?

Thanks for your patience,
Martin
Re: [fpc-devel] utf8 in 2.6.0
Jonas Maebe wrote:
>> And how does
>> {$codepage 8859-1}
>> ...
>> UnicodeStringVar:= 'abcd'#228#246#252#1092#1080#1089#1074;
>> work?
>
> That string contains codepoints > #255 and hence is a unicodestring rather than a single byte string. No conversion at either compile or run time happens, and the codepage directive has no influence.

Does this really mean that, when the codes > #255 are removed, the remaining codes have a different meaning?

DoDi
Re: [fpc-devel] utf8 in 2.6.0
On 15.12.2012 21:35, Martin wrote:
> I am trying to figure out how to do that, or what I do wrong. I found a page about $codepage, but it did not help: http://wiki.freepascal.org/LCL_Unicode_Support
> I didn't find the FPC-specific page, if it exists (I suspect it does).
>
> I am calling a function
> function Foo(A:string)
> {$mode objfpc}{$H+}
> I call it with a constant that contains a German umlaut. Checked with a hex editor, the constant in the source file is utf8.
> - If I save the source (in utf8) without a utf8 BOM, then it works fine on Windows.
> - If I add a BOM, then the string received by the function appears to be ascii (checked memory dump in debugger: oe becomes d6).
> - If I add {$codepage utf8} it also becomes ascii.
>
> If I do *not* add that, it seems something goes wrong with the encoding on a PowerPC Mac. Unfortunately this is someone else's PC, and I have no more info. If I add it, things also go wrong, only differently. Again no more info.
> ---
> I know the provided info is very little. If there is anything obvious then tell me. Thanks

Probably this is due to a significant change in the FPC 2.7 RTL: the *String* type now carries its encoding, and under Windows it is ANSI by default.

Try writing a simple application that concatenates (s:=a+b) two strings with umlauted letters. The resulting string loses the umlauts under Windows.

The only thing that helps at the RTL level is

{$ifdef FPC} SetMultiByteConversionCodePage(CP_UTF8); {$endif}

This brings similar behaviour for RTL functions on both Windows and UNIX, but it completely breaks file IO. You won't be able to open a file whose name translates to more than one byte per symbol, because RTL IO is ANSI-specific under Windows.

Another approach: use *UnicodeString* and forget the *string* type.

regards, Anton
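The RTL-level workaround quoted above could be wrapped into a complete program roughly as follows. This is a hedged sketch only. The {$if declared(...)} guard is an assumption added here so that the program also compiles with an RTL that lacks the routine (such as 2.6.x), and the variable DefaultSystemCodePage is assumed to be provided by the same RTL versions that provide SetMultiByteConversionCodePage.

```pascal
program utf8rtl;
{$mode objfpc}{$H+}
begin
  {$if declared(SetMultiByteConversionCodePage)}
  // Only RTLs with code-page-aware ansistrings (2.7.x) declare this routine.
  SetMultiByteConversionCodePage(CP_UTF8);
  WriteLn('DefaultSystemCodePage = ', DefaultSystemCodePage);
  {$else}
  WriteLn('This RTL has no SetMultiByteConversionCodePage');
  {$endif}
end.
```

As the message warns, changing the conversion code page this way may affect file IO on Windows, so it is a trade-off rather than a general fix.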
Re: [fpc-devel] utf8 in 2.6.0
On 18.12.2012 14:47, Anton Kavalenka wrote:
> On 15.12.2012 21:35, Martin wrote:
>> I am trying to figure out how to do that, or what I do wrong. I found a page about $codepage, but it did not help: http://wiki.freepascal.org/LCL_Unicode_Support
>> I didn't find the FPC-specific page, if it exists (I suspect it does).
>>
>> I am calling a function
>> function Foo(A:string)
>> {$mode objfpc}{$H+}
>> I call it with a constant that contains a German umlaut. Checked with a hex editor, the constant in the source file is utf8.
>> - If I save the source (in utf8) without a utf8 BOM, then it works fine on Windows.
>> - If I add a BOM, then the string received by the function appears to be ascii (checked memory dump in debugger: oe becomes d6).
>> - If I add {$codepage utf8} it also becomes ascii.
>>
>> If I do *not* add that, it seems something goes wrong with the encoding on a PowerPC Mac. Unfortunately this is someone else's PC, and I have no more info. If I add it, things also go wrong, only differently. Again no more info.
>> ---
>> I know the provided info is very little. If there is anything obvious then tell me. Thanks
>
> Probably this is due to a significant change in the FPC 2.7 RTL: the *String* type now carries its encoding, and under Windows it is ANSI by default.

Martin's question is related to 2.6.0 (see his mail's subject), not 2.7.1.

Regards, Sven
[fpc-devel] utf8 in 2.6.0
I am trying to figure out how to do that, or what I do wrong. I found a page about $codepage, but it did not help: http://wiki.freepascal.org/LCL_Unicode_Support
I didn't find the FPC-specific page, if it exists (I suspect it does).

I am calling a function
function Foo(A:string)
{$mode objfpc}{$H+}
I call it with a constant that contains a German umlaut. Checked with a hex editor, the constant in the source file is utf8.
- If I save the source (in utf8) without a utf8 BOM, then it works fine on Windows.
- If I add a BOM, then the string received by the function appears to be ascii (checked memory dump in debugger: oe becomes d6).
- If I add {$codepage utf8} it also becomes ascii.

If I do *not* add that, it seems something goes wrong with the encoding on a PowerPC Mac. Unfortunately this is someone else's PC, and I have no more info. If I add it, things also go wrong, only differently. Again no more info.
---
I know the provided info is very little. If there is anything obvious then tell me.

Thanks
Re: [fpc-devel] utf8 in 2.6.0
On 15.12.2012 19:35, Martin wrote:
> I am trying to figure out how to do that, or what I do wrong. I found a page about $codepage, but it did not help: http://wiki.freepascal.org/LCL_Unicode_Support
> I didn't find the FPC-specific page, if it exists (I suspect it does).

The page is this: http://wiki.freepascal.org/FPC_Unicode_support though it's rather outdated... :(

Otherwise I cannot help you, but it definitely has something to do with the different code page handling in 2.6.0 compared to 2.7.1.

Regards, Sven