Re: [fpc-devel] String and UnicodeString and UTF8String
On Wednesday, 12. January 2011 23.05:02 Juha Manninen wrote:
> Martin Schreiber wrote on Monday, 10 January 2011 19:22:49:
> > On Monday, 10. January 2011 16.27:19 Marco van de Voort wrote:
> > > And there are three such cases
> > > - normal FPC and Delphi 2007- code: ansistring(0)
> > > - Lazarus: ansistring = utf8
> > > - Delphi 2009+: UTF16.
> > - fpGUI: ansistring = utf-8
> > - MSEgui: existing FPC UnicodeString = utf-16
>
> Without studying your code myself I guess you had to make many utility functions and classes yourself for UTF-16? Even the normal TStringList doesn't work.

Correct. MSEgui has a complete development environment for UnicodeString with its own set of lists, streams, file and directory functions and the like.

Martin
___
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] TStringField, String and UnicodeString and UTF8String
> Didn't I explain this to you and others a few times? ;-)

If so, then please excuse me.

> The database components themselves are encoding-agnostic. This means: encoding in = encoding out. So it is up to the developer which codepage he wants to use. So TField.Text can have the encoding _you_ want. So, if you want to work with Lazarus, which uses UTF-8, you have to use UTF-8 encoded strings in your database.

So this is the answer I was looking for: in Lazarus, TStringField MUST hold UTF-8 encoded strings.

But my guess (my theory) is that when Borland introduced TStringField, the design goal was:
TStringField was designed for SBCS string data (because DataSize=Size+1) encoded in the system ANSI code page, and
TWideStringField was designed for DBCS widestring (UTF-16) character data.
Maybe I was mistaken in this view (or maybe Delphi takes a different, non-agnostic approach and FPC an agnostic one?).

> If there is some strange reason why you don't want the strings in your database to be UTF-8 encoded,

SQL Server does not support UTF-8 (AFAIK). SQL Server provides
non-Unicode datatypes - char, varchar, text - and
Unicode (UCS-2) datatypes - nchar, nvarchar, ntext.

> you have to convert the strings from the encoding your database uses to UTF-8 while reading data from the database. Luckily, you can specify the encoding of strings you want to use for most databases. Not only the encoding in which the strings are stored, but also the encoding which has to be used when you send and retrieve data from the database. And you can set this for each connection made. I.e. you can resolve the problem by changing the connection string, or by adding some connection parameter.

Yes, this is true for example for the MySQL or Firebird ODBC driver, but for the SQL Server or PostgreSQL ODBC driver there are no such options (although the PostgreSQL ODBC driver exists in an ANSI and a UNICODE version).
The SQL Server ODBC driver supports AutoTranslate, see: http://msdn.microsoft.com/en-us/library/ms130822.aspx
"SQL Server *char*, *varchar*, or *text* data sent to a client SQL_C_CHAR variable is converted from character to Unicode using the server ACP, then converted from Unicode to character using the client ACP."

> There's also another solution you can find on the forum and other places. You can convert the strings to UTF-8 not only when they are read from the database, but also when they are read from the internal memory. There's a hook for that.

Thanks for your patience
-Laco.
Re: [fpc-devel] TStringField, String and UnicodeString and UTF8String
Joost van der Sluis wrote:
> On Wed, 2011-01-12 at 14:59 +0100, LacaK wrote:
> > > No. It is mandatory that you send/receive UTF8 to/from GUI LCL elements.
> > As LCL elements are using the TStringField.Text property, this property should return UTF8String, right (not AnsiString in the ANSI code page)?
> > If yes, then TStringField must also store its data internally in some Unicode format (so as not to lose any characters), right?
> > So it can be UTF-8, UTF-16 or UTF-32 ... in all cases we must allocate space 4*[max. number of characters in field], right?
> > So in what encoding is string data stored now in TStringField?
> The encoding you've specified. In the connection string or some other database-server dependent setting.

OK. But then there is a problem with the buffer size allocated for TStringField (ftString), isn't there?
Please see the bug report: http://bugs.freepascal.org/view.php?id=17376
It describes the situation with SQLite (TSQLite3Connection), which returns UTF-8 strings, so there is no problem with the encoding. The problem is that for char(n), varchar(n) fields a TStringField is created with Size=n, and Size+1 is allocated in the record buffer, where n is the number of characters (not bytes). So data is truncated when a UTF-8 encoded string is written into the record buffer.

So IMHO we must either:
1. allocate space in the record buffer of size 4*TFieldDef.Size+1 (and so on), or
2. redefine the meaning of the Size property (as number of bytes, not characters) and create fielddefs with Size*4

Hm, according to http://docwiki.embarcadero.com/VCL/XE/en/DB.TStringField.Size Size is the number of characters,
but according to http://docwiki.embarcadero.com/VCL/en/DB.TFieldDef.Size Size is the number of bytes in the underlying database.
But TField is created from TFieldDef and TField.Size=TFieldDef.Size ... so isn't that curious?

> Note that when you want to use UTF-16 (or 32) you have to use TWideStringFields.

So TWideStringField is not an encoding-agnostic field (is it designed to always be UTF-16 encoded)?

-Laco.
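The size mismatch described in this message can be illustrated with a minimal sketch. It assumes a hypothetical varchar(5) column and uses explicit UTF-8 bytes so that the result does not depend on the source file encoding; the constant and variable names are illustrative only and are not taken from fcl-db.

```pascal
program Utf8FieldSize;
{$mode objfpc}{$H+}

const
  // Declared size of a hypothetical varchar(5) column, counted in characters.
  FieldSizeInChars = 5;

var
  Value: AnsiString;
  BufferAsAllocated, BufferWorstCase: Integer;
begin
  // Five times U+00D1 (N with tilde), written as explicit UTF-8 bytes.
  Value := #$C3#$91#$C3#$91#$C3#$91#$C3#$91#$C3#$91;

  BufferAsAllocated := FieldSizeInChars + 1;      // Size+1, as described in the message
  BufferWorstCase   := 4 * FieldSizeInChars + 1;  // option 1 from the message

  WriteLn('bytes needed (incl. terminator): ', Length(Value) + 1);
  WriteLn('Size+1 buffer:                   ', BufferAsAllocated);
  WriteLn('4*Size+1 buffer:                 ', BufferWorstCase);
  WriteLn('fits in Size+1?   ', Length(Value) + 1 <= BufferAsAllocated);
  WriteLn('fits in 4*Size+1? ', Length(Value) + 1 <= BufferWorstCase);
end.
```

Five two-byte characters need 11 bytes including the terminator, which exceeds the 6 bytes of a Size+1 buffer but fits comfortably in 4*Size+1.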
Re: [fpc-devel] TStringField, String and UnicodeString and UTF8String
> L> Yes, in the UNIX world it may be so (I do not know),
> L> but in Windows ODBC we have no such possibility AFAIK
>
> Quote from Microsoft: The ODBC 3.5 (or higher) Driver Manager supports both ANSI and Unicode versions of all functions that accept pointers to character strings or SQLPOINTER in their arguments. The Unicode functions are implemented as functions (with a suffix of W), not as macros. The ANSI functions (which can be called with or without a suffix of A) are identical to the current ODBC API functions.
>
> ODBC 3.5 was launched around 2000-2001.

But this approach will require changes in packages/odbc/src/odbcsql.inc like the following, won't it?

-pointer(SQLGetData) := GetProcedureAddress(ODBCLibraryHandle,'SQLGetData');
+pointer(SQLGetData) := GetProcedureAddress(ODBCLibraryHandle,'SQLGetDataW');

And I do not know how it affects compatibility, for example on UNIX, or whether all ODBC drivers support this functionality.
But also in this case we will get UTF-16 widestrings (on Windows), not UTF-8, won't we?

-Laco.
Re[2]: [fpc-devel] TStringField, String and UnicodeString and UTF8String
Hello FPC,

Thursday, January 13, 2011, 10:03:02 AM, you wrote:

>> ODBC 3.5 was launched around 2000-2001.

L> But this approach will require changes in packages/odbc/src/odbcsql.inc
L> like, does not ?:
L> -pointer(SQLGetData) :=
L> GetProcedureAddress(ODBCLibraryHandle,'SQLGetData');
L> +pointer(SQLGetData) :=
L> GetProcedureAddress(ODBCLibraryHandle,'SQLGetDataW');
L> And I do not know how it affect compatibility for example in UNIX or if
L> all ODBC drivers support this functionality.

Most probably it needs a flag to indicate that ODBC must work in Unicode, and then a dynamic link to the *W functions if this flag is set. I think ODBC drivers since 2002 or so should have this set of APIs, but I have never used ODBC in my life... :)

L> But also in this case we will get UTF-16 widestrings (in Windows) not
L> UTF-8, does not ?

That's not important: you get Unicode in the format specified by the API, then SQLConnector fills in the information in the expected target format (WideString, UTF8String over AnsiString, raw bytes...).

--
Best regards,
José
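The flag-based selection suggested here could look roughly like the sketch below. It only computes the export name that would be passed to GetProcedureAddress and does not load any library. OdbcProcName and its UseUnicode parameter are hypothetical and not part of odbcsql.inc; SQLDriverConnect is used as the example because it takes string arguments and is documented with a W variant.

```pascal
program OdbcEntryPoint;
{$mode objfpc}{$H+}

// Hypothetical helper: returns the export name to look up for an ODBC
// function, with the 'W' suffix appended when Unicode calls are requested.
function OdbcProcName(const BaseName: AnsiString; UseUnicode: Boolean): AnsiString;
begin
  if UseUnicode then
    Result := BaseName + 'W'
  else
    Result := BaseName;
end;

begin
  WriteLn(OdbcProcName('SQLDriverConnect', False));
  WriteLn(OdbcProcName('SQLDriverConnect', True));
end.
```

A loader built this way would still have to handle drivers or driver managers that do not export the W name, for instance by falling back to the plain name when the lookup returns nil.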
Re: [fpc-devel] Dwar2 changed?
Joost van der Sluis wrote:
> On Wed, 2011-01-12 at 23:52 +, Martin wrote:
>> Has dwarf 2 changed?
>>
>> TCmdLineDebugger.SendCmdLn -data-evaluate-expression ^^shortstring(^POINTER($eax)^+12)^^
>> TCmdLineDebugger.ReadLn ^done,value=#0repeats 20 times
>> TCmdLineDebugger.ReadLn (gdb)
>
> You do realize that this is a hack? (I partly wrote it)

It looks much like I first wrote it :)
Anyway, without RTL debug info this is the only way to retrieve the class name of the exception object.

> It could also be that the location of the exception name has been changed by something. This hack doesn't use any debug information. Only the definitions of a shortstring and pointer.

I don't think that the exception name location has changed; that would mean that the VMT layout has changed.

Marc
Re: [fpc-devel] String and UnicodeString and UTF8String
In our previous episode, Hans-Peter Diettrich said:
> >>> non-native strings, it can also be a performance win).
> >> IMO a single encoding, i.e. UTF-8, can cover all cases.
> > Well, for starters, it doesn't cover the existing Delphi/unicode codebase.
> Because it's bound to UTF-16? That's not a problem, because WideString will continue to exist, and according conversions are still inserted by the compiler.

That is DIY compatibility, or, in other words, no compatibility. Widestring will also grind the application to a halt due to being COM based on Windows.

> >> While some hard core Ansi coders may whine about such a convention, the absence of implicit string conversions (except in external library calls) will make such applications more performant than mixed-encoding versions.
> > I don't see why this is the case. A current system encoding application does not do any conversion. (except for GUI output, and that can be considered negligible compared to the actual GUI overhead)
> When system encoding changes with the target platform, indexed access to such strings can lead to different results. Unless the compiler can read the coder's mind...

You don't have to. The Delphi model provides a stringtype for the system encoding, and then as such all strings from the system can be labeled. With other stringtypes, the necessary conversions can be edited.

Likewise, e.g. win32 console routines can be labeled with OEMString. (Since Windows uses a different default encoding for the console)

> >> Why spend time in the design of multiple RTL/LCL versions, when a single version will be perfectly sufficient?
> > Why spend 13 years being compatible when you can throw it away in a second?
> It's sufficient to throw away what's no more needed :-)

The previous message from Jeff shows that even shortstring is still in major production use. Nothing is unused, and nothing can be clipped without a long-winded transition or painful breaks like Delphi 2009's.

Moreover, these discussions are useless, since you know as well as I do that no one stringtype will ever satisfy everybody.

So IMHO it is time to draw the consequences from the 500 posts on the unicode subject on this and other FPC/Lazarus lists and start thinking about solutions to manage that, instead of reiterating the "one type to rule them all" mantra ad infinitum.
Re: [fpc-devel] += with properties
On 01/13/2011 02:25 AM, Hans-Peter Diettrich wrote:
> This would result in the same error, because x.a is not an lval.

The example that made me ask here was

  Form1.Caption := Form1.Caption + '.';

-Michael
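The point about properties not being lvalues can be seen in a small sketch. TFakeForm is a hypothetical stand-in for a real form class; the commented-out line shows the `+=` form that, according to this thread, the compiler rejects, and the line after it is the explicit read-then-write form from the message.

```pascal
program PropertyAppend;
{$mode objfpc}{$H+}

type
  // Hypothetical stand-in for a form with a Caption property.
  TFakeForm = class
  private
    FCaption: string;
    procedure SetCaption(const AValue: string);
  public
    property Caption: string read FCaption write SetCaption;
  end;

procedure TFakeForm.SetCaption(const AValue: string);
begin
  // A real control would also update the screen here,
  // which is why writes have to go through the setter.
  FCaption := AValue;
end;

var
  Form1: TFakeForm;
begin
  Form1 := TFakeForm.Create;
  try
    Form1.Caption := 'Title';
    // Form1.Caption += '.';               // per the thread: not an lvalue
    Form1.Caption := Form1.Caption + '.';  // explicit read, then write via setter
    WriteLn(Form1.Caption);
  finally
    Form1.Free;
  end;
end.
```

The program prints `Title.`, since the getter reads FCaption and the setter stores the concatenated value.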
Re: [fpc-devel] Dwar2 changed?
On 13/01/2011 07:45, Joost van der Sluis wrote:
>> TCmdLineDebugger.SendCmdLn -data-evaluate-expression ^^shortstring(^POINTER($eax)^+12)^^
>> TCmdLineDebugger.ReadLn ^done,value=#0repeats 20 times
>> TCmdLineDebugger.ReadLn (gdb)
>
> You do realize that this is a hack? (I partly wrote it)
>
> It could also be that the location of the exception name has been changed by something. This hack doesn't use any debug information. Only the definitions of a shortstring and pointer.

I do, yes... But:
- I know eax is correct, because the fallback (using ^char instead of shortstring) works:
  -data-evaluate-expression ^char(^pointer(^POINTER($eax)^+12)^+1)
- The fallback is usually needed if shortstring is not in the symbol table at all, but then the expression gives an error. Now the expression returns data, but the wrong data...

Strange, the same fpc on Windows still works perfectly with half a dozen different gdb versions (6.3 to 7.2).

It is also possible that my previous fpc trunk on my Fedora box was built with some debug info, and now I forgot and just built it... but again, on Windows I tested with fpc 2.4.2 and trunk, both built with several different configs...

Martin
Re: [fpc-devel] TStringField, String and UnicodeString and UTF8String
> Most probably it needs a flag to indicate that the ODBC must work in Unicode, and then dynamic link to *W functions if this flag is set. I think ODBC drivers since 2002+/- should have this set of APIs, but I had never used ODBC in my life... :)

It seems that the Driver Manager automatically performs conversions for a non-Unicode driver. See http://web.datadirect.com/resources/odbc/unicode/odbc-driver.html (and http://web.datadirect.com/resources/odbc/unicode/char-background.html)
"If the driver is a non-Unicode driver, it cannot understand W function calls, and the Driver Manager must convert them to ANSI calls before sending them to the driver."

Also it seems to me that when you call the ANSI version of the ODBC API functions, you receive data in ANSI encoding. If that is so, then it is always safe to use ansitoutf8() (or UTF8Encode()) on the received data.

> L> But also in this case we will get UTF-16 widestrings (in Windows) not
> L> UTF-8, does not ?
>
> That's not important, you get unicode in the specified by the API format, then SQLConnector fills information in the expected target format (WideString, UTF8String over AnsiString, Raw bytes...).

On Windows we get UTF-16, on Linux/UNIX we get UTF-8.
So is it the case that on Windows widestring=UTF-16 and on Linux/UNIX widestring=UTF-8 string?
(So does the widestring type have a different meaning on different OSes? I am only a Windows developer, so I have no understanding of other OSes ;-))

-Laco.
Re[2]: [fpc-devel] TStringField, String and UnicodeString and UTF8String
Hello FPC,

Thursday, January 13, 2011, 1:01:57 PM, you wrote:

L> Also it seems to me, that when you call ANSI version of ODBC API
L> functions, then you receive data in ANSI encoding.
L> If it is so, then it is always safe use ansitoutf8() (or UTF8Encode())
L> on receved data.

No, because ANSI is not UTF-*, so any char outside of your ANSI codepage will be discarded, or generate an exception, or whatever the (driver) developer decides.

L> In Windows we get UTF-16, in Linux/UNIX we get UTF-8

If you use the ANSI calls on Linux you will receive UTF-8, because on most Linux systems the ANSI codepage is UTF-8. If you call the *W APIs you MUST receive WideString or PWideCharArray or the like, or the API will not work.

--
Best regards,
José
Re: [fpc-devel] TStringField, String and UnicodeString and UTF8String
On Thu, 2011-01-13 at 09:15 +0100, LacaK wrote:
> > Didn't I explain this to you and others a few times? ;-)
> If so, then please excuse me
> > The database-components itself are encoding-agnostic. This means: encoding in = encoding out. So it is up to the developer what codepage he want to use. So TField.Text can have the encoding _you_ want. So, if you want to work with Lazarus, which uses UTF-8, you have to use UTF-8 encoded strings in your database.
> So this is answer, which i have looked for: In Lazarus TStringField MUST hold UTF-8 encoded strings.

Not entirely true. You could also choose to bind the fields to some Lazarus components manually, not using the db-components (Tedit.Text := convertFunc(StringField.Text)). Or you can add a hook so that the .Text property always does a conversion to UTF-8. The first option can be used if you use a mediator or view. The second option I wouldn't use.

> But I guess (I have theory), that in time, when Borland introduced TStringField, the design goal was: TStringField was designed for SBCS (because DataSize=Size+1) string data encoded in system ANSI code page and TWideStringField was designed for DBCS widestring (UTF-16) character data

You have to be really careful in what you type when you are writing about encodings. The above is nonsense, because of a very tiny mistake. If you compare DBCS widestring with UTF-16, you can also compare a stringfield with UTF-8. Exactly the same problem. (A character can be made up of more than one UTF-8 or UTF-16 code unit)

But TStringField's DataSize by default is indeed Size+1. So if you use it to store UTF-8, you have to define the size as four times the field size given by the database. Note that this is done in some cases.

> May be, that I was mistaken by this view. (or may be, that there is different approach in Delphi (no agnostic) and different in FPC (agnostic)?)

No, Delphi does the same. Only newer Delphi versions have a string type which contains the used encoding (details can be found in this thread), so they can do some conversions for you. But that has nothing to do with the database code. Also, you don't need it. People all over the world have used older Delphi versions all the time... (But of course, it's easier now)

> > If there is some strange reason why you don't want the strings in your database to be UTF-8 encoded,
> SQL Server does not support UTF-8 (AFAIK)

Rofl. You mean that Microsoft SQL Server can't handle Unicode completely? If they say that in an advertisement they can forget that any big commercial client will choose their product...

> SQL Server provides non-UNICODE datatypes - char, varchar, text

i.e. TStringField

> and UNICODE (UCS-2) datatypes - nchar, nvarchar, ntext

i.e. TWideStringField. What does this have to do with your problem? Nothing. The only thing that matters is what encoding is used while communicating with the client. (Which you can set)

> > you have to convert the strings from the encoding your database uses to UTF-8 while reading data from the database. Luckily, you can specify the encoding of strings you want to use for most databases. Not only the encoding in which the strings are stored, but also the encoding which has to be used when you send and retrieve data from the database. And you can set this for each connection made. Ie: you can resolve the problem by changing the connection-string, or by adding some connection-parameter.
> Yes, it is true for example for MySQL or Firebird ODBC driver, but for SQL Server or PostgreSQL ODBC driver there are no such options

Then that option has to be added. I think it's already possible but you simply don't know how. (SQL Server is ODBC only, so that one is fixed. For Firebird there's a 'serverencoding' parameter, or something like that. Postgres also has some setting.)

> (but PostgreSQL ODBC driver exists in ANSI and UNICODE version)

I saw that in an earlier message, but this also has nothing to do with your problem. You only need the different calls when you want to use UTF-8 in your field names. (Or, and this one was tricky, in the connection string. But this was more than a year ago.)

> SQL Server ODBC driver supports AutoTranslate, see: http://msdn.microsoft.com/en-us/library/ms130822.aspx
> SQL Server char, varchar, or text data sent to a client SQL_C_CHAR variable is converted from character to Unicode using the server ACP, then converted from Unicode to character using the client ACP.

This is what you use when you set the encoding when you connect to the client. The solution to all your problems. As explained three times, in this message alone.

In fact it's simple: incoming data = outgoing data. If you need UTF-8 encoding for the outgoing data (direct access to Lazarus controls) you have to select UTF-8 at the input. That's always more efficient than converting the data to/from any other encoding. And, luckily, you can instruct the Database-server which encoding
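A manual conversion of the kind hinted at with convertFunc could be sketched as follows. This is only an illustration under the assumption that the database delivers ISO-8859-1 bytes; Latin1ToUtf8Bytes is a hypothetical name, and a real application would more likely use an RTL or LCL conversion routine for its actual codepage.

```pascal
program Latin1ToUtf8;
{$mode objfpc}{$H+}

// Hypothetical convertFunc: treats every input byte as an ISO-8859-1
// character and returns the UTF-8 encoded equivalent.
function Latin1ToUtf8Bytes(const S: AnsiString): AnsiString;
var
  I, N: Integer;
  B: Byte;
begin
  SetLength(Result, 2 * Length(S));  // worst case: every byte becomes two
  N := 0;
  for I := 1 to Length(S) do
  begin
    B := Ord(S[I]);
    if B < $80 then
    begin
      Inc(N);
      Result[N] := Chr(B);                  // ASCII is unchanged
    end
    else
    begin
      // U+0080..U+00FF become two bytes: 110000xx 10xxxxxx
      Inc(N);
      Result[N] := Chr($C0 or (B shr 6));
      Inc(N);
      Result[N] := Chr($80 or (B and $3F));
    end;
  end;
  SetLength(Result, N);
end;

var
  Converted: AnsiString;
  I: Integer;
begin
  Converted := Latin1ToUtf8Bytes(#$D1'a');  // N with tilde, then 'a'
  for I := 1 to Length(Converted) do
    Write(Ord(Converted[I]), ' ');
  WriteLn;
end.
```

For the two input bytes the sketch prints `195 145 97`, i.e. the UTF-8 bytes C3 91 followed by the unchanged ASCII 'a'.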
Re: [fpc-devel] TStringField, String and UnicodeString and UTF8String
On Thu, 2011-01-13 at 09:49 +0100, LacaK wrote:
> So IMHO there must be:
> 1. allocated space in record buffer in size 4*TFieldDef.Size+1
> 2. redefine meaning of Size property (as number of bytes not characters) and create fielddefs with Size*4

Yes, those are the possible solutions. The good thing about the second option is that a user can do it on his own if he wants to use UTF-8: just create persistent fields with a field size of 4 * the number of characters.

I'm not sure if we have to change this. It's a problem the programmer has to deal with, I think...

> hm, according to http://docwiki.embarcadero.com/VCL/XE/en/DB.TStringField.Size is Size number of characters
> but according to http://docwiki.embarcadero.com/VCL/en/DB.TFieldDef.Size is Size number of bytes in underlaying database

Yes, that's indeed the problem. But there's also the .DataSize property, so we could use that. Maybe... if the pressure on the bugtracker gets too high, I'll bow and change this. (I think 25% of all existing db bugs are related to this and to people who do not understand anything about encodings.)

> but TField is created from TFieldDef and TField.Size=TFieldDef.Size ... so isn't it curious ?
> > Note that when you want to use UTF-16 (or 32) you have to use TWideStringFields.
> So TWideStringField is no-encoding-agnostic field (is it designed to be everytime UTF-16 encoded) ?

No. It's designed to contain an array of two-byte records. In fact you can use it to store UCS-2 data, but not UTF-16. Same story as with ansi/UTF-8.

Joost.
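The UCS-2 versus UTF-16 distinction made here can be shown with a short sketch. It assumes nothing about TWideStringField itself and only demonstrates that one character outside the Basic Multilingual Plane occupies two 16-bit units, so a buffer sized as "one two-byte record per character" would be too small for it. The variable names are illustrative.

```pascal
program SurrogatePair;
{$mode objfpc}{$H+}

const
  CodePoint = $1D11E;  // musical G clef, outside the Basic Multilingual Plane

var
  V: Cardinal;
  HighUnit, LowUnit: Word;
  S: UnicodeString;
begin
  // Standard UTF-16 surrogate computation for code points above U+FFFF.
  V := CodePoint - $10000;
  HighUnit := Word($D800 + (V shr 10));
  LowUnit  := Word($DC00 + (V and $3FF));

  SetLength(S, 2);
  S[1] := WideChar(HighUnit);
  S[2] := WideChar(LowUnit);

  WriteLn('characters:        1');
  WriteLn('16-bit code units: ', Length(S));
  WriteLn('bytes:             ', Length(S) * SizeOf(WideChar));
end.
```

Length reports 2 code units (4 bytes) for a single character, which is the same counting problem as UTF-8 in a TStringField, only with a smaller worst-case factor.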
Re: Re[2]: [fpc-devel] TStringField, String and UnicodeString and UTF8String
On Thu, 2011-01-13 at 10:32 +0100, José Mejuto wrote:
> Hello FPC,
>
> Thursday, January 13, 2011, 10:03:02 AM, you wrote:
>
> >> ODBC 3.5 was launched around 2000-2001.
>
> L> But this approach will require changes in packages/odbc/src/odbcsql.inc
> L> like, does not ?:
> L> -pointer(SQLGetData) :=
> L> GetProcedureAddress(ODBCLibraryHandle,'SQLGetData');
> L> +pointer(SQLGetData) :=
> L> GetProcedureAddress(ODBCLibraryHandle,'SQLGetDataW');
> L> And I do not know how it affect compatibility for example in UNIX or if
> L> all ODBC drivers support this functionality.
>
> Most probably it needs a flag to indicate that the ODBC must work in Unicode, and then dynamic link to *W functions if this flag is set. I think ODBC drivers since 2002+/- should have this set of APIs, but I had never used ODBC in my life... :)

This only affects the parameters passed to the ODBC functions, i.e. the field names, table names and such.

Joost.
Re: [fpc-devel] Dwar2 changed?
On Thu, 2011-01-13 at 11:00 +, Martin wrote:
> On 13/01/2011 07:45, Joost van der Sluis wrote:
> >> TCmdLineDebugger.SendCmdLn -data-evaluate-expression ^^shortstring(^POINTER($eax)^+12)^^
> >> TCmdLineDebugger.ReadLn ^done,value=#0repeats 20 times
> >> TCmdLineDebugger.ReadLn (gdb)
> >
> > You do realize that this is a hack? (I partly wrote it)
> >
> > It could also be that the location of the exception name has been changed by something. This hack doesn't use any debug information. Only the definitions of a shortstring and pointer.
>
> I do, yes... But:
> - I know eax is correct, because the fallback (using ^char instead of shortstring) works:
>   -data-evaluate-expression ^char(^pointer(^POINTER($eax)^+12)^+1)
> - The fallback is usually needed if shortstring is not in the symbol table at all, but then the expression gives an error. Now the expression returns data, but the wrong data...

Ehh? Does the fallback work or not?

> Strange, the same fpc on Windows still works perfectly with half a dozen different gdb versions (6.3 to 7.2)

So it could be a gdb problem?

> It is also possible that my previous fpc trunk on my Fedora box was built with some debug info, and now I forgot, and just built it... but again on Windows I tested with fpc 2.4.2 and trunk, both built with several different configs...

You lost me. Do I still have to do something?

Joost.
Re: [fpc-devel] Dwar2 changed?
On 13/01/2011 14:14, Joost van der Sluis wrote:
> >>> TCmdLineDebugger.SendCmdLn -data-evaluate-expression ^^shortstring(^POINTER($eax)^+12)^^
> >>> TCmdLineDebugger.ReadLn ^done,value=#0repeats 20 times
> >>> TCmdLineDebugger.ReadLn (gdb)
> >> You do realize that this is a hack? (I partly wrote it)
> >>
> >> It could also be that the location of the exception name has been changed by something. This hack doesn't use any debug information. Only the definitions of a shortstring and pointer.
> > I do, yes... But:
> > - I know eax is correct, because the fallback (using ^char instead of shortstring) works:
> >   -data-evaluate-expression ^char(^pointer(^POINTER($eax)^+12)^+1)

The fallback does work. So the data is in the correct location.

> > - The fallback is usually needed if shortstring is not in the symbol table at all, but then the expression gives an error. Now the expression returns data, but the wrong data...
> Ehh? Does the fallback work or not?

Yes, but it was not triggered, because the first query did not return a gdb error; there just was no usable data.
I added the data check, so now the fallback is triggered, and it works.

> > Strange, the same fpc on Windows still works perfectly with half a dozen different gdb versions (6.3 to 7.2)
> So it could be a gdb problem?

Possible, yes. Strange though; afaik shortstring is encoded as a record (len; chars), so maybe something in there...

> > It is also possible that my previous fpc trunk on my Fedora box was built with some debug info, and now I forgot, and just built it... but again on Windows I tested with fpc 2.4.2 and trunk, both built with several different configs...
> You lost me. Do I still have to do something?

Probably not, because there is not enough info yet to start on something. I was just asking in case something obvious springs to mind.

I currently don't have the time to go through all the options and see what works and what doesn't, e.g. compile the RTL with/without -gs / -gw, or check older versions, ...
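The "(len; chars)" layout mentioned in this message, and the reason the fallback expression adds 1 before reading characters, can be checked with a minimal sketch. It only inspects an ordinary ShortString variable and makes no claim about the VMT offset used in the gdb expression; the variable name and the sample value are illustrative.

```pascal
program ShortStringLayout;
{$mode objfpc}{$H+}

var
  ExceptName: ShortString;
begin
  ExceptName := 'EDivByZero';

  // Element 0 of a ShortString is the length byte.
  WriteLn('length byte S[0]: ', Ord(ExceptName[0]));
  WriteLn('Length(S):        ', Length(ExceptName));

  // The characters start one byte after the start of the variable,
  // which corresponds to the "+1" in the ^char fallback expression.
  WriteLn('offset of S[1]:   ', PtrUInt(@ExceptName[1]) - PtrUInt(@ExceptName));
  WriteLn('first char S[1]:  ', ExceptName[1]);

  // One length byte plus room for 255 characters.
  WriteLn('SizeOf(S):        ', SizeOf(ExceptName));
end.
```

Both length lines report 10, the offset is 1, and SizeOf is 256, so a debugger that misreads the leading length byte can show empty or wrong data even though the characters themselves are in place.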
Re[2]: [fpc-devel] TStringField, String and UnicodeString and UTF8String
Hello FPC,

Thursday, January 13, 2011, 2:58:30 PM, you wrote:

JvdS> Then that option has to be added. I think it's already possible but you
JvdS> simply don't know how. (Sql-Server is ODBC only, so that one is fixed.
JvdS> For firebird there's a 'serverencoding' parameter, or something like
JvdS> that. Postgres also has some setting.

Are you aware of the Firebird Field(UTF8) and SQLConnection CharSet(UTF8) problem?

Table X
---
FieldTestName Varchar(5) UTF8

IBConnection
---
CharSet = UTF8

//Source code file is UTF8 encoded.
DataSource.FieldByName('FieldTestName').AsString:='Ñ';

This raises an exception because the string is 10 bytes and the field only allows 5 chars. I think I read this comment many months ago and it was answered as "won't fix" for Interbase compatibility. Am I wrong?

--
Best regards,
José
Re: Re[2]: [fpc-devel] TStringField, String and UnicodeString and UTF8String
On Thu, 2011-01-13 at 15:55 +0100, José Mejuto wrote:
> Hello FPC,
>
> Thursday, January 13, 2011, 2:58:30 PM, you wrote:
>
> JvdS> Then that option has to be added. I think it's already possible but you
> JvdS> simply don't know how. (Sql-Server is ODBC only, so that one is fixed.
> JvdS> For firebird there's a 'serverencoding' parameter, or something like
> JvdS> that. Postgres also has some setting.
>
> Are you aware of the Firebird Field(UTF8) and SQLConnection CharSet(UTF8) problem?
>
> Table X
> ---
> FieldTestName Varchar(5) UTF8
>
> IBConnection
> ---
> CharSet = UTF8
>
> //Source code file is UTF8 encoded.
> DataSource.FieldByName('FieldTestName').AsString:='Ñ';
>
> This raises an exception because the string is 10 bytes and the field only allows 5 chars. I think I read this comment many months ago and it was answered as "won't fix" for Interbase compatibility. Am I wrong?

See my mail to LacaK about the two options to solve this.

Joost.
Re[4]: [fpc-devel] TStringField, String and UnicodeString and UTF8String
Hello FPC,

Thursday, January 13, 2011, 4:24:31 PM, you wrote:

>> Are you aware about the Firebird Field(UTF8) and SQLConnection
>> CharSet(UTF8) problem ?

JvdS> See my mail to Lacak about the two options to solve this.

I think the problem is different and can be solved without any compatibility problem, or at least with one that is easily detectable. Anyway, I'll write a possible patch to test whether that's a viable solution, and if it works I'll send it for evaluation.

Thank you.

--
Best regards,
José
Re: [fpc-devel] String and UnicodeString and UTF8String
Marco van de Voort wrote:
> In our previous episode, Hans-Peter Diettrich said:
> >>>> non-native strings, it can also be a performance win).
> >>> IMO a single encoding, i.e. UTF-8, can cover all cases.
> >> Well, for starters, it doesn't cover the existing Delphi/unicode codebase.
> > Because it's bound to UTF-16? That's not a problem, because WideString will continue to exist, and according conversions are still inserted by the compiler.
>
> That is DIY compatibility, or, in other words, no compatibility.

I still don't understand the problem :-(

> Widestring will also grind the application to a halt due to being COM based on Windows.

How so?

> > When system encoding changes with the target platform, indexed access to such strings can lead to different results. Unless the compiler can read the coder's mind...
>
> You don't have to. The Delphi model provides a stringtype for the system encoding, and then as such all strings from the system can be labeled. With other stringtypes, the necessary conversions can be edited.

Indexed string access produces different results for Ansi and UTF-8 system encodings. Such code is not portable, and neither is the data (ini files). Allowing UTF-8 as the system encoding will frustrate Windows users (I don't know whether Windows allows such a system encoding), and Linux users are frustrated when UTF-8 is disallowed.

The only solution: using the OS encoding restricts the code to running on a single machine only, or on similarly configured machines. The group of users who accept this restriction will be happy with a single AnsiString type and no implicit conversions. Without implicit conversions such a string type can hold UTF-8 as well.

> Likewise, e.g. win32 console routines can be labeled with OEMString. (Since windows uses a different default encoding for the console)

This either implies OEM encoding as the system encoding of Win32 console applications, or the use of multiple codepages, as before. But IMO the Win32 console also implements a W interface, so that it's up to the user to use whatever is more appropriate for his code. The RTL has to distinguish between the system-wide filesystem encoding and the GUI encoding in file handling (CreateFile...).

> >>> Why spend time in the design of multiple RTL/LCL versions, when a single version will be perfectly sufficient?
> >> Why spend 13 years being compatible when you can throw it away in a second?
> > It's sufficient to throw away what's no more needed :-)
>
> The previous message from Jeff shows that even shortstring is still in major production use. Nothing is unused and can be clipped without a long winded transition, or Delphi 2009 like painful breaks.

It's all about the well-known dilemma:
- force (possibly many) implicit conversions, or
- supply multiple RTL/LCL versions, or
- break legacy user code by moving to a different (but again unique) string type.

> Moreover, these discussions are useless since you know as well as I do that no one stringtype will ever satisfy everybody.
>
> So IMHO it is time to take the consequences from the 500 posts on this subject on the unicode subject on this and other FPC/Lazarus lists and start thinking in solutions to manage that, instead of reiterating the one type to rule them all mantra ad infinitum.

The discussion is only about the pros and cons of the various possible solutions. I.e. it should reveal the critical cases and consequences that have to be considered and handled in every implementation. The implementation can choose any model. Different models can be implemented as well, so that the final decision about the new standard can be delayed until the models can be tested in real-world applications.

One model has already been implemented: UTF-8. It may need some additions/improvements, like a *hard* separation of AnsiString from UTF8String, and nothing has to be thrown away.

DoDi
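The claim that indexed access gives different results depending on the encoding can be made concrete with a minimal sketch. The same two-character text is written out as explicit bytes in ISO-8859-1 and in UTF-8, so the result does not depend on the system codepage or the source file encoding; the variable names are illustrative.

```pascal
program IndexedAccess;
{$mode objfpc}{$H+}

var
  AsLatin1, AsUtf8: AnsiString;
begin
  // The same text, N with tilde followed by 'x', in two encodings.
  AsLatin1 := #$D1'x';      // ISO-8859-1: one byte per character
  AsUtf8   := #$C3#$91'x';  // UTF-8: N with tilde takes two bytes

  WriteLn('Latin-1: Length = ', Length(AsLatin1),
          ', Ord(S[2]) = ', Ord(AsLatin1[2]));
  WriteLn('UTF-8:   Length = ', Length(AsUtf8),
          ', Ord(S[2]) = ', Ord(AsUtf8[2]));
end.
```

In the single-byte string S[2] is 'x' (120), while in the UTF-8 string S[2] is the trailing byte 145 of the first character and the length is 3 instead of 2. Code that indexes by character position therefore behaves differently once the system encoding changes.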
Re: [fpc-devel] String and UnicodeString and UTF8String
On 12.01.2011 22:40, Marco van de Voort wrote:
> In our previous episode, Sven Barth said:
>>> legacy code can be broken by (eventually) required changes to set of char, sizeof(char) and PChar, sizeof(string) as opposed to Length(string), upper/lower conversion, and many more not so obvious consequences.
>> I don't believe that PChar will be touched, because too much code that interfaces with C code depends on that. Although its declaration might not be the same then and become PChar = PAnsiChar instead of PChar = ^Char if Char is changed (currently it's PAnsiChar = PChar).
> Current Delphi _does_ regard char as the equivalent low-level type to string. So whatever you choose as string (8 or 16-bit), pchar will match it by changing to pansichar or pwidechar.

Oh come on -.- There are some days on which I really dislike the developers of Delphi...

Regards,
Sven
Re: [fpc-devel] String and UnicodeString and UTF8String
On 13.01.2011 18:57, Hans-Peter Diettrich wrote:
>> WideString will also grind the application to a halt due to being COM based on Windows.
> How so?

WideString on Windows has no reference counting, thus every time a WideString is assigned it needs to be copied.

>>> When system encoding changes with the target platform, indexed access to such strings can lead to different results. Unless the compiler can read the coder's mind...
>> You don't have to. The Delphi model provides a stringtype for the system encoding, and then as such all strings from the system can be labeled. With other stringtypes, the necessary conversions can be edited.
> Indexed string access produces different results for ANSI and UTF-8 system encodings. Such code is not portable, and neither is the data (ini files). Allowing UTF-8 as the system encoding will frustrate Windows users (dunno whether Windows allows for such a system encoding), and Linux users are frustrated when UTF-8 is disallowed.

Nearly all Windows API functions only allow single-byte encodings or UTF-16. The only functions I'm aware of that can use UTF-8 encoding are the console input/output APIs (if the codepage is set to UTF-8) [and also the file I/O APIs, but they don't assume any encoding].

Regards,
Sven
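The claim that indexed access gives different results under an ANSI and a UTF-8 system encoding can be checked with a few lines. This is a minimal sketch in Python rather than Pascal, assuming cp1252 as the ANSI code page and "Käse" as sample text; `byte_at` is a hypothetical helper, and the index is 0-based here, whereas Pascal strings are 1-based.

```python
# Illustration only: "Käse" and cp1252 are assumed sample data.
TEXT = "Käse"


def byte_at(text, encoding, index):
    """Encode text and return the byte found at a 0-based index."""
    return text.encode(encoding)[index]


ansi = TEXT.encode("cp1252")  # one byte per character
utf8 = TEXT.encode("utf-8")   # 'ä' takes two bytes

# The byte length differs between the two encodings.
print(len(ansi), len(utf8))

# The same index lands on 's' in the ANSI bytes, but in the middle of
# the two-byte sequence for 'ä' in the UTF-8 bytes.
print(hex(byte_at(TEXT, "cp1252", 2)), hex(byte_at(TEXT, "utf-8", 2)))
```

Code that treats "the third element of the string" as "the third character" therefore behaves differently depending on which byte encoding the string happens to hold.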
Re: [fpc-devel] Variables declaraction inside code
On 13.01.2011 02:20, Hans-Peter Diettrich wrote:
> LacaK wrote:
>> 3. C style comments: /* ... */ (I have never understood why Pascal used (* ... *) )
> Pascal has several digraphs:
>   (* = {
>   *) = }
>   (. = [
>   .) = ]

I must say that I did not know about the alternatives for the square brackets until now. O.o One never stops learning... (and they are even documented :D )

Regards,
Sven
Re: [fpc-devel] String and UnicodeString and UTF8String
On 13-1-2011 21:40, Sven Barth wrote:
> WideString on Windows has no reference counting, thus every time a WideString is assigned it needs to be copied.

Not exactly true. WideString is COM-marshaled and thus has reference counting at the COM level, AFAIK. As long as your memory manager is COM-marshaled too, that is. And since most Pascal memory manager versions do not support COM directly, it goes wrong in a big way.

I once wrote a simple COM memory manager to test this. Performance stays sh*t, but strings seem to be counted, not copied. If you use CoTaskMemAlloc, CoTaskMemFree and CoTaskMemRealloc in your memory manager you will see what I mean. At least it comes close, but slow it will stay.
Re: [fpc-devel] String and UnicodeString and UTF8String
On Thursday, 13. January 2011 18.57:00 Hans-Peter Diettrich wrote:
> The implementation can choose any model. Different models can be implemented as well, so that the final decision about the new standard can be delayed until the models can be tested in real-world applications. One model has already been implemented: UTF-8. It may need some additions/improvements, like a *hard* separation of AnsiString from UTF8String, and nothing has to be thrown away.

Another already implemented model is the UTF-16 UnicodeString in MSEgui. It needs no changes in the Free Pascal compiler.

Martin