Re: [fpc-devel] String and UnicodeString and UTF8String

2011-01-13 Thread Martin Schreiber
On Wednesday, 12. January 2011 23.05:02 Juha Manninen wrote:
 Martin Schreiber wrote on Monday, 10 January 2011 19:22:49:
  On Monday, 10. January 2011 16.27:19 Marco van de Voort wrote:
   And there are three such cases
  
   - normal FPC and Delphi 2007- code :  ansistring(0)
   - Lazarus : ansistring=utf8
   - Delphi 2009+  UTF16.
 
  - fpGUI: ansistring = utf-8
  - MSEgui: existing FPC UnicodeString = utf-16

 Without studying your code myself I guess you had to make many utility
 functions and classes yourself for UTF-16 ?
 Even the normal TStringList doesn't work.

Correct. MSEgui has a complete development environment for UnicodeString with 
its own set of lists, streams, file and directory functions and the like.

Martin
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


Re: [fpc-devel] TStringField, String and UnicodeString and UTF8String

2011-01-13 Thread LacaK



Didn't I explain this to you and others a few times?
  

;-) If so, then please excuse me


The database-components itself are encoding-agnostic. This means:
encoding in = encoding out.

So it is up to the developer which codepage he wants to use. So
TField.Text can have the encoding _you_ want.

So, if you want to work with Lazarus, which uses UTF-8, you have to use
UTF-8 encoded strings in your database. 
  

So this is the answer I was looking for:
In Lazarus, TStringField MUST hold UTF-8 encoded strings.

But my guess (my theory) is that when Borland introduced 
TStringField, the design goal was:
TStringField was designed for SBCS string data (because DataSize=Size+1) 
encoded in the system ANSI code page

and
TWideStringField was designed for DBCS widestring (UTF-16) character data

Maybe I was mistaken in this view.
(Or maybe the approach in Delphi (not agnostic) differs 
from the one in FPC (agnostic)?)



If there is some strange reason why you don't want the strings in your
database to be UTF-8 encoded,

SQL Server does not support UTF-8 (AFAIK)
SQL Server provides non-UNICODE datatypes - char, varchar, text
and UNICODE (UCS-2) datatypes - nchar, nvarchar, ntext


 you have to convert the strings from the
encoding your database uses to UTF-8 while reading data from the
database.

Luckily, you can specify the encoding of strings you want to use for
most databases. Not only the encoding in which the strings are stored,
but also the encoding which has to be used when you send and retrieve
data from the database. And you can set this for each connection made.

Ie: you can resolve the problem by changing the connection-string, or by
adding some connection-parameter.

  

Yes, that is true for example for the MySQL or Firebird ODBC driver,
but for the SQL Server or PostgreSQL ODBC driver there are no such options
(but the PostgreSQL ODBC driver exists in an ANSI and a UNICODE version).
The SQL Server ODBC driver supports AutoTranslate, see: 
http://msdn.microsoft.com/en-us/library/ms130822.aspx
SQL Server *char*, *varchar*, or *text* data sent to a client 
SQL_C_CHAR variable is converted from character to Unicode using the 
server ACP, then converted from Unicode to character using the client ACP.
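
The two-step AutoTranslate conversion quoted above can be sketched as 
follows. This is only an illustration in Python, not ODBC code: 
auto_translate is a hypothetical helper, and cp1250 and cp1252 are 
assumed stand-ins for the server and client ACP.

```python
def auto_translate(raw: bytes, server_acp: str, client_acp: str) -> bytes:
    """Mimic the two-step conversion: server ACP -> Unicode -> client ACP."""
    text = raw.decode(server_acp)                      # server ACP to Unicode
    return text.encode(client_acp, errors="replace")   # Unicode to client ACP


# The letter c-with-caron exists in cp1250 but not in cp1252,
# so it cannot survive the second step and is substituted.
stored = "Laco \u010d".encode("cp1250")
received = auto_translate(stored, "cp1250", "cp1252")
print(stored, received)
```

Characters present in both code pages pass through unchanged; anything 
missing from the client code page is lost on the way.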

There's also another solution you can find on the forum and other
places. You can convert the strings to UTF-8 not only when they are read
from the database, but also when they are read from the internal memory.
There's a hook for that.

  

Thanks for your patience
-Laco.



Re: [fpc-devel] TStringField, String and UnicodeString and UTF8String

2011-01-13 Thread LacaK

Joost van der Sluis wrote:

On Wed, 2011-01-12 at 14:59 +0100, LacaK wrote:

  

No. It is mandatory that you send/receive UTF8 to/from GUI LCL
elements. 
  
As LCL elements use the TStringField.Text property, this property 
should return a UTF8String (not an AnsiString in the ANSI code page), right?
If yes, then TStringField must also store its data internally in some Unicode 
format (so as not to lose any characters), right?
So it can be UTF-8, UTF-16 or UTF-32 ... in all cases we must allocate 
space for 4*[max. number of characters in field] bytes, right?

So in what encoding is string data stored in TStringField now?



The encoding you've specified. In the connection-string or some other
database-server dependent setting.
  

OK.
But then there is a problem with the buffer size allocated for TStringField 
(ftString), isn't there?

Please see the bug report: http://bugs.freepascal.org/view.php?id=17376
It describes the situation with SQLite (TSQLite3Connection), which 
returns UTF-8 strings, so there is no problem with the encoding.
The problem is that for char(n) and varchar(n) fields a TStringField is 
created with Size=n, and the record buffer is allocated with Size+1 bytes, 
where n is the number of characters (not bytes). So data is truncated 
when a UTF-8 encoded string is written into the record buffer.
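
A minimal sketch of the truncation described above, written in Python 
only for illustration. write_to_record_buffer is a hypothetical stand-in 
for copying a value into a buffer that was sized in characters; it is 
not the fcl-db code.

```python
def write_to_record_buffer(value: str, size: int) -> bytes:
    """Copy a UTF-8 string into a buffer that holds only `size` bytes."""
    encoded = value.encode("utf-8")
    return encoded[:size]            # everything beyond `size` bytes is cut off


# Five Slovak letters, each of which needs two bytes in UTF-8.
text = "\u013e\u0161\u010d\u0165\u017e"
stored = write_to_record_buffer(text, 5)
print(len(text), len(text.encode("utf-8")), len(stored))
print(stored.decode("utf-8", errors="replace"))
```

The value fits a varchar(5) column when counted in characters, but only 
half of its bytes fit the buffer, and the cut falls inside a character.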

So IMHO we must either:
1. allocate space in the record buffer of size 4*TFieldDef.Size+1 (and so on)
or
2. redefine the meaning of the Size property (as number of bytes, not 
characters) and create fielddefs with Size*4.
Hm, according to 
http://docwiki.embarcadero.com/VCL/XE/en/DB.TStringField.Size Size is the 
number of characters,
but according to http://docwiki.embarcadero.com/VCL/en/DB.TFieldDef.Size 
Size is the number of bytes in the underlying database.


But TField is created from TFieldDef and TField.Size=TFieldDef.Size ... 
so isn't that curious?

Note that when you want to use UTF-16 (or 32) you have to use
TWideStringFields.

  
So TWideStringField is not an encoding-agnostic field (it is designed to 
always be UTF-16 encoded)?


-Laco.



Re: [fpc-devel] TStringField, String and UnicodeString and UTF8String

2011-01-13 Thread LacaK



L Yes in UNIX world it may be so (I do not know),
L but in Windows ODBC we have no such possibility AFAIK

Quote from Microsoft:
The ODBC 3.5 (or higher) Driver Manager supports both ANSI and
Unicode versions of all functions that accept pointers to character
strings or SQLPOINTER in their arguments. The Unicode functions are
implemented as functions (with a suffix of W), not as macros. The ANSI
functions (which can be called with or without a suffix of A) are
identical to the current ODBC API functions.

ODBC 3.5 was launched around 2000-2001.

  
But wouldn't this approach require changes in packages/odbc/src/odbcsql.inc 
like this?
-pointer(SQLGetData) := GetProcedureAddress(ODBCLibraryHandle,'SQLGetData');
+pointer(SQLGetData) := GetProcedureAddress(ODBCLibraryHandle,'SQLGetDataW');
And I do not know how this affects compatibility, for example on UNIX, or 
whether all ODBC drivers support this functionality.


But in this case, too, we will get UTF-16 widestrings (on Windows), not 
UTF-8, won't we?


-Laco.


Re[2]: [fpc-devel] TStringField, String and UnicodeString and UTF8String

2011-01-13 Thread José Mejuto
Hello FPC,

Thursday, January 13, 2011, 10:03:02 AM, you wrote:

 ODBC 3.5 was launched around 2000-2001.
L But this approach will require changes in packages/odbc/src/odbcsql.inc
L like, does not ?:
L -pointer(SQLGetData) := 
L GetProcedureAddress(ODBCLibraryHandle,'SQLGetData');
L +pointer(SQLGetData) := 
L GetProcedureAddress(ODBCLibraryHandle,'SQLGetDataW');
L And I do not know how it affect compatibility for example in UNIX or if
L all ODBC drivers support this functionality.

Most probably it needs a flag to indicate that ODBC must work in
Unicode, and then dynamically link to the *W functions if this flag is
set. I think ODBC drivers since 2002 or so should have this set of APIs,
but I have never used ODBC in my life... :)
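
The flag idea can be sketched like this (Python used only for 
illustration). entry_point is a hypothetical helper; SQLConnect, 
SQLExecDirect and SQLDescribeCol are used as examples because they take 
string arguments, and nothing here is taken from odbcsql.inc.

```python
def entry_point(name: str, use_unicode: bool) -> str:
    """Return the symbol to look up: the W variant when Unicode is requested."""
    return name + "W" if use_unicode else name


FUNCTIONS = ("SQLConnect", "SQLExecDirect", "SQLDescribeCol")

for flag in (False, True):
    print(flag, [entry_point(name, flag) for name in FUNCTIONS])
```

With the flag off the loader would resolve exactly the names it uses 
today, so existing behaviour would not change.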

L But also in this case we will get UTF-16 widestrings (in Windows) not
L UTF-8, does not ?

That's not important: you get Unicode in the format specified by the
API, then SQLConnector fills in the information in the expected target
format (WideString, UTF8String over AnsiString, raw bytes...).
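
As a rough illustration of that last step (in Python, with a 
hypothetical to_target helper and UTF-16LE assumed as the format the 
API delivers):

```python
def to_target(wide_buffer: bytes, target_encoding: str) -> bytes:
    """Decode a UTF-16LE buffer and re-encode it in the target encoding."""
    text = wide_buffer.decode("utf-16-le")
    return text.encode(target_encoding)


wide = "\u00d1and\u00fa".encode("utf-16-le")   # as delivered by a W call
utf8 = to_target(wide, "utf-8")                # as wanted by a UTF-8 consumer
print(len(wide), len(utf8))
```

Both byte strings carry the same five characters; only the 
representation differs, so nothing is lost in this direction.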

-- 
Best regards,
 José



Re: [fpc-devel] Dwar2 changed?

2011-01-13 Thread Marc Weustink

Joost van der Sluis wrote:

On Wed, 2011-01-12 at 23:52 +, Martin wrote:

Has dwarf 2 changed ?



TCmdLineDebugger.SendCmdLn -data-evaluate-expression
^^shortstring(^POINTER($eax)^+12)^^
  TCmdLineDebugger.ReadLn ^done,value=#0repeats 20 times
  TCmdLineDebugger.ReadLn (gdb) 


You do realize that this is a hack? (I partly wrote it)


It looks much like I first wrote it :)

Anyway, without rtl debug info this is the only way to retrieve the 
classname of the exception object.



It could also be that the location of the exception-name has been
changed by something. This hack doesn't use any debug-information. Only
the definitions of a shortstring  and pointer.


I don't think that the exception name location is changed, it would mean 
that the VMT layout has changed.


Marc


Re: [fpc-devel] String and UnicodeString and UTF8String

2011-01-13 Thread Marco van de Voort
In our previous episode, Hans-Peter Diettrich said:
  non-native strings, it can also be a performance win).
  IMO a single encoding, i.e. UTF-8, can cover all cases.
  
  Well, for starters, it doesn't cover the existing Delphi/unicode codebase.
 
 Because it's bound to UTF-16? That's not a problem, because WideString 
 will continue to exist, and according conversions are still inserted by 
 the compiler.

That is DIY compatibility, or, in other words, no compatibility. 

Widestring will also grind the application to a halt due to being COM based
on Windows.
 
  While some hard core Ansi coders may whine about such a convention, the
  absence of implicit string conversions (except in external library calls)
  will make such applications more performant than mixed-encoding versions.
  
  I don't see why this is the case. A current system encoding application does
  not do any conversion. (except for GUI output, and that can be considered
  negligible compared to the actual GUI overhead)
 
 When system encoding changes with the target platform, indexed access to 
 such strings can lead to different results. Unless the compiler can read 
 the coder's mind...

You don't have to. The Delphi model provides a stringtype for the system
encoding, and then as such all strings from the system can be labeled. With
other stringtypes, the necessary conversions can be edited.

Likewise, e.g. win32 console routines can be labeled with OEMString. (Since
windows uses a different default encoding for the console)
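
Why a separate label for console strings helps can be seen from the 
bytes themselves. The sketch below is Python for illustration only, and 
it assumes cp1252 as the ANSI code page and cp850 as the console (OEM) 
code page, which is a common Western European pairing but not universal.

```python
text = "\u00d1"                    # LATIN CAPITAL LETTER N WITH TILDE

ansi = text.encode("cp1252")       # assumed ANSI code page
oem = text.encode("cp850")         # assumed console (OEM) code page
print(ansi.hex(), oem.hex())

# ANSI bytes sent to an OEM console without conversion show another glyph.
misread = ansi.decode("cp850")
print(misread == text)
```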
 
  Why spend time in the design of multiple RTL/LCL versions, when 
  a single version will be perfectly sufficient?
  
  Why spend 13 years being compatible when you can throw it away in a
  second?
 
 It's sufficient to throw away what's no more needed :-)

The previous message from Jeff shows that even shortstring is still in major
production use. Nothing is unused, and nothing can be clipped without a
long-winded transition or painful breaks like those of Delphi 2009.

Moreover, these discussions are useless since you know as well as I do that
no one stringtype will ever satisfy everybody. So IMHO it is time to draw
the consequences from the 500 posts on the unicode subject
on this and other FPC/Lazarus lists and start thinking about solutions to
manage that, instead of reiterating the "one type to rule them all" mantra
ad infinitum.


Re: [fpc-devel] += with properties

2011-01-13 Thread Michael Schnell

On 01/13/2011 02:25 AM, Hans-Peter Diettrich wrote:


This would result in the same error, because x.a is not an lval.


The example that made me ask here was Form1.Caption := Form1.Caption + '.';

-Michael


Re: [fpc-devel] Dwar2 changed?

2011-01-13 Thread Martin

On 13/01/2011 07:45, Joost van der Sluis wrote:



TCmdLineDebugger.SendCmdLn -data-evaluate-expression
^^shortstring(^POINTER($eax)^+12)^^
  TCmdLineDebugger.ReadLn ^done,value=#0repeats 20 times
  TCmdLineDebugger.ReadLn (gdb) 

You do realize that this is an hack? (I partly wrote it)
It could also be that the location of the exception-name has been
changed by something. This hack doesn't use any debug-information. Only
the definitions of a shortstring  and pointer.


I do, yes...

But:
- I know eax is correct, because the fall-back (using ^char instead of 
shortstring) works:

  -data-evaluate-expression ^char(^pointer(^POINTER($eax)^+12)^+1)
- The fallback is usually needed if shortstring is not in the 
symbol table at all, but then the expression gives an error. Now the 
expression returns data, but the wrong data...
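
The difference between the two expressions can be pictured with a small 
Python sketch. It assumes the usual shortstring layout (one length byte 
followed by the characters) and, for the fallback, that a #0 byte 
follows the name; the memory image is made up for the example.

```python
def read_shortstring(memory: bytes, address: int) -> str:
    """Interpret the bytes at `address` as a shortstring: length, then chars."""
    length = memory[address]
    chars = memory[address + 1:address + 1 + length]
    return chars.decode("latin-1")


def read_chars_fallback(memory: bytes, address: int) -> str:
    """Fallback: skip the length byte (+1) and read up to the next #0."""
    start = address + 1
    end = memory.index(0, start)
    return memory[start:end].decode("latin-1")


# Hypothetical memory image: 4 padding bytes, the shortstring, then a #0.
memory = bytes(4) + bytes([10]) + b"EDivByZero" + bytes(1)
print(read_shortstring(memory, 4), read_chars_fallback(memory, 4))
```

Both readers look at the same address, which is why a working fallback 
suggests the location is right and only the interpretation of the 
shortstring type differs.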


Strange, the same fpc on windows still works perfectly with half a dozen 
different gdb versions (6.3 to 7.2)


It is also possible that my previous fpc trunk on my fedora box was 
built with some debug info, and now I forgot, and just built it... but 
again on windows I tested with fpc 2.4.2 and trunk, both built with 
several different configs...


Martin



Re: [fpc-devel] TStringField, String and UnicodeString and UTF8String

2011-01-13 Thread LacaK



Most probably it needs a flag to indicate that the ODBC must work in
Unicode, and then dynamic link to *W functions if this flag is set. I
think ODBC drivers since 2002+/- should have this set of APIs, but I
had never used ODBC in my life... :)
  
It seems that the Driver Manager automatically performs conversions for a 
non-Unicode driver. See 
http://web.datadirect.com/resources/odbc/unicode/odbc-driver.html
(and 
http://web.datadirect.com/resources/odbc/unicode/char-background.html):
If the driver is a non-Unicode driver, it cannot understand W function 
calls, and the Driver Manager must convert them to ANSI calls before 
sending them to the driver.


It also seems to me that when you call the ANSI version of the ODBC API 
functions, you receive data in ANSI encoding.
If so, then it is always safe to use ansitoutf8() (or UTF8Encode()) 
on the received data.



L But also in this case we will get UTF-16 widestrings (in Windows) not
L UTF-8, does not ?

That's not important, you get unicode in the specified by the API
format, then SQLConnector fills information in the expected target
format (WideString, UTF8String over AnsiString, Raw bytes...).

  

On Windows we get UTF-16, on Linux/UNIX we get UTF-8.
So is it the case that on Windows widestring=UTF-16 and on Linux/UNIX 
widestring=UTF-8 string?
(So does the widestring type have a different meaning on different OSes? 
I am a Windows-only developer, so I have no knowledge of other OSes 
;-))


-Laco.


Re[2]: [fpc-devel] TStringField, String and UnicodeString and UTF8String

2011-01-13 Thread José Mejuto
Hello FPC,

Thursday, January 13, 2011, 1:01:57 PM, you wrote:

L Also it seems to me, that when you call ANSI version of ODBC API
L functions, then you receive data in ANSI encoding.
L If it is so, then it is always safe use ansitoutf8() (or UTF8Encode())
L on receved data.

No, because ANSI is not UTF-*, so any char outside of your ANSI
codepage will be discarded or generate an exception, or whatever the
developer (the driver developer) decides.
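
The three outcomes named above (exception, substitution, silent 
discard) can be reproduced in a few lines of Python. This is only an 
illustration with cp1252 as an assumed ANSI code page; which behaviour 
a given ODBC driver chooses is up to that driver.

```python
text = "Dvo\u0159\u00e1k"          # r-with-caron is not part of cp1252

try:
    text.encode("cp1252")                            # strict: raise an error
    strict_result = "ok"
except UnicodeEncodeError:
    strict_result = "exception"

replaced = text.encode("cp1252", errors="replace")   # substitute '?'
dropped = text.encode("cp1252", errors="ignore")     # silently discard
print(strict_result, replaced, dropped)
```

Once the data has been narrowed to the ANSI code page, converting it to 
UTF-8 afterwards cannot bring the lost character back.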

L In Windows we get UTF-16, in Linux/UNIX we get UTF-8

If you use ANSI calls on Linux you will receive UTF-8, because on most
Linux systems the ANSI code page is UTF-8. If you call the *W APIs you
MUST receive WideString or PWideCharArray or the like, or the API will
not work.

-- 
Best regards,
 José



Re: [fpc-devel] TStringField, String and UnicodeString and UTF8String

2011-01-13 Thread Joost van der Sluis
On Thu, 2011-01-13 at 09:15 +0100, LacaK wrote:
 
  Didn't I explain this to you and others a few times?

 ;-) If so, then please excuse me
 
  The database-components itself are encoding-agnostic. This means:
  encoding in = encoding out.
  
  So it is up to the developer which codepage he wants to use. So
  TField.Text can have the encoding _you_ want.
  
  So, if you want to work with Lazarus, which uses UTF-8, you have to use
  UTF-8 encoded strings in your database. 

 So this is the answer I was looking for:
 In Lazarus, TStringField MUST hold UTF-8 encoded strings.

Not entirely true. You could also choose to bind the fields to some
Lazarus components manually, not using the db-components (TEdit.Text :=
convertFunc(StringField.Text)). Or you can add a hook so that the .Text
property always does a conversion to UTF-8. The first option can be used
if you use a mediator or view. The second option I wouldn't use.
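
What such a convertFunc has to do can be sketched in Python (for 
illustration only; convert_func is a hypothetical name, and cp1252 is 
an assumed database encoding):

```python
def convert_func(raw: bytes, db_encoding: str) -> bytes:
    """Re-encode a field value from the database encoding to UTF-8."""
    return raw.decode(db_encoding).encode("utf-8")


field_text = "M\u00fcller".encode("cp1252")    # value as the field delivers it
control_text = convert_func(field_text, "cp1252")
print(field_text, control_text)
```

The function needs to be told the source encoding, since the field 
itself is encoding-agnostic and cannot report it.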

 But I guess (I have theory), that in time, when Borland introduced
 TStringField, the design goal was:
 TStringField was designed for SBCS (because DataSize=Size+1) string
 data encoded in system ANSI code page and TWideStringField was
 designed for DBCS widestring (UTF-16) character data

You have to be really careful in what you type, when you are writing
about encodings. The above is nonsense, because of a very tiny mistake.

If you compare DBCS widestring with UTF-16, you can also compare a
stringfield with UTF-8. Exactly the same problem. (A character can be
made up of more than one UTF-8 or UTF-16 code unit.)
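
A quick way to see this is to count the units per character. The 
following Python lines are only an illustration; code_units is a 
hypothetical helper.

```python
def code_units(ch: str) -> tuple:
    """Return (UTF-8 code units, UTF-16 code units) needed for one character."""
    utf8_units = len(ch.encode("utf-8"))            # 1 unit = 1 byte
    utf16_units = len(ch.encode("utf-16-le")) // 2  # 1 unit = 2 bytes
    return utf8_units, utf16_units


for ch in ("A", "\u00d1", "\u20ac", "\U0001d11e"):
    print("U+%04X" % ord(ch), code_units(ch))
```

Only characters outside the Basic Multilingual Plane need two UTF-16 
units, which is why the problem shows up less often there than with 
UTF-8.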

But TStringField's datasize by default is indeed Size+1. So if you use
it to store UTF-8, you have to define the size as four times the
field-size given by the database. Note that this is done in some cases.

 May be, that I was mistaken by this view.
 (or may be, that there is different approach in Delphi (no agnostic)
 and different in FPC (agnostic)?)

No, Delphi does the same. Only newer Delphi versions have a string-type
which contains the used encoding (details can be found in this thread),
so can do some conversions for you. But that has nothing to do with the
database-code. Also, you don't need it. People all over the world have
used older Delphi versions all the time... (But of course, it's easier
now)

  If there is some strange reason why you don't want the strings in your
  database to be UTF-8 encoded,
 SQL Server does not support UTF-8 (AFAIK)

Rofl. You mean that Microsoft SQL Server can't handle unicode
completely? If they say that in an advertisement they can forget that
any big commercial client will choose their product...

 SQL Server provides non-UNICODE datatypes - char, varchar, text 

ie: TStringField

  and UNICODE (UCS-2) datatypes - nchar, nvarchar, ntext

ie: TWideStringField.

What does this have to do with your problem? Nothing. The only thing that
matters is what encoding is used while communicating with the client.
(Which you can set)

  you have to convert the strings from the
  encoding your database uses to UTF-8 while reading data from the
  database.
  
  Luckily, you can specify the encoding of strings you want to use for
  most databases. Not only the encoding in which the strings are stored,
  but also the encoding which has to be used when you send and retrieve
  data from the database. And you can set this for each connection made.
  
  Ie: you can resolve the problem by changing the connection-string, or by
  adding some connection-parameter.
  

 Yes, it is true for example for MySQL or Firebird ODBC driver,
  but for SQL Server or PostgreSQL ODBC driver there are no such
 options

Then that option has to be added. I think it's already possible but you
simply don't know how. (Sql-Server is ODBC only, so that one is fixed.
For Firebird there's a 'serverencoding' parameter, or something like
that. Postgres also has some setting.)

  (but PostgreSQL ODBC driver exists in ANSI and UNICODE version)

I saw that in an earlier message, but this also has nothing to do with
your problem. You only need the different calls when you want to use
UTF-8 in your fieldnames. (Or, and this one was tricky, in the
connection-string. But this was more than a year ago.)

  SQL Server ODBC driver supports AutoTranslate, see:
 http://msdn.microsoft.com/en-us/library/ms130822.aspx
  SQL Server char, varchar, or text data sent to a client SQL_C_CHAR
 variable is converted from character to Unicode using the server ACP,
 then converted from Unicode to character using the client ACP.

This is what you use when you set the encoding when you connect to the
client. The solution to all your problems. As explained three times, in
this message alone.

In fact it's simple: incoming data=outgoing data.

If you need UTF-8 encoding for the outgoing data (direct access to
Lazarus controls) you have to select UTF-8 at the input. That's always
more efficient than converting the data to/from any other encoding.

And, luckily, you can instruct the Database-server which encoding 

Re: [fpc-devel] TStringField, String and UnicodeString and UTF8String

2011-01-13 Thread Joost van der Sluis
On Thu, 2011-01-13 at 09:49 +0100, LacaK wrote:

 So IMHO there must be:
 1. allocated space in record buffer in size 4*TFieldDef.Size+1 

 2. redefine meaning of Size property (as number of bytes not
 characters) and create fielddefs with Size*4

Yes, those are the possible solutions. The good thing about the second
option is that a user can do that on his own if he wants to use UTF-8:
just create persistent fields with a field size of 4 times the number of
characters. I'm not sure if we have to change this. It's a problem the
programmer has to deal with, I think...
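
Where the factor 4 comes from can be checked with a short Python 
sketch (illustration only; utf8_field_size and fits are hypothetical 
helpers, not fcl-db functions):

```python
# The largest code point, U+10FFFF, gives the worst case for UTF-8.
MAX_UTF8_BYTES_PER_CHAR = len("\U0010ffff".encode("utf-8"))


def utf8_field_size(char_count: int) -> int:
    """Bytes to reserve so that any `char_count` characters fit as UTF-8."""
    return MAX_UTF8_BYTES_PER_CHAR * char_count


def fits(value: str, char_count: int) -> bool:
    """Check whether the UTF-8 form of `value` fits the reserved size."""
    return len(value.encode("utf-8")) <= utf8_field_size(char_count)


print(MAX_UTF8_BYTES_PER_CHAR, utf8_field_size(5))
print(fits("\u00d1" * 5, 5), fits("\U0001d11e" * 5, 5))
```

The price is memory: a field holding mostly ASCII text reserves four 
times what it actually uses.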

 hm, according to
 http://docwiki.embarcadero.com/VCL/XE/en/DB.TStringField.Size is Size
 number of characters
 but according to
 http://docwiki.embarcadero.com/VCL/en/DB.TFieldDef.Size is Size number
 of bytes in underlaying database

Yes, that's indeed the problem. But there's also the .DataSize property,
so we could use that.

Maybe... if the pressure on the bugtracker gets too high, I'll bow and
change this. (I think 25% of all existing db bugs are related to this
and people who do not understand anything about encodings.)

 but TField is created from TFieldDef and
 TField.Size=TFieldDef.Size ... so isn't it curious ?
  Not that when you want to use UTF-16 (or 32) you have to use
  TWideStringFields.
  

 So TWideStringField is no-encoding-agnostic field (is it designed to
 be everytime UTF-16 encoded) ?

No. It's designed to contain an array of two-byte records. In fact you
can use it to store UCS-2 data, but not UTF-16. Same story as with
ansi/UTF-8.
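
The UCS-2 versus UTF-16 distinction can be illustrated in Python. 
store_wide is a hypothetical stand-in for a buffer of `size` two-byte 
records; it is not how TWideStringField is implemented.

```python
def store_wide(value: str, size: int) -> bytes:
    """Store `value` in `size` two-byte records, dropping what does not fit."""
    return value.encode("utf-16-le")[:2 * size]


bmp = store_wide("\u00d1", 1)          # one character, one record: fits
astral = store_wide("\U0001d11e", 1)   # one character, needs two records
print(bmp.decode("utf-16-le"))
print(len("\U0001d11e".encode("utf-16-le")), len(astral))
```

A field sized for n characters therefore holds any n UCS-2 characters, 
but may cut a UTF-16 surrogate pair in half.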

Joost.



Re: Re[2]: [fpc-devel] TStringField, String and UnicodeString and UTF8String

2011-01-13 Thread Joost van der Sluis
On Thu, 2011-01-13 at 10:32 +0100, José Mejuto wrote:
 Hello FPC,
 
 Thursday, January 13, 2011, 10:03:02 AM, you wrote:
 
  ODBC 3.5 was launched around 2000-2001.
 L But this approach will require changes in packages/odbc/src/odbcsql.inc
 L like, does not ?:
 L -pointer(SQLGetData) := 
 L GetProcedureAddress(ODBCLibraryHandle,'SQLGetData');
 L +pointer(SQLGetData) := 
 L GetProcedureAddress(ODBCLibraryHandle,'SQLGetDataW');
 L And I do not know how it affect compatibility for example in UNIX or if
 L all ODBC drivers support this functionality.
 
 Most probably it needs a flag to indicate that the ODBC must work in
 Unicode, and then dynamic link to *W functions if this flag is set. I
 think ODBC drivers since 2002+/- should have this set of APIs, but I
 had never used ODBC in my life... :)

This only affects the parameters passed to the ODBC functions, i.e. the
field names, table names and such. 

Joost.



Re: [fpc-devel] Dwar2 changed?

2011-01-13 Thread Joost van der Sluis
On Thu, 2011-01-13 at 11:00 +, Martin wrote:
 On 13/01/2011 07:45, Joost van der Sluis wrote:
 
  TCmdLineDebugger.SendCmdLn -data-evaluate-expression
  ^^shortstring(^POINTER($eax)^+12)^^
TCmdLineDebugger.ReadLn ^done,value=#0repeats 20 times
TCmdLineDebugger.ReadLn (gdb) 
  You do realize that this is an hack? (I partly wrote it)
  It could also be that the location of the exception-name has been
  changed by something. This hack doesn't use any debug-information. Only
  the definitions of a shortstring  and pointer.
 
 I do, yes...
 
 But:
 - I know eax is correct, because the fall-back (using ^char instead of 
 shortstring) works:
-data-evaluate-expression ^char(^pointer(^POINTER($eax)^+12)^+1)
 - The fallback is usually needed, if shortstring is not in the 
 symboltable at all, but then the expression gives an error. Now the 
 expression returns data, but the wrong data...

Ehh? Does the fallback work or not?

 Strange, the same fpc on windows still works perfectly with half a dozen 
 different gdb versions (6.3 to 7.2)

So it could be a gdb-problem?

 It is also possible that my previous fpc trunk on my fedora box was 
 built with some debug info, and now I forgot, and just built it... but 
 again on windows I tested with fpc 2.4.2 and trunk, both built with 
 several different configs...

You lost me. Do I still have to do something?

Joost.



Re: [fpc-devel] Dwar2 changed?

2011-01-13 Thread Martin

On 13/01/2011 14:14, Joost van der Sluis wrote:

  TCmdLineDebugger.SendCmdLn -data-evaluate-expression
^^shortstring(^POINTER($eax)^+12)^^
   TCmdLineDebugger.ReadLn ^done,value=#0repeats 20 times
   TCmdLineDebugger.ReadLn (gdb) 

You do realize that this is an hack? (I partly wrote it)
It could also be that the location of the exception-name has been
changed by something. This hack doesn't use any debug-information. Only
the definitions of a shortstring  and pointer.

I do, yes...

But:
- I know eax is correct, because the fall-back (using ^char instead of
shortstring) works:
-data-evaluate-expression ^char(^pointer(^POINTER($eax)^+12)^+1)

The fallback does work. So the data is in the correct location


- The fallback is usually needed, if shortstring is not in the
symboltable at all, but then the expression gives an error. Now the
expression returns data, but the wrong data...

Ehh? Does the fallback work or not?
Yes, but it was not triggered, because the first query did not return a 
gdb error; there just was no usable data.

I added the data check, so now the fallback is triggered, and it works.
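
The added data check can be pictured roughly as below. This is a 
Python sketch for illustration, not the Lazarus debugger code: 
is_usable, first_usable and fake_query are hypothetical, and treating a 
reply of only #0 characters as unusable is an assumption based on the 
log above.

```python
def is_usable(value: str) -> bool:
    """Treat a reply that is empty or made only of #0 characters as unusable."""
    return value.strip("\0") != ""


def first_usable(expressions, run_query):
    """Try each expression in order and return the first usable value."""
    for expression in expressions:
        ok, value = run_query(expression)
        if ok and is_usable(value):
            return value
    return None


# Hypothetical stand-in for the debugger: the shortstring query "succeeds"
# but yields only #0 characters, while the ^char query yields the name.
def fake_query(expression):
    if "shortstring" in expression:
        return True, "\0" * 20
    return True, "EDivByZero"


name = first_usable(
    ["^^shortstring(^POINTER($eax)^+12)^^",
     "^char(^pointer(^POINTER($eax)^+12)^+1)"],
    fake_query,
)
print(name)
```

Checking only the error status would have accepted the first reply, 
which matches the behaviour described before the check was added.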


Strange, the same fpc on windows still works perfectly with half a dozen
different gdb versions (6.3 to 7.2)

So it could be a gdb-problem?

possible, yes, strange though

afaik shortstring is encoded as a record (len; chars), so maybe 
something in there...






It is also possible that my previous fpc trunk on my fedora box was
built with some debug info, and now I forgot, and just built it... but
again on windows I tested with fpc 2.4.2 and trunk, both built with
several different configs...

You lost me. Do I still have to do something?



probably not, because there is not enough info yet to start on something.

I was just asking if something obvious springs to mind


I currently don't have the time to go through all the options and see 
what works and what doesn't,

e.g. compile the RTL with/without -gs / -gw
or check older versions, ...



Re[2]: [fpc-devel] TStringField, String and UnicodeString and UTF8String

2011-01-13 Thread José Mejuto
Hello FPC,

Thursday, January 13, 2011, 2:58:30 PM, you wrote:

JvdS Then that option has to be added. I think it's already possible but you
JvdS simply don't know how. (Sql-Server is ODBC only, so that one is fixed.
JvdS For firebird there's a 'serverencoding' parameter, or something like
JvdS that. Postgres also has some setting.

Are you aware of the Firebird Field(UTF8) and SQLConnection
CharSet(UTF8) problem?

Table X
---
  FieldTestName Varchar(5) UTF8

IBConnection
---
  CharSet = UTF8

//Source code file is UTF8 encoded.
DataSource.FieldByName('FieldTestName').AsString:='Ñ';

This raises an exception because the string is 10 bytes and the field
only allows 5 chars.
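
The mismatch between bytes and characters can be reproduced in 
Python. The sketch assumes a test value of five 'Ñ' characters, which 
would give the 10 bytes mentioned above; the actual value used in the 
test is not shown in this message.

```python
FIELD_CHARS = 5                    # Varchar(5)

value = "\u00d1" * 5               # assumed test value: five characters
char_count = len(value)
byte_count = len(value.encode("utf-8"))

# A limit checked against bytes rejects the value,
# the same limit checked against characters accepts it.
print(char_count, byte_count)
print(byte_count <= FIELD_CHARS, char_count <= FIELD_CHARS)
```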

I think I read about this many months ago and it was answered as "won't
fix" for Interbase compatibility. Am I wrong?
  
-- 
Best regards,
 José



Re: Re[2]: [fpc-devel] TStringField, String and UnicodeString and UTF8String

2011-01-13 Thread Joost van der Sluis
On Thu, 2011-01-13 at 15:55 +0100, José Mejuto wrote:
 Hello FPC,
 
 Thursday, January 13, 2011, 2:58:30 PM, you wrote:
 
 JvdS Then that option has to be added. I think it's already possible but you
 JvdS simply don't know how. (Sql-Server is ODBC only, so that one is fixed.
 JvdS For firebird there's a 'serverencoding' parameter, or something like
 JvdS that. Postgres also has some setting.
 
 Are you aware about the Firebird Field(UTF8) and SQLConnection
 CharSet(UTF8) problem ?
 
 Table X
 ---
   FieldTestName Varchar(5) UTF8
 
 IBConnection
 ---
   CharSet = UTF8
 
 //Source code file is UTF8 encoded.
 DataSource.FieldByName('FieldTestName').AsString='Ñ';
 
 This raises an exception because the string is 10 bytes and the field
 only allow 5 chars.
 
 I think I had read this comment many months ago and answered as won'n
 fix for Interbase compatibility. Am I wrong ?

See my mail to Lacak about the two options to solve this.

Joost.



Re[4]: [fpc-devel] TStringField, String and UnicodeString and UTF8String

2011-01-13 Thread José Mejuto
Hello FPC,

Thursday, January 13, 2011, 4:24:31 PM, you wrote:

 Are you aware about the Firebird Field(UTF8) and SQLConnection
 CharSet(UTF8) problem ?
JvdS See my mail to Lacak about the two options to solve this.

I think the problem is different and can be solved without any
compatibility problem, or at least with one that is easily detectable.

Anyway, I'll write a possible patch, test whether it is a viable
solution, and if it works send it for evaluation.

Thank you.

-- 
Best regards,
 José



Re: [fpc-devel] String and UnicodeString and UTF8String

2011-01-13 Thread Hans-Peter Diettrich

Marco van de Voort wrote:

In our previous episode, Hans-Peter Diettrich said:

non-native strings, it can also be a performance win).

IMO a single encoding, i.e. UTF-8, can cover all cases.

Well, for starters, it doesn't cover the existing Delphi/unicode codebase.
Because it's bound to UTF-16? That's not a problem, because WideString 
will continue to exist, and according conversions are still inserted by 
the compiler.


That is DIY compatibility, or, in other words, no compatibility.


I still don't understand the problem :-(


Widestring will also grind the application to a halt due to being COM based
on Windows.


How that?


When system encoding changes with the target platform, indexed access to 
such strings can lead to different results. Unless the compiler can read 
the coder's mind...


You don't have to. The Delphi model provides a stringtype for the system
encoding, and then as such all strings from the system can be labeled. With
other stringtypes, the necessary conversions can be edited.


Indexed string access produces different results for Ansi and UTF-8 system 
encoding. Such code is not portable, and neither is the data (ini files). 
Allowing UTF-8 as the system encoding will frustrate Windows 
users (dunno whether Windows allows such a system encoding), and 
Linux users are frustrated when UTF-8 is disallowed.
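
To make the indexing point concrete, here is a small sketch (in Python rather than Pascal, purely for illustration). The sample word and the helper byte_at are assumptions of mine, not taken from any RTL. It shows the same text stored under an ANSI code page and under UTF-8, and what a 1-based byte index like s[3] would return in each case:

```python
# Same text, two candidate "system encodings" (the sample word is an assumption).
text = "Müller"

ansi = text.encode("cp1252")  # single-byte ANSI code page: one byte per character
utf8 = text.encode("utf-8")   # UTF-8: "ü" occupies two bytes


def byte_at(data: bytes, index: int) -> int:
    """Return the byte at a 1-based index, the way s[index] reads a byte string."""
    return data[index - 1]


ansi_len = len(ansi)
utf8_len = len(utf8)
ansi_third = byte_at(ansi, 3)  # the letter "l"
utf8_third = byte_at(utf8, 3)  # the second byte of "ü", not a whole character

print(ansi_len, utf8_len)
print(hex(ansi_third), hex(utf8_third))
```

Code that assumes s[3] is the third character only holds for the single-byte case; under UTF-8 the same index lands in the middle of a character.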


Only solution: using OS encoding restricts the code to run on a single 
machine only, or on similarly configured machines.


The users who accept this restriction will be happy with a 
single AnsiString type and no implicit conversions. Without implicit 
conversions such a string type can hold UTF-8 as well.




Likewise, e.g. win32 console routines can be labeled with OEMString. (Since
windows uses a different default encoding for the console)


This either implies OEM encoding as the system encoding of Win32 console 
applications, or the use of multiple codepages, as before. But IMO Win32 
console also implements a W interface, so that it's up to the user to 
use whatever is more appropriate for his code.
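
A quick sketch of why the console needs its own label (Python, for illustration only). The two code pages picked here, cp1252 as ANSI and cp850 as OEM, are assumptions typical of a Western European Windows setup; other locales use other pairs:

```python
ch = "ä"

ansi_bytes = ch.encode("cp1252")  # assumed ANSI (GUI) code page
oem_bytes = ch.encode("cp850")    # assumed OEM (console) code page

# Reading console (OEM) bytes as if they were ANSI yields a different character.
misread = oem_bytes.decode("cp1252")

print(ansi_bytes, oem_bytes, misread)
```

The same character has a different byte value in each code page, so a string handed from one side to the other without conversion shows up as garbage.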


The RTL has to distinguish between system-wide filesystem and GUI 
encoding, in file handling (CreateFile...).



Why spend time in the design of multiple RTL/LCL versions, when 
a single version will be perfectly sufficient?

Why spend 13 years being compatible when you can throw it away in a
second?

It's sufficient to throw away what's no longer needed :-)


The previous message from Jeff shows that even shortstring is still in major
production use. Nothing is unused and can be clipped without a long-winded
transition, or Delphi 2009-like painful breaks.


It's all about the well known dilemma:
- force (possibly many) implicit conversions, or
- supply multiple RTL/LCL versions, or
- break legacy user code by moving to a different (but again unique) 
string type.



Moreover, these discussions are useless since you know as well as I do that
no one stringtype will ever satisfy everybody. So IMHO it is time to take
the consequences from the 500 posts on the unicode subject
on this and other FPC/Lazarus lists and start thinking about solutions to
manage that, instead of reiterating the one type to rule them all mantra
ad infinitum.


The discussion is only about the pros and cons of the various possible 
solutions. I.e. it should reveal the critical cases and consequences, 
that have to be considered and handled in every implementation.


The implementation can choose any model. Different models can be 
implemented as well, so that the final decision about the new standard 
can be delayed, until the models can be tested in real world applications.


One model has already been implemented: UTF-8. It may need some 
additions/improvements, like a *hard* separation of AnsiString from 
UTF8String, and nothing has to be thrown away.


DoDi



Re: [fpc-devel] String and UnicodeString and UTF8String

2011-01-13 Thread Sven Barth

On 12.01.2011 22:40, Marco van de Voort wrote:

In our previous episode, Sven Barth said:

legacy code can be broken by (eventually) required changes to set of
char, sizeof(char) and PChar, sizeof(string) as opposed to
Length(string), upper/lower conversion, and many more not so obvious
consequences.
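
A rough illustration of the size/length and upper/lower pitfalls listed above (Python, for illustration only; the sample text is an assumption). It counts the same text in code points, UTF-8 bytes, UTF-16 code units and ANSI bytes, and shows that upper-casing can change the length:

```python
text = "Grüße €"  # sample text, chosen to mix 1-, 2- and 3-byte UTF-8 characters

code_points = len(text)                           # logical characters
utf8_bytes = len(text.encode("utf-8"))            # storage size as UTF-8
utf16_units = len(text.encode("utf-16-le")) // 2  # 16-bit code units
ansi_bytes = len(text.encode("cp1252"))           # storage size in an ANSI code page

# Full Unicode upper-casing maps "ß" to "SS", so the length changes.
upper_len = len(text.upper())

print(code_points, utf8_bytes, utf16_units, ansi_bytes, upper_len)
```

Any legacy code that treats Length(s) as a byte count, or assumes an upper-cased string fits the same buffer, depends on which of these numbers the string type actually reports.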


I don't believe that PChar will be touched, because too much code that
interfaces with C code depends on that. Although its declaration might
not be the same then and become PChar = PAnsiChar instead of PChar =
^Char if Char is changed (currently it's PAnsiChar = PChar).


Current Delphi _does_ regard char as the equivalent lowlevel type to string. So
whatever you choose as string (8 or 16-bit), pchar will match it by changing
to pansichar or pwidechar.


Oh come on -.-

There are some days on which I really dislike the developers of Delphi...

Regards,
Sven


Re: [fpc-devel] String and UnicodeString and UTF8String

2011-01-13 Thread Sven Barth

On 13.01.2011 18:57, Hans-Peter Diettrich wrote:

Widestring will also grind the application to a halt due to being COM
based on Windows.


How so?




WideString on Windows has no reference counting, thus every time a 
WideString is assigned it needs to be copied.



When system encoding changes with the target platform, indexed access
to such strings can lead to different results. Unless the compiler
can read the coder's mind...


You don't have to. The Delphi model provides a stringtype for the system
encoding, and then as such all strings from the system can be labeled.
With other stringtypes, the necessary conversions can be edited.


Indexed string access produces different results for Ansi and UTF-8 system
encoding. Such code is not portable, and neither is the data (ini files).
Allowing UTF-8 as the system encoding will frustrate Windows
users (dunno whether Windows allows such a system encoding), and
Linux users are frustrated when UTF-8 is disallowed.



Nearly all Windows API functions only allow single byte encodings or 
UTF-16. The only functions I'm aware of that can use UTF-8 
encoding are the console input/output APIs (if the codepage is set to 
UTF-8) [and also the file I/O APIs, but they don't assume any encoding].


Regards,
Sven


Re: [fpc-devel] Variables declaraction inside code

2011-01-13 Thread Sven Barth

On 13.01.2011 02:20, Hans-Peter Diettrich wrote:

LacaK schrieb:


3. C style comments: /* ... */ (I have never understood why Pascal
used (* ... *) )


Pascal has several digraphs:
(* = {
*) = }
(. = [
.) = ]


I must say that I did not know about the alternatives for the 
square brackets. O.o


One never stops learning...

(and they are even documented :D )

Regards,
Sven


Re: [fpc-devel] String and UnicodeString and UTF8String

2011-01-13 Thread Thaddy

On 13-1-2011 21:40, Sven Barth wrote:


WideString on Windows has no reference counting, thus every time a 
WideString is assigned it needs to be copied.

Not exactly true. WideString is COM marshaled and thus has reference 
counting on the COM level, afaik.
As long as your memory manager is COM marshaled too, that is. And since 
most Pascal memory manager versions do not support COM directly, it goes 
wrong in a big way.
I once wrote a simple COM memory manager to test this. Performance stays 
sh*t, but strings seem to be counted, not copied.
If you use CoTaskMemAlloc, CoTaskMemFree, CoTaskMemRealloc in your 
memory manager you will see what I mean.

At least it comes close, but slow it will stay.




Re: [fpc-devel] String and UnicodeString and UTF8String

2011-01-13 Thread Martin Schreiber
On Thursday, 13. January 2011 18.57:00 Hans-Peter Diettrich wrote:

 The implementation can choose any model. Different models can be
 implemented as well, so that the final decision about the new standard
 can be delayed, until the models can be tested in real world applications.

 One model has already been implemented: UTF-8. It may need some
 additions/improvements, like a *hard* separation of AnsiString from
 UTF8String, and nothing has to be thrown away.

Another already implemented model is utf-16 UnicodeString in MSEgui. It needs 
no changes in the Free Pascal compiler.

Martin