Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Mattias Gaertner
On Sat, 22 Nov 2008 23:05:43 +0200 listmember [EMAIL PROTECTED] wrote: Is there a way to determine how much memory is consumed by strings by a running application? I'd like to know this, in particular, for FPC and Lazarus --to begin with. And, the reason I'd like to know this is this:

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread listmember
On 2008-11-23 10:19, Mattias Gaertner wrote: On Sat, 22 Nov 2008 23:05:43 +0200 listmember [EMAIL PROTECTED] wrote: Is there a way to determine how much memory is consumed by strings by a running application? I'd like to know this, in particular, for FPC and Lazarus --to begin with. And, the

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Mattias Gaertner
On Sun, 23 Nov 2008 10:31:39 +0200 listmember [EMAIL PROTECTED] wrote: [...] What I had in mind wasn't to store the string data in UTF-32 (or UCS-4); it would still be UTF-8 or whatever. I am only considering in memory representation being UTF-32 (or UCS-4). What do you mean with 'memory

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread listmember
I am only considering in memory representation being UTF-32 (or UCS-4). What do you mean with 'memory representation'? That, each char in a string in memory would be 4-bytes (or more); yet, when saved on disk (or transmitted across the net etc.) it would be UTF-8 compressed. IOW, no

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread listmember
Actually, load times are not --does not seem to be-- linear at all. 4 times larger file seems to take only twice as long. I did one very simple test using 2 text files: File 1: 384 MB (403,248,710 bytes) File 2: 120 MB (126,680,448 bytes) with the code below: procedure

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Sergei Gorelkin
Graeme Geldenhuys wrote: On Sun, Nov 23, 2008 at 10:19 AM, Mattias Gaertner [EMAIL PROTECTED] wrote: On Sat, 22 Nov 2008 23:05:43 +0200 For example the lazarus IDE typically holds 50 to 200mb sources in memory. If this would be changed to unicodestring (2 byte per char) then the IDE would need

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Mattias Gaertner
On Sun, 23 Nov 2008 11:09:25 +0200 listmember [EMAIL PROTECTED] wrote: I am only considering in memory representation being UTF-32 (or UCS-4). What do you mean with 'memory representation'? That, each char in a string in memory would be 4-bytes (or more); yet, when saved on disk (or

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread listmember
On 2008-11-23 13:07, Graeme Geldenhuys wrote: On Sun, Nov 23, 2008 at 12:29 PM, listmember [EMAIL PROTECTED] wrote: What I am curious about is: 4 times of what? RAM, Random Access Memory, DIMMs those little green sticks you shove into the motherboard. :-) :)

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Graeme Geldenhuys
On Sun, Nov 23, 2008 at 1:05 PM, listmember [EMAIL PROTECTED] wrote: I just checked (using Process Explorer, under Windows) and this is what I see: Working set: 2,216 K Peak Working set: 26,988 K I can't see where that 50 MB fits into that. Well it all depends on how many files you have

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Sergei Gorelkin
listmember wrote: This is my thick-day. So, permit me to ask this: Are you really saying that strings occupy 50 MB Lazarus's memory footprint? I just checked (using Process Explorer, under Windows) and this is what I see: Working set: 2,216 K Peak Working set: 26,988 K I can't see where

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Graeme Geldenhuys
On Sun, Nov 23, 2008 at 12:29 PM, listmember [EMAIL PROTECTED] wrote: What I am curious about is: 4 times of what? RAM, Random Access Memory, DIMMs those little green sticks you shove into the motherboard. :-) Regards, - Graeme -

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Graeme Geldenhuys
On Sun, Nov 23, 2008 at 1:13 PM, Graeme Geldenhuys [EMAIL PROTECTED] wrote: I can't see where that 50 MB fits into that. Well it all depends on how many files you have open, project size etc... As an example. Using a small project, Lazarus sits at 26 MB of memory. I then open the MacOSAll.pas

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Mattias Gaertner
On Sun, 23 Nov 2008 13:05:15 +0200 listmember [EMAIL PROTECTED] wrote: On 2008-11-23 12:50, Jonas Maebe wrote: On 23 Nov 2008, at 11:29, listmember wrote: It is not hard to tell that an app that works with text files (such as Lazarus) will consume 4 times more memory per file loaded.

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread listmember
However, you may hack into RTL at the NewAnsiString / NewWideString / NewUnicodeString procedures and install hooks that will record the number of bytes requested. That shouldn't be too difficult to do. This is what I was looking for. Thank you.

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread listmember
Do a 'find declaration' on an identifier, that does not exist. This will explore all units of the uses section. Now I see what you mean. But, isn't this a design-choice; caching all sources in memory for speed reasons, as opposed to on-demand opening and closing each file. Still. If that is

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Daniël Mantione
Op Sun, 23 Nov 2008, schreef listmember: What I had in mind wasn't to store the string data in UTF-32 (or UCS-4); it would still be UTF-8 or whatever. I am only considering in memory representation being UTF-32 (or UCS-4). This way, loading from and saving to would hardly be affected, yet

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread listmember
On 2008-11-23 13:49, Jonas Maebe wrote: On 23 Nov 2008, at 12:35, listmember wrote: But, isn't this a design-choice; caching all sources in memory for speed reasons, as opposed to on-demand opening and closing each file. For very large projects, that should probably be done anyway at some

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread listmember
On 2008-11-23 14:10, Daniël Mantione wrote: Therefore, any other encoding is a waste of memory and does not gain you any speed. For that reason, I don't see the compiler switch from 8-bit processing either. I nearly fully agree with you. Except that, when a string constant needs to contain

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Daniël Mantione
Op Sun, 23 Nov 2008, schreef listmember: On 2008-11-23 14:10, Daniël Mantione wrote: Therefore, any other encoding is a waste of memory and does not gain you any speed. For that reason, I don't see the compiler switch from 8-bit processing either. I nearly fully agree with you. Except

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Jonas Maebe
On 23 Nov 2008, at 13:31, Daniël Mantione wrote: For an IDE, this is a little bit more complicated. I.e. searching for a ç in a source file needs to find both the composed and the decomposed variant, and in the case of UTF-8, this character can be encoded in 1, 2, 3 or 4 bytes which all
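The composed/decomposed problem raised here can be illustrated with a short Python sketch. It is only an illustration under assumptions: the helper name `contains_normalized` and the sample word are hypothetical, and normalizing both sides to NFC before searching is one possible approach, not something the thread says Lazarus does.

```python
import unicodedata

# Two spellings of the same visible character.
composed = "\u00e7"       # LATIN SMALL LETTER C WITH CEDILLA, one code point
decomposed = "c\u0327"    # 'c' followed by COMBINING CEDILLA, two code points

def contains_normalized(haystack: str, needle: str) -> bool:
    """Search after bringing both strings to NFC (hypothetical helper)."""
    norm_haystack = unicodedata.normalize("NFC", haystack)
    norm_needle = unicodedata.normalize("NFC", needle)
    return norm_needle in norm_haystack

# The source text uses the decomposed form, the search term the composed one.
source_line = "gar" + decomposed + "on"

naive_hit = composed in source_line                       # plain substring search
normalized_hit = contains_normalized(source_line, composed)

# Byte lengths of the two spellings in UTF-8.
composed_bytes = len(composed.encode("utf-8"))
decomposed_bytes = len(decomposed.encode("utf-8"))
```

The plain substring search misses the match because the two spellings are different code point sequences, whatever the storage encoding; normalization is needed in UTF-8, UTF-16 and UTF-32 alike.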

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Mattias Gaertner
On Sun, 23 Nov 2008 12:37:32 +0100 Martin Schreiber [EMAIL PROTECTED] wrote: On Sunday 23 November 2008 09.26:35 Graeme Geldenhuys wrote: On Sun, Nov 23, 2008 at 10:19 AM, Mattias Gaertner [EMAIL PROTECTED] wrote: On Sat, 22 Nov 2008 23:05:43 +0200 For example the lazarus IDE

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Daniël Mantione
Op Sun, 23 Nov 2008, schreef Jonas Maebe: On 23 Nov 2008, at 13:31, Daniël Mantione wrote: For an IDE, this is a little bit more complicated. I.e. searching for a ç in a source file needs to find both the composed and the decomposed variant, and in the case of UTF-8, this character can be

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Mattias Gaertner
On Sun, 23 Nov 2008 14:11:50 +0200 listmember [EMAIL PROTECTED] wrote: [...] For very large projects, that should probably be done anyway at some point. But even in that case, using a more memory-efficient string type enables you to keep more data in memory and hence potentially obtain

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Martin Schreiber
On Sunday 23 November 2008 09.26:35 Graeme Geldenhuys wrote: On Sun, Nov 23, 2008 at 10:19 AM, Mattias Gaertner [EMAIL PROTECTED] wrote: On Sat, 22 Nov 2008 23:05:43 +0200 For example the lazarus IDE typically holds 50 to 200mb sources in memory. If this would be changed to unicodestring

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread listmember
I thought my example described just that. If strings use 4 bytes per char then ASCII text will need 4 times more memory. I am not disputing that. What I am curious about is: 4 times of what? It is not hard to tell that an app that works with text files (such as Lazarus) will consume 4 times
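The "4 times more memory" figure for ASCII text can be checked with a small Python sketch. The sample string is an illustrative assumption, not data from the thread, and the sizes count only the encoded characters (no string header, reference count or BOM).

```python
def encoded_sizes(text: str) -> dict:
    """Return the bytes needed to hold `text` in each encoding, without a BOM."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }

# Pure ASCII, as most Pascal source code is.
ascii_source = "begin WriteLn('hello'); end."
sizes = encoded_sizes(ascii_source)

# Ratio of UTF-32 to UTF-8 storage for this ASCII sample.
ratio = sizes["utf-32"] / sizes["utf-8"]
print(sizes, ratio)
```

For text outside the ASCII range the ratio is smaller, because UTF-8 then needs two to four bytes per code point while UTF-32 stays at four.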

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread listmember
On 2008-11-23 12:50, Jonas Maebe wrote: On 23 Nov 2008, at 11:29, listmember wrote: It is not hard to tell that an app that works with text files (such as Lazarus) will consume 4 times more memory per file loaded. But, how much memory does, say, Lazarus --itself-- consume specifically for

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Mattias Gaertner
On Sun, 23 Nov 2008 13:35:07 +0200 listmember [EMAIL PROTECTED] wrote: Do a 'find declaration' on an identifier, that does not exist. This will explore all units of the uses section. Now I see what you mean. But, isn't this a design-choice; caching all sources in memory for speed

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Mattias Gaertner
On Sun, 23 Nov 2008 13:49:32 +0100 (CET) Daniël Mantione [EMAIL PROTECTED] wrote: Op Sun, 23 Nov 2008, schreef Jonas Maebe: On 23 Nov 2008, at 13:31, Daniël Mantione wrote: For an IDE, this is a little bit more complicated. I.e. searching for a ç in a source file needs to find

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Sergei Gorelkin
Daniël Mantione wrote: Instead, in UTF-8, you need to make sure the string has enough characters left, and then compare multiple characters. Heck, you even need to take care of the fact that the combining cedilla can be encoded in 2, 3 or 4 bytes. In this example it may be more efficient to

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Jonas Maebe
On 23 Nov 2008, at 12:35, listmember wrote: Do a 'find declaration' on an identifier, that does not exist. This will explore all units of the uses section. Now I see what you mean. But, isn't this a design-choice; caching all sources in memory for speed reasons, as opposed to on-demand

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Marco van de Voort
In our previous episode, listmember said: Is there a way to determine how much memory is consumed by strings by a running application? Maybe you can keep a counter in the routines of astrings. Increase/adjust on newansistring or setlength. I'd like to know this, in particular, for FPC and

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Marco van de Voort
In our previous episode, listmember said: The last time I joined a relevant discussion, I was told worrying about native UCS-4 string-type would be pointless simply because that sort of thing is really needed for word processors only. Now, I have been informed that Lazarus (and perhaps

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread listmember
On 2008-11-23 14:34, Mattias Gaertner wrote: On Sun, 23 Nov 2008 14:11:50 +0200 listmember [EMAIL PROTECTED] wrote: That leaves me wondering how much do we lose performance-wise in endlessly decompressing UTF-8 data, instead of using, say, UCS-4 strings. I'm wondering what you mean with

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread listmember
On 2008-11-23 14:19, Mattias Gaertner wrote: On Sun, 23 Nov 2008 13:35:07 +0200 listmember [EMAIL PROTECTED] wrote: [...] These dependencies are complex and require exclusive access. The memory belongs to the program, the source files can be changed by anyone. Therefore the files are kept in

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread listmember
On 2008-11-23 14:49, Daniël Mantione wrote: Op Sun, 23 Nov 2008, schreef Jonas Maebe: On 23 Nov 2008, at 13:31, Daniël Mantione wrote: For an IDE, this is a little bit more complicated. I.e. searching for a ç in a source file needs to find both the composed and the decomposed variant, and

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Martin Schreiber
On Sunday 23 November 2008 13.44:02 Mattias Gaertner wrote: But RTTI only contains published classes, does it not? AFAIK there are some more elements where it is possible to get a typeinfo pointer. A compiler specialist can say more. :-) Does MSEGui read ppu files? No. Martin

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Marco van de Voort
In our previous episode, Martin Schreiber said: On Sunday 23 November 2008 13.44:02 Mattias Gaertner wrote: But RTTI only contains published classes, does it not? AFAIK there are some more elements where it is possible to get a typeinfo

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread listmember
On 2008-11-23 15:10, Marco van de Voort wrote: In our previous episode, listmember said: [].. I'd like to know this, in particular, for FPC and Lazarus --to begin with. And, the reason I'd like to know this is this: Whenever I suggest that char size be increased to 4, the idea gets opposed

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Daniël Mantione
Op Sun, 23 Nov 2008, schreef Marco van de Voort: In our previous episode, Martin Schreiber said: On Sunday 23 November 2008 13.44:02 Mattias Gaertner wrote: But RTTI only contains published classes, does it not? AFAIK there are some more

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Marco van de Voort
In our previous episode, Daniël Mantione said: AFAIK there are some more elements where it is possible to get a typeinfo pointer. A compiler specialist can say more. :-) Well, I'm not an expert, but I can only think of enumerations. These have RTTI under Delphi because they are shown in

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread Graeme Geldenhuys
On Sun, Nov 23, 2008 at 3:45 PM, listmember [EMAIL PROTECTED] wrote: I am referring to going to the nth character in a string. With UTF-8 it is no more a simple arithmetic and an index operation. You have to start from zero and iterate until you get to your characters --at every step,
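The indexing cost described in the quoted text can be sketched in Python. This is an illustration only: the function names are hypothetical, the input is assumed to be valid UTF-8, and "character" is taken to mean a code point (combining sequences are not merged).

```python
def utf8_offset_of_char(data: bytes, n: int) -> int:
    """Byte offset of the n-th code point (0-based) in valid UTF-8 `data`.

    Needs a linear scan, because code points are 1 to 4 bytes long.
    """
    seen = 0
    for offset, byte in enumerate(data):
        # Continuation bytes have the form 0b10xxxxxx; any other byte
        # starts a new code point.
        if byte & 0xC0 != 0x80:
            if seen == n:
                return offset
            seen += 1
    raise IndexError("fewer than n + 1 code points in data")

def utf32_offset_of_char(n: int) -> int:
    """Byte offset of the n-th code point in UTF-32: plain arithmetic."""
    return 4 * n

sample = "a\u00e7\u20acb".encode("utf-8")   # 1 + 2 + 3 + 1 bytes
offset_of_b = utf8_offset_of_char(sample, 3)
```

Code that walks a string from start to end (parsing, searching) pays this scan only once, which is why the cost matters mainly for random access by character index.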

Re: [fpc-devel] Memory consumed by strings

2008-11-23 Thread listmember
On 2008-11-23 19:31, Graeme Geldenhuys wrote: At least the good thing of UTF-8 is that you don't have to worry about LE or BE byte orders. UTF-16 and UTF-32 have that nasty issue. LE/BE only applies when streaming to/from file/device/network, otherwise life is much simpler with UTF-32.
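The byte-order point can be shown with a few lines of Python. The sample character is an arbitrary choice for illustration; the codec names are the standard ones shipped with Python.

```python
text = "A"

# The same code point serialized in both byte orders.
utf32_le = text.encode("utf-32-le")   # b'A\x00\x00\x00'
utf32_be = text.encode("utf-32-be")   # b'\x00\x00\x00A'
utf16_le = text.encode("utf-16-le")   # b'A\x00'
utf16_be = text.encode("utf-16-be")   # b'\x00A'

# UTF-8 is a byte sequence with a fixed order, so there is only one form.
utf8 = text.encode("utf-8")           # b'A'

# Reading with the wrong byte order gives a different value or an error,
# so the order has to be known (or marked with a BOM) when streaming.
round_trip_ok = utf32_le.decode("utf-32-le") == utf32_be.decode("utf-32-be")
```

Inside a single process the native byte order is used throughout, so the distinction only shows up at the file, device or network boundary, as the reply says.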