Re: [fpc-devel] TList or TFPList - a Linked list ?
Mattias Gaertner wrote: Plus some bytes for the memory manager for each mem block. Typically 8. FYI, for small blocks (512 bytes) it's only 4. And now I look at heap.inc again, I think that even alignment for small blocks could be upgraded to 4 byte granularity instead of 16 bytes. This would have a positive impact on memory usage (less slack), while retaining equal speed. Anyone interested in smaller granularity there ? It would increase the number of freelists by a factor 4, but that's globally, once-allocated memory. Micha ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] TList or TFPList - a Linked list ?
On Wed, 14 Dec 2005 21:33:34 +0100 Micha Nelissen [EMAIL PROTECTED] wrote: Hi, I've been thinking about adding a linked list implementation to either TList or TFPList. The basic problem to that is of course 1) space overhead of linked list is quite large 2) Index[..] will be O(N) For (1) I was thinking about making a linked list of an array of items, for example 14 pointers (so that 8 bytes are left for next pointer and memory manager needs on 32 bit platform). To solve (2), we can make the observation that generally people access lists in a linear fashion, and we might cache the previous and next list entry. This will break all random access uses. For example sorting a TList. The big advantage is getting rid of the many reallocs needed to grow the lists, because one usually doesn't set Capacity in advance, but keeps adding items until done. TFPList/TList grows exponentially, which is pretty good for a generic dynamic array. It results in O(n). Using aggregation possibly, TStringList must benefit from it too. What do you think about it ? Your trick will only give a constant factor on growing/shrinking the list memory, gives an extra O(n) factor for sorting a TList, the caching costs time, and the memory usage will also grow. Mattias ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] TList or TFPList - a Linked list ?
On Wed, 14 Dec 2005 21:44:50 +0100 (CET) Daniël Mantione [EMAIL PROTECTED] wrote: Op Wed, 14 Dec 2005, schreef Micha Nelissen: Sorry to disappoint, but this doesn't look a very good idea to me; it would kill code that for example tries to sort a list. There will be also a lot of code that iterates using index[]. Did you read the paragraph, to solve (2) ... ? Sorting is also quite a local operation, no ? No truly random-access is needed. Programmers need both list like datastructures and array like data structures. It is part of converting mathematical abstraction principles Array like datastructures are provided by dynamic arrays, I'd say ? like a sequence where every operation is O(1), to actual datastructures that are to be used inside a compiler. Programmers need both of them, not one or the other. Yes, currently there is no linked list at all, is there ? These issues are precisely the reason I'm writing to the list and gathering ideas first. Micha ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] TList or TFPList - a Linked list ?
On Wed, 14 Dec 2005, Micha Nelissen wrote: Hi, I've been thinking about adding a linked list implementation to either TList or TFPList. The basic problem to that is of course 1) space overhead of linked list is quite large 2) Index[..] will be O(N) For (1) I was thinking about making a linked list of an array of items, for example 14 pointers (so that 8 bytes are left for next pointer and memory manager needs on 32 bit platform). To solve (2), we can make the observation that generally people access lists in a linear fashion, and we might cache the previous and next list entry. The big advantage is getting rid of the many reallocs needed to grow the lists, because one usually doesn't set Capacity in advance, but keeps adding items until done. I strongly disagree here. I use T(FP)List quite a lot, and usually do a good estimate in setting the capacity. When storing/reading items of a list you always write/read the count first... People that have large lists know this and take care of it. What is more, I think that 1 large memory block (an array) is much more efficient memory wise than many small blocks. Using aggregation possibly, TStringList must benefit from it too. No way, e.g. the IndexOf for sorted stringlists would be crippled totally. Inside TList, The Move()/Exchange operations would be a horror, and hence the listsort as well. What do you think about it ? I think it's a bad idea. TFPList is implemented for speed and does very well. But, on the bright side: A TLinkedList implementation is more than welcome, but it should be a separate class. LinkedLists and the regular List are 2 different beasts, that only have in common that they store a lot of pointers. Michael. ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] TList or TFPList - a Linked list ?
On Wed, 14 Dec 2005, Micha Nelissen wrote: On Wed, 14 Dec 2005 21:44:50 +0100 (CET) Daniël Mantione [EMAIL PROTECTED] wrote: Op Wed, 14 Dec 2005, schreef Micha Nelissen: Sorry to disappoint, but this doesn't look a very good idea to me; it would kill code that for example tries to sort a list. There will be also a lot of code that iterates using index[]. Did you read the paragraph, to solve (2) ... ? Sorting is also quite a local operation, no ? No truly random-access is needed. QuickSort is not so local, definitely not in the beginning. Programmers need both list like datastructures and array like data structures. It is part of converting mathematical abstraction principles Array like datastructures are provided by dynamic arrays, I'd say ? like a sequence where every operation is O(1), to actual datastructures that are to be used inside a compiler. Programmers need both of them, not one or the other. Yes, currently there is no linked list at all, is there ? These issues are precisely the reason I'm writing to the list and gathering ideas first. Excellent, maybe now we'll finally have a decent TLinkedList :) An excellent location is the contnrs unit. It already contains the TStack and TQueue. TLinkedList perfectly fits in this list. Michael.___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] TList or TFPList - a Linked list ?
Micha Nelissen wrote: Hi, I've been thinking about adding a linked list implementation to either TList or TFPList. The basic problem to that is of course 1) space overhead of linked list is quite large 2) Index[..] will be O(N) For (1) I was thinking about making a linked list of an array of items, for example 14 pointers (so that 8 bytes are left for next pointer and memory manager needs on 32 bit platform). To solve (2), we can make the observation that generally people access lists in a linear fashion, and we might cache the previous and next list entry. The big advantage is getting rid of the many reallocs needed to grow the lists, because one usually doesn't set Capacity in advance, but keeps adding items until done. Using aggregation possibly, TStringList must benefit from it too. What do you think about it ? Micha ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc- My most recent project at work involves a good deal of non-linear access to TList items (sorry, not Free Pascal g), and this implementation would essentially kill the efficiency gains that we achieved by doing so. In addition, the caching overhead (the last requested index, as well as next prev pointers, and corresponding logic) would blow away the efficiency of iterating through TList items. There's no way to work around that. Linked lists are very specialized, and definitely have their place, and classes should be built for them. However, they're definitely their own beast, and should be treated as such. For instance, you made the valid point that they grow very easily and without the overhead of having to find large contiguous chunks of memory when the list grows, but iterating them is relatively slow. Programmers just need to recognize when this is an advantage and when it isn't. Just my $0.02. ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] TList or TFPList - a Linked list ?
Op Wed, 14 Dec 2005, schreef Micha Nelissen: On Wed, 14 Dec 2005 21:44:50 +0100 (CET) Daniël Mantione [EMAIL PROTECTED] wrote: Op Wed, 14 Dec 2005, schreef Micha Nelissen: Sorry to disappoint, but this doesn't look a very good idea to me; it would kill code that for example tries to sort a list. There will be also a lot of code that iterates using index[]. Did you read the paragraph, to solve (2) ... ? Sorting is also quite a local operation, no ? No truly random-access is needed. Yes I did, but caching the previous and the next is no solution for sorting. There exists no magic data structure that can replace both the list and the array. Programmers need both list like datastructures and array like data structures. It is part of converting mathematical abstraction principles Array like datastructures are provided by dynamic arrays, I'd say ? Yes, but a linked list also can be implemented using a record with a pointer; object oriented libraries a little bit extra power to the code, which a dynamic array cannot provide. like a sequence where every operation is O(1), to actual datastructures that are to be used inside a compiler. Programmers need both of them, not one or the other. Yes, currently there is no linked list at all, is there ? Afaik, no, we could use one, so please add it, and we're all happy :) These issues are precisely the reason I'm writing to the list and gathering ideas first. I know, no problem. Daniël___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] TList or TFPList - a Linked list ?
Michael Van Canneyt wrote: People that have large lists know this and take care of it. What is more, I think that 1 large memory block (an array) is much more efficient memory wise than many small blocks. This is true in some cases; VMs and scripting engines (such as SpiderMonkey) use this technique. However, when the list size is underestimated and has to grow it can be a huge detriment to performance and memory efficiency. ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] TList or TFPList - a Linked list ?
On Wed, 14 Dec 2005 21:53:58 +0100 Mattias Gaertner [EMAIL PROTECTED] wrote: Your trick will only give a constant factor on growing/shrinking the list memory, gives an extra O(n) factor for sorting a TList, the caching costs time, and the memory usage will also grow. You may be underestimating the impact of the heap manager. I just checked the code in lists.inc. When there are less than 128 items, the list is increased by 8 items. So when adding 1000 (to take a number) items, the list is copied at least 10, possibly 13 times, 12 - 28 - 44 - 60 - 76 - 92 - 108 - 124 - 140 - 206 - 325 - ~500 - ~780 - ~1000. For the linked list case, no memory is copied. Sure, if you're going to sort a list, you should not use linked list of course, but an array (or tree maybe). Due to memory fragmentation, it is debatable whether memory usage will really grow significantly more for linked list. Micha ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] TList or TFPList - a Linked list ?
On Wed, 14 Dec 2005 22:40:40 +0100 Micha Nelissen [EMAIL PROTECTED] wrote: increased by 8 items. So when adding 1000 (to take a number) items, the list is copied at least 10, possibly 13 times, 12 - 28 - 44 - 60 - 76 - 92 - 108 - 124 - 140 - 206 - 325 - ~500 - ~780 - ~1000. For the linked list case, no memory is copied. Hmm, in case one doesn't know the number of items in advance, it would be nice to have a AssignTo/CopyTo(AList: TList) method that copies to items into a regular list, so it can be sorted or whatever random access you may need. Micha ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] TList or TFPList - a Linked list ?
On Wed, 14 Dec 2005 21:33:34 +0100 Micha Nelissen [EMAIL PROTECTED] wrote: For (1) I was thinking about making a linked list of an array of items, for example 14 pointers (so that 8 bytes are left for next pointer and memory manager needs on 32 bit platform). Tiny addition: for 64bit OS you would also use 14 pointers, but 16 bytes are left, which again is one for next pointer and memory manager's size: ptrint. Micha ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] TList or TFPList - a Linked list ?
On Wed, 14 Dec 2005 22:40:40 +0100 Micha Nelissen [EMAIL PROTECTED] wrote: On Wed, 14 Dec 2005 21:53:58 +0100 Mattias Gaertner [EMAIL PROTECTED] wrote: Your trick will only give a constant factor on growing/shrinking the list memory, gives an extra O(n) factor for sorting a TList, the caching costs time, and the memory usage will also grow. You may be underestimating the impact of the heap manager. I just checked the code in lists.inc. When there are less than 128 items, the list is increased by 8 items. So when adding 1000 (to take a number) items, the list is copied at least 10, possibly 13 times, 12 - 28 - 44 - 60 - 76 - 92 - 108 - 124 - 140 - 206 - 325 - ~500 - ~780 - ~1000. For the linked list case, no memory is copied. It was not my decision to increase TList by only one fourth. But still the list grows exponentially, which means for 1000 items there will be only c times 1000 copies. For a growing of 25% as it is now c is less than 4. Sure, if you're going to sort a list, you should not use linked list of course, but an array (or tree maybe). Due to memory fragmentation, it is debatable whether memory usage will really grow significantly more for linked list. Indeed. Mattias ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] TList or TFPList - a Linked list ?
Mattias Gaertner wrote: Your trick will only give a constant factor on growing/shrinking the list memory, gives an extra O(n) factor for sorting a TList, the caching costs time, and the memory usage will also grow. Just saw this last statement. The memory usage is very comparable to TList, even with bi-directional (doubly-linked) lists, since TLists tend to grow by leaps. For example, assuming a linked list item comprises of only a "next" pointer, it requires 8 bytes of memory (4 for the structure itself, 4 for the next pointer). In this case, 10,000 entries occupy 80,008 bytes (80,000 + 4 for pointer to First + 4 for pointer to Last), distributed around the memory table. Also keep in mind that the data payload for the linked list item is usually contained within the structure itself. A TList (stripped down for this case) requires 4 bytes for the list allocation, plus 4 bytes per list entry. 10,000 entries occupy 80,004 bytes. Now, two things: 1. With automatically growing lists you have a very high likelihood of unused pointers, so while a linked list of 10,000 items is utilizing all 80,004 bytes of memory, the TList allocated (10,000-TList.Count)*4 unused bytes of memory. 2. The TList entry only points to the actual data payload, meaning another 4+n bytes must be allocated to store the data. This means an additional 40,000 bytes is required for a TList vs. a linked list. On the other hand, this is equalized in a doubly-linked list. Disclaimer: this is all based on the Delphi implementation of TList, and may differ slightly (but probably not much) for the FP lists. Sterling ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
RE: [fpc-devel] TList or TFPList - a Linked list ?
note that if the data you wan't to store in a tlist is the size of a pointer or less you can store it directly in the tlist. -Original Message-From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]On Behalf Of Sterling BatesSent: 14 December 2005 22:04To: FPC developers' listSubject: Re: [fpc-devel] TList or TFPList - a Linked list ?Mattias Gaertner wrote: Your trick will only give a constant factor on growing/shrinking the list memory, gives an extra O(n) factor for sorting a TList, the caching costs time, and the memory usage will also grow. Just saw this last statement. The memory usage is very comparable to TList, even with bi-directional (doubly-linked) lists, since TLists tend to grow by leaps.For example, assuming a linked list item comprises of only a "next" pointer, it requires 8 bytes of memory (4 for the structure itself, 4 for the next pointer). In this case, 10,000 entries occupy 80,008 bytes (80,000 + 4 for pointer to First + 4 for pointer to Last), distributed around the memory table. Also keep in mind that the data payload for the linked list item is usually contained within the structure itself.A TList (stripped down for this case) requires 4 bytes for the list allocation, plus 4 bytes per list entry. 10,000 entries occupy 80,004 bytes. Now, two things:1. With automatically growing lists you have a very high likelihood of unused pointers, so while a linked list of 10,000 items is utilizing all 80,004 bytes of memory, the TList allocated (10,000-TList.Count)*4 unused bytes of memory.2. The TList entry only points to the actual data payload, meaning another 4+n bytes must be allocated to store the data. This means an additional 40,000 bytes is required for a TList vs. a linked list. On the other hand, this is equalized in a doubly-linked list.Disclaimer: this is all based on the Delphi implementation of TList, and may differ slightly (but probably not much) for the FP lists.Sterling ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] TList or TFPList - a Linked list ?
On Wed, 14 Dec 2005 15:03:58 -0700 Sterling Bates [EMAIL PROTECTED] wrote: Mattias Gaertner wrote:Your trick will only give a constant factor on growing/shrinking the list memory, gives an extra O(n) factor for sorting a TList, the caching costs time, and the memory usage will also grow. Just saw this last statement. The memory usage is very comparable to TList, even with bi-directional (doubly-linked) lists, since TLists tend to grow by leaps. For example, assuming a linked list item comprises of only a next pointer, it requires 8 bytes of memory (4 for the structure itself, 4 for the next pointer). Plus some bytes for the memory manager for each mem block. Typically 8. In this case, 10,000 entries occupy 80,008 bytes (80,000 + 4 for pointer to First + 4 for pointer to Last), ~160,000 distributed around the memory table. Also keep in mind that the data payload for the linked list item is usually contained within the structure itself. A TList (stripped down for this case) requires 4 bytes for the list allocation, plus 4 bytes per list entry. 10,000 entries occupy 80,004 bytes. ~40,000 Now, two things: 1. With automatically growing lists you have a very high likelihood of unused pointers, so while a linked list of 10,000 items is utilizing all 80,004 bytes of memory, the TList allocated (10,000-TList.Count)*4 unused bytes of memory. i.e. plus a maximum of 25% with the current implementation ~50,000 2. The TList entry only points to the actual data payload, meaning another 4+n bytes must be allocated to store the data. This means an additional 40,000 bytes is required for a TList vs. a linked list. Huh? On the other hand, this is equalized in a doubly-linked list. Disclaimer: this is all based on the Delphi implementation of TList, and may differ slightly (but probably not much) for the FP lists. The main problem is the mem fragmentation. Here a growing TList can need up to 4 times its used memory. So in worst case it will need the memory of a single linked list. Mattias ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] TList or TFPList - a Linked list ?
Mattias Gaertner wrote: Plus some bytes for the memory manager for each mem block. Typically 8. Forgot about that, sorry. In any case, the math still works. I'll explain. In this case, 10,000 entries occupy 80,008 bytes (80,000 + 4 for pointer to First + 4 for pointer to Last), ~160,000 distributed around the memory table. Also keep in mind that the data payload for the linked list item is usually contained within the structure itself. A TList (stripped down for this case) requires 4 bytes for the list allocation, plus 4 bytes per list entry. 10,000 entries occupy 80,004 bytes. ~40,000 Actually, if you account for the allocation of each of the 10,000 items added to the TList, it is ~120,000 (4 + 8 for memory manager that you pointed out above) plus the 40,000, bringing the total to 160,000. My point above is that the linked list item itself typically houses the payload, so no additional pointer allocation (or memory manager record) is required. Now, two things: 1. With automatically growing lists you have a very high likelihood of unused pointers, so while a linked list of 10,000 items is utilizing all 80,004 bytes of memory, the TList allocated (10,000-TList.Count)*4 unused bytes of memory. i.e. plus a maximum of 25% with the current implementation ~50,000 2. The TList entry only points to the actual data payload, meaning another 4+n bytes must be allocated to store the data. This means an additional 40,000 bytes is required for a TList vs. a linked list. Huh? Here I'm pointing out that the item each entry in the TList refers to has to be allocated somewhere. Best case scenario, a record, means a minimum of 4 bytes for each allocation, plus 8 for the memory manager. It all adds up. The main problem is the mem fragmentation. Here a growing TList can need up to 4 times its used memory. So in worst case it will need the memory of a single linked list. Regarding fragmentation, it's my personal experience that allocation of large numbers (millions) of small data packets is easier to manage in a suitably unpredictable environment. In cases where I need TList functionality, I really have to estimate (in my case at run-time, which is very hit-and-miss) how large to make the TList. If I mispredict then I chew up large chunks of contiguous memory very very quickly. If I overpredict, which usually happens by very large amounts, I'm wasting memory. Granted that's not a huge issue on servers with four gigabytes of RAM, but on these high-traffic servers it could be. ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] TList or TFPList - a Linked list ?
Mattias Gaertner wrote: If the linked list item contains the whole data, then you are either not talking of the generic list this thread is about, or you use templates. In the later case a TList will also use templates and the 'payload' is the same. That's true of records, sure. Someone could also create a TLinkedListItem base class from which to descend and store their data there. The next prev pointers can be stored in the base class. Anyway, thanks for the discussion :) ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] TList or TFPList - a Linked list ?
On Wed, 14 Dec 2005 16:22:53 -0700 Sterling Bates [EMAIL PROTECTED] wrote: Mattias Gaertner wrote: If the linked list item contains the whole data, then you are either not talking of the generic list this thread is about, or you use templates. In the later case a TList will also use templates and the 'payload' is the same. That's true of records, sure. Someone could also create a TLinkedListItem base class from which to descend and store their data there. The next prev pointers can be stored in the base class. Well, if you use objects, then you get even more mem overhead ... :) Anyway, thanks for the discussion :) With pleasure. We could discuss it endless. Micha's original idea was even to use a block linked list. :) Mattias ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] TList or TFPList - a Linked list ?
Mattias Gaertner wrote: Well, if you use objects, then you get even more mem overhead ... :) I was thinking about that too. Then, IIRC, I read that object overhead is a one-time allocation of class tables and such. Of course, you might have some extra overhead inherited from TObject (ClassName etc). In my case, though, I use TList for objects anyway, so it all balances out. Guess it really depends on how one uses the tools. ___ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel