On 13 Sep 2007, Daniel Corbe stated:
> Why would I run into heap corruption issues unless there's something
> blatantly wrong with xmlDocDump.
>
> My understanding of HEAP vs STACK memory is that local variables come from
> the stack and global variables along with anything malloc() comes off of the
> heap.

This is correct from the point of view of the C abstract machine (except
that it doesn't call the stack anything specifically; it's just automatic
allocation). It's incorrect from the point of view of what the machine
actually does, where (on a Linux box and many/most other ELF systems):

 - stack variables come off the stack. This is generally strictly limited
   in size, and extremely limited in 32-bit threaded environments because
   of the need to fit all the stacks for all the threads into the address
   space.

 - initialized static variables (both global and local) come from the
   .data section of the executable, which is privately mmap()ped and
   modifiable. This is limited in size only by available memory and
   address space.

 - uninitialized static variables (both global and local) come from the
   .bss section, which is set up at load time (by the kernel's ELF loader
   for the executable, by the dynamic linker for shared libraries) and
   filled with zeroes. This too is limited in size only by available
   memory and address space.

 - heap allocations are satisfied on nearly all Unixes from an arena
   maintained by the C library, grown and shrunk on demand via the brk()
   system call. Because this is a single contiguous arena, it can suffer
   from fragmentation and overruns. Most C libraries store housekeeping
   data for a block just before the start of that block, so underrunning
   a block (or overrunning it into the next block's housekeeping data)
   can corrupt the arena and crash programs on later malloc() or free()
   calls. The arena is theoretically limited only by available memory and
   address space, but alignment constraints, housekeeping data, and
   especially fragmentation can reduce its effective size substantially
   as a program runs.

   On Linux/glibc and some other systems, large malloc() allocations are
   satisfied via mmap() directly from the OS, mostly to reduce heap
   fragmentation. (The definition of `large' is changeable by the
   application and on modern glibc versions varies dynamically.) Overruns
   in these areas might cause segfaults but will generally not corrupt
   other state or cause later crashes in other calls.

(Windows's memory allocation models are profoundly different and the last
time I had to deal with them was in the Windows 3.1 days, so anything I
could say would be more misleading than useful. If anyone else wants to
describe the Windows model, feel free.)

> Heap memory is essentially limited only by the amount of physical RAM
> and virtual memory in your machine

With modern RAM volumes, address space is a more serious constraint on many
applications. I doubt that Windows apps can allocate anything like as much
as 4GB on a 32-bit platform.

> whereas your call stack is generally
> limited to 1Mbit per thread by default on Windows.

The amount on Linux 32-bit platforms has varied with time and is
customizable; the default is generally somewhere between four and eight
megabytes, IIRC.

> The default stack size in most Linux distributions is unlimited or some very
> high number

stack size              (kbytes, -s) 8192

> so I could easily see how I may have missed a stack issue.
> Issues with the heap tend to be more visible (in the form of crashes) and
> obvious (dereferencing null or uninitialized pointers, reading/writing
> out-of-bounds, etc)

If this app runs on Linux too, you might want to try running it under
valgrind and see if that spots anything. (valgrind is *very* good at
detecting overruns on the heap, although less good at detecting stack
problems.) GCC 4's -fmudflap option might also be useful.

-- 
`Some people don't think performance issues are "real bugs", and I think
such people shouldn't be allowed to program.' --- Linus Torvalds
