Hello all,

We recently switched from Xerces-C2.2 to Xerces-C2.3 for some of our products'
XML parsing and are having problems with the memory management features.

When running a COM object that uses Xerces-C2.3 (DOM or SAX - we use them both
in different objects, with the same results) on an IIS server with five clients
interrogating the objects, each of the objects starts failing after a few hours
with problems related to the XML parsing. Rolling back to 2.2 helped: there
were no more crashes.

I've been looking into the memory manager's code in the 2.3 codebase and in
current CVS, have looked through the discussions surrounding the memory
manager, and found some interesting remarks back in May.

The XMemory class uses a global memory manager class that defaults to 
MemoryManagerImpl. The latter is in charge of allocating and freeing memory
and will return a pointer to that memory to XMemory. XMemory does some 
management of its own, however, "aligning" the pointer it returns by 
pre-pending a "header" containing a pointer to the memory manager - which is
always the same because it is a global and not as configurable as one might
want to believe.
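As I read it, the scheme amounts to roughly the following sketch (the names
are mine, not the actual Xerces ones, and the manager here is a trivial
malloc/free stand-in):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for Xerces' MemoryManager interface.
struct MemoryManager {
    virtual void* allocate(size_t size) { return std::malloc(size); }
    virtual void deallocate(void* p) { std::free(p); }
    virtual ~MemoryManager() {}
};

static MemoryManager gDefaultManager;                 // the one global manager
static MemoryManager* gMemoryManager = &gDefaultManager;

// Header size, "aligned" the way XMemory does it: rounded up to
// max(sizeof(void*), sizeof(double)).
static size_t headerSize()
{
    const size_t align = sizeof(void*) > sizeof(double) ? sizeof(void*)
                                                        : sizeof(double);
    return ((sizeof(MemoryManager*) + align - 1) / align) * align;
}

void* xAllocate(size_t size)
{
    void* block = gMemoryManager->allocate(headerSize() + size);
    *(MemoryManager**)block = gMemoryManager;  // pre-pend the header
    return (char*)block + headerSize();        // hand out the payload
}

void xDeallocate(void* p)
{
    void* block = (char*)p - headerSize();
    MemoryManager* manager = *(MemoryManager**)block;  // read the header back
    manager->deallocate(block);
}
```

Deallocation steps back over the header to recover the manager - which is
where a buffer underrun would bite you.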

A few things bother me in the current setup, first among which is this 
alignment. The memory is aligned to sizeof(void*) or sizeof(double), whichever
is greater. On MSVC (the compiler I'm using at the moment) this results in 
an alignment to 8 bytes.

Looking through the code of the debug versions of malloc and realloc in MSVC,
it looks like this pair aligns to 16 bytes - the size of a "paragraph". We
will be testing with a 16-byte alignment shortly.
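To make the mismatch concrete (a sketch under my assumptions about the MSVC
debug heap; the helper name is mine): if the underlying allocator hands out
16-byte aligned blocks and the header comes to 8 bytes, the pointer XMemory
returns is only guaranteed 8-byte aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// XMemory's header size/alignment: max(sizeof(void*), sizeof(double)).
// On 32-bit MSVC and on typical 64-bit platforms this comes out to 8.
static const size_t kHeader =
    sizeof(void*) > sizeof(double) ? sizeof(void*) : sizeof(double);

// Given a 16-byte aligned block from the allocator, is the payload pointer
// (block + header) still 16-byte aligned? With an 8-byte header it is not.
bool payloadIs16Aligned(const void* block)
{
    const void* payload = (const char*)block + kHeader;
    return ((uintptr_t)payload % 16) == 0;
}
```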

Another thing that bothers me is that we have no choice whether or not we want
to use this setup: I would personally have preferred a setup in which the
memory manager were stored in a map keyed by the block, or something
similar (i.e. keep the housekeeping outside of the block). This not only
guards you against corruption in case of a buffer over- or underrun (which, as
suggested in the discussion in May, is not something one should be too 
concerned about) but also gets rid of any alignment problem that might come up
because of the hypothesis that 
  (sizeof(void*) > sizeof(double) ? sizeof(void*) : sizeof(double)) 
is always the right alignment. Of course, you'd have to deal with the 
hypothesis that pointers can be compared amongst each other, but that
particular hypothesis is already made in the Xerces codebase (most notably in
DOMCasts, which includes some pointer arithmetic whose semantics are undefined).
There are various non-blocking thread-safe algorithms Out There that would 
allow one to map against a pointer in a fast and thread-safe manner.
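A sketch of what I have in mind (my own names, and single-threaded for
brevity - in Xerces this map would of course need one of those thread-safe
algorithms, or at least a lock, around it):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>

// Hypothetical stand-in for Xerces' MemoryManager interface.
struct MemoryManager {
    virtual void* allocate(size_t size) { return std::malloc(size); }
    virtual void deallocate(void* p) { std::free(p); }
    virtual ~MemoryManager() {}
};

static MemoryManager gDefaultManager;

// Housekeeping kept *outside* the block: pointer -> owning manager.
// std::map relies only on pointers being ordered, which std::less<void*>
// guarantees even where raw < on unrelated pointers is undefined.
static std::map<void*, MemoryManager*> gBlockManagers;

void* xAllocate(size_t size, MemoryManager* manager = &gDefaultManager)
{
    void* block = manager->allocate(size);  // no header, no alignment games
    gBlockManagers[block] = manager;        // remember who owns the block
    return block;
}

void xDeallocate(void* block)
{
    std::map<void*, MemoryManager*>::iterator it = gBlockManagers.find(block);
    assert(it != gBlockManagers.end());
    it->second->deallocate(block);
    gBlockManagers.erase(it);
}
```

The block handed to the caller is exactly what the manager allocated, so
whatever alignment the manager guarantees is preserved, and an overrun can
at worst corrupt user data, never the housekeeping.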

Another thing that bothers me in this setup is the fact that one might be led
to believe that running Initialize from a thread with a different memory 
manager will actually change the memory manager per thread. The reason for 
this is that the API documentation at 
  http://xml.apache.org/xerces-c/apiDocs/classXMLPlatformUtils.html#z553_0
makes no mention of the memory manager, but the source code changes the 
memory manager *before* checking whether initialization has already been 
invoked. The fact that the memory manager being changed is global, and will 
therefore be changed from under the very noses of all the other threads, 
bothers me as well.

There is of course a little thing called thread-local storage that could
have been used in this case, but that wouldn't fit with the portability 
goals of the project, so I'd propose to move the assignment of the memory
manager to after the check whether initialization has already occurred.

Note that if you don't do this, you have a race condition in XMemory, as it
reads the pointer to the memory manager twice: once to call it and once to
put it in the buffer. Between those two reads, the memory manager can be
changed.

void* XMemory::operator new(size_t size)
{
    size_t headerSize = XMLPlatformUtils::alignPointerForNewBlockAllocation(
        sizeof(MemoryManager*));

// READ ONE
    void* const block = XMLPlatformUtils::fgMemoryManager->allocate
        (
            headerSize + size
        );
// READ TWO
    *(MemoryManager**)block = XMLPlatformUtils::fgMemoryManager;

    return (char*)block + headerSize;
}
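The fix I have in mind is to read the global exactly once into a local. A
self-contained sketch (the surrounding types are my stand-ins, not the real
XMemory/XMLPlatformUtils code):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Minimal stand-ins so the fix can be shown in isolation; the real code
// lives in XMemory.cpp and XMLPlatformUtils.
struct MemoryManager {
    virtual void* allocate(size_t size) { return std::malloc(size); }
    virtual void deallocate(void* p) { std::free(p); }
    virtual ~MemoryManager() {}
};

static MemoryManager gDefault;
static MemoryManager* fgMemoryManager = &gDefault;  // the global in question

static size_t alignedHeaderSize()
{
    const size_t align = sizeof(void*) > sizeof(double) ? sizeof(void*)
                                                        : sizeof(double);
    return ((sizeof(MemoryManager*) + align - 1) / align) * align;
}

void* fixedOperatorNew(size_t size)
{
    const size_t headerSize = alignedHeaderSize();

    // THE FIX: read the global exactly once into a local, so the manager
    // that allocated the block is also the one recorded in its header.
    MemoryManager* const manager = fgMemoryManager;

    void* const block = manager->allocate(headerSize + size);
    *(MemoryManager**)block = manager;

    return (char*)block + headerSize;
}
```

With a single read there is no window in which the allocating manager and
the recorded manager can differ.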

Of course, the documentation says it's a per-process initialisation, but I
don't like inviting disaster.

My questions:
* will patches aiming to 
  a. make XMemory optional at compile-time
  b. make XMemory not try to align the memory and always use the same memory 
     manager
  c. make XMemory optionally (compile-time option) align or keep a separate
     managers table
  ... be accepted or rejected up front? (Pick one of the three goals, please;
      I personally prefer either b or c.)
  Rationale for b: the memory manager is per-process and should be dealt with 
  as such. Once it is installed with the first Initialize (which needs a patch
  to make it so) it won't be changed. The pointer to it will henceforth only
  be used read-only.
  Rationale for c: if the memory manager is to be optional on a per-allocation 
  basis, the alignment may severely screw up assumptions by the compiler or 
  restrictions imposed by the architecture. In that case, a table that maps 
  between pointers and their managers is much safer than aligning anyway.
  Rationale for a: if b and c are unacceptable, at least leave the user a 
  choice.

IMHO, pluggable memory management is a Good Thing, but the current 
implementation leaves something to be desired and is currently suspected of
being the cause of our crashes. It needs work.

If there is a resounding "yes" to one of the three options from this list, I
can talk with management to allocate time for the development. If not, we'll
probably fork off an in-house XML parser.

rlc

-- 
Having the fewest wants, I am nearest to the gods.
                -- Socrates


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
