Re: cloning DOMString's on assignment in Xerces-C

David_N_Bertoni 19 Nov 1999 17:29:55 -0000

This code _does_ modify the node in the DOM tree, at least in the versions
that I've used (2.3, and 3.0)  -- I tested this before I posted.  Here's
the code from the repository at xml.apache.org:


NodeImpl.cpp

DOMString NodeImpl::getNodeName() {
    return name;
};


Of course, if you've changed this behavior recently, then I apologize for
not being up-to-date on things.

In general, I just think it's bad design to have a C++ class that seems to
be a "string" class that acts like a Java class.  DOMString has C++
pointer/reference semantics, not value semantics.  We've been bitten by
this in our code, and I'm sure other people will as well.

At any rate, it _is_ possible to have a string class that uses reference
counting, but has value semantics.  That, I think, is how a C++ string
class should behave.  And I really don't understand why you think
DOMStrings have to behave like the DOM_Node smart-pointers.  My ideal world
would be a DOM implementation without reference counting, and a value-style
string class with reference counting.
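
To make "reference counting with value semantics" concrete, here is a
minimal copy-on-write sketch.  ValueString and everything in it are
hypothetical names made up for illustration; this is not Xerces-C code, it
leans on the standard library for brevity, and it assumes single-threaded
use (the use_count() check is not a thread-safe test for sharing).

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical class for illustration; not part of Xerces-C.
class ValueString {
public:
    ValueString(const char* text = "")
        : rep_(std::make_shared<std::string>(text)) {}

    // The implicit copy constructor and assignment copy only the
    // shared_ptr, so a copy costs one reference-count increment.

    ValueString& operator+=(const char* tail) {
        detach();            // never write to a buffer someone else can see
        *rep_ += tail;
        return *this;
    }

    const std::string& str() const { return *rep_; }

    bool sharesBufferWith(const ValueString& other) const {
        return rep_ == other.rep_;
    }

private:
    // Take a private copy of the buffer if it is currently shared.
    void detach() {
        if (rep_.use_count() > 1)
            rep_ = std::make_shared<std::string>(*rep_);
    }

    std::shared_ptr<std::string> rep_;
};

// Usage: the copy is cheap, and modifying it leaves the original alone.
inline void valueSemanticsDemo() {
    ValueString nodeName("title");
    ValueString copy = nodeName;
    assert(copy.sharesBufferWith(nodeName));
    copy += "foo";
    assert(nodeName.str() == "title");
    assert(copy.str() == "titlefoo");
}
```

The point of the design is that the caller never has to remember to clone:
the string detaches itself on the first write.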

And since you mention StringHandle, and its custom operator new and
delete, I'll point out what I think is a serious flaw in the implementation
-- I can't find anywhere where blocks of StringHandles are deleted.  Yes,
they are returned to a free list, but that free list _never_ shrinks.  This
means that the amount of memory consumed by StringHandles can only
increase, and will only be freed when the process terminates.  That's not
good for a 24x7 server environment, not to mention the extra difficulty it
creates for detecting memory leaks using automated tools.
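
Here is a sketch of the pattern I'm describing.  It is _not_ the actual
StringHandle code: GrowOnlyPool, PooledHandle, and the 64-slot block size
are all names and numbers I made up for illustration.  It only shows how a
class-specific operator new/delete backed by a free list keeps its peak
footprint after every object has been deleted.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical grow-only pool; names and sizes are illustrative only.
class GrowOnlyPool {
public:
    explicit GrowOnlyPool(std::size_t slotSize) {
        const std::size_t align = alignof(std::max_align_t);
        if (slotSize < sizeof(FreeNode)) slotSize = sizeof(FreeNode);
        slotSize_ = (slotSize + align - 1) / align * align;  // keep slots aligned
    }
    GrowOnlyPool(const GrowOnlyPool&) = delete;
    GrowOnlyPool& operator=(const GrowOnlyPool&) = delete;

    // Blocks go back to the system only here; for a static pool that
    // means process exit.
    ~GrowOnlyPool() {
        for (void* block : blocks_) ::operator delete(block);
    }

    void* allocate() {
        if (freeList_ == nullptr) grow();
        FreeNode* node = freeList_;
        freeList_ = node->next;
        return node;
    }

    // Returns the slot to the free list; the owning block is never released.
    void release(void* slot) {
        if (slot == nullptr) return;
        freeList_ = new (slot) FreeNode{freeList_};
    }

    std::size_t blockCount() const { return blocks_.size(); }

private:
    struct FreeNode { FreeNode* next; };
    static const std::size_t kSlotsPerBlock = 64;

    // Allocate one more block and thread all of its slots onto the free list.
    void grow() {
        char* block = static_cast<char*>(::operator new(slotSize_ * kSlotsPerBlock));
        blocks_.push_back(block);
        for (std::size_t i = 0; i < kSlotsPerBlock; ++i)
            release(block + i * slotSize_);
    }

    std::size_t slotSize_ = 0;
    FreeNode* freeList_ = nullptr;
    std::vector<void*> blocks_;
};

// Hypothetical handle class that routes new/delete through the pool.
struct PooledHandle {
    int refCount = 0;
    void* data = nullptr;

    static GrowOnlyPool& pool() {
        static GrowOnlyPool p(sizeof(PooledHandle));
        return p;
    }
    static void* operator new(std::size_t) { return pool().allocate(); }
    static void operator delete(void* p) { pool().release(p); }
};

// Usage: a burst of handles grows the pool; deleting them all does not shrink it.
inline void growOnlyDemo() {
    std::vector<PooledHandle*> live;
    for (int i = 0; i < 1000; ++i) live.push_back(new PooledHandle);
    const std::size_t peak = PooledHandle::pool().blockCount();
    for (PooledHandle* h : live) delete h;
    assert(PooledHandle::pool().blockCount() == peak);
}
```

A pool that could shrink would have to track how many slots in each block
are live and release a block once that count reaches zero.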

Dave Bertoni





[EMAIL PROTECTED] on 11/18/99 09:22:36 PM

Please respond to [EMAIL PROTECTED]

To:   [EMAIL PROTECTED]
cc:    (bcc: David N Bertoni/CAM/Lotus)

Subject:  Re: cloning DOMString's on assignment in Xerces-C






Curt Arnold wrote

>I've noticed in some places string assignments first clone the right hand
>side and in other places they don't.  I don't know if this is intentional,
>an artifact of the derivation from XML4J, or just an unnecessary bit of
>overhead.  If someone could enlighten me, I'd appreciate it.

> For example:  [from the constructors and clone operator for nodes]

Here is what is happening:

  DOMStrings can be modified.

  If a string field of a node is being assigned with a string of
  unknown origin, cloning the string first prevents subsequent
  changes in the original string from being seen in the cloned
  copy.

  If some field of a node is being set to a string that is known to
  never change, it can simply be assigned.  Sources that are a
  constant field of another node are safe to use uncloned.

  Cloning is fairly efficient; only a new string handle object
  needs to be created.  The data itself will be shared (until such time
  as one or the other of the strings is modified.)  And string handles
  have a custom operator new(), in the interest of speed.
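
  The difference between plain assignment and cloning described above
  can be modeled roughly as follows.  This is a simplified, hypothetical
  model (RefString and Handle are invented names, and it uses the
  standard library for brevity); it is not the real DOMString
  implementation and assumes single-threaded use.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical model of a handle-based string; NOT the real DOMString.
struct Handle {
    std::shared_ptr<std::string> data;  // character data, possibly shared
};

class RefString {
public:
    explicit RefString(const char* text = "")
        : handle_(std::make_shared<Handle>(
              Handle{std::make_shared<std::string>(text)})) {}

    // Plain copy and assignment (compiler-generated) share the *handle*,
    // so both variables refer to one and the same string.

    // clone() creates a new handle that shares the character data
    // until one side is modified.
    RefString clone() const {
        RefString copy;
        copy.handle_ = std::make_shared<Handle>(*handle_);
        return copy;
    }

    void append(const char* tail) {
        // Copy the data first if another handle still shares it.
        if (handle_->data.use_count() > 1)
            handle_->data = std::make_shared<std::string>(*handle_->data);
        *handle_->data += tail;
    }

    const std::string& str() const { return *handle_->data; }

private:
    std::shared_ptr<Handle> handle_;
};

// Usage: an assigned string aliases the original, a clone does not.
inline void cloneVersusAssignDemo() {
    RefString original("title");
    RefString alias = original;          // same handle
    RefString copy = original.clone();   // new handle, shared data

    alias.append("!");                   // visible through 'original' too
    assert(original.str() == "title!");
    assert(copy.str() == "title");       // the clone is unaffected
}
```

  In this model a clone allocates only a small handle object, which is
  why cloning stays cheap until one of the strings is written to.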



Dave Bertoni wrote

> The meta issue here is that I think this is a very confusing
> implementation for C++ programmers -- most would think that a DOMString
> is more like the typical C++ string class (std::string, for example),
> rather than a Java StringBuffer.  I predict this will be a frequent
> source of very-hard-to-find bugs.  Imagine getting a value from a node
> and inadvertently modifying that string:
>
>     DOMString  theNodeName = node.getNodeName();
>
>     theNodeName += "foo";
>
> I've now changed the name of the node.

No.  The node.getNodeName() returns a clone of the string. In general,
a node's properties will not be affected by any changes to strings
returned from any getter function.

In the time that this code has been out as XML4C there have
indeed been questions about how strings and string cloning work.
But there really haven't been all that many of them, and there
have been no hard-to-track-down problems from incorrect string
usage that have made it back to me.  (I've been lucky - sending
this note is almost guaranteed to land one in my lap tomorrow
morning.)


I really would have preferred to use some preexisting string
package rather than invent yet another new one.  If we can
find something standard that does the job, I wouldn't mind
losing DOMString, although there would no doubt be a squawk
from the existing user base.

Here were my requirements at the time we went with DOMString:

    No std library dependencies.  There are still important platforms
      (some from IBM, unfortunately) that do not support it.

    16 bit chars, not wchar_t, which is sometimes 16, sometimes
    32, depending on the platform.  (DOM memory usage is
    already a big sore point)

    Reference counting memory management.  This goes along with
    the rest of the DOM.  An argument can certainly be made
    for a non-reference counted C++ DOM, but whatever is chosen,
    strings need to match everything else.

    Efficient const return of stored values.  This means either
    Java style immutable strings, or very efficient cloning
    of mutable strings.

    Easy porting of existing Java string code to C++.

And so, here we are with DOMString.

  -- Andy
