Re: cloning DOMString's on assignment in Xerces-C

heninger 19 Nov 1999 02:25:16 -0000


Curt Arnold wrote

>I've noticed in some places string assignments first clone the right hand
>side and in other places they don't.  I don't know if this is intentional,
>an artifact of the derivation from XML4J, or just an unnecessary bit of
>overhead.  If someone could enlighten me, I'd appreciate it.

> For example:  [from the constructors and clone operator for nodes]

Here is what is happening:

  DOMStrings can be modified.

  If a string field of a node is being assigned with a string of
  unknown origin, cloning the string first prevents subsequent
  changes in the original string from being seen in the cloned
  copy.

  If some field of a node is being set to a string that is known to
  never change, it can simply be assigned.  Sources that are a
  constant field of another node are safe to use uncloned.

  Cloning is fairly efficient; only a new string handle object
  needs to be created. The data itself will be shared (until such time
  as one or the other of the strings is modified). And string handles
  have a custom operator new(), in the interest of speed.
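
The clone-then-share scheme described above can be sketched roughly as follows. This is a hypothetical `CowString` class, not the Xerces-C `DOMString` API. It uses `std::shared_ptr` for the reference counts purely for brevity (the original design deliberately avoided standard library dependencies), and it does not model the custom `operator new()` for handles. Plain copies alias one handle; `clone()` allocates only a new handle, and the character data is copied the first time either side modifies it while it is still shared.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch, NOT the Xerces-C DOMString API.
// Two levels of sharing:
//   * copying a CowString copies a pointer to a Handle, so both objects are
//     aliases and see each other's in-place edits (reference semantics);
//   * clone() makes a new Handle pointing at the SAME character buffer, so
//     cloning allocates only the small Handle. The buffer is copied lazily,
//     on the first modification made while it is still shared.
class CowString {
public:
    using Char = std::uint16_t;  // fixed 16-bit code units rather than wchar_t

    explicit CowString(const char* ascii = "")
        : handle_(std::make_shared<Handle>()) {
        handle_->data = std::make_shared<std::vector<Char>>();
        appendAscii(*handle_->data, ascii);
    }

    // New handle, shared characters: no character data is copied here.
    CowString clone() const { return CowString(handle_->data); }

    // In-place append. If another handle (a clone) still shares the buffer,
    // detach first so the clone keeps the old contents.
    // (use_count() is only a reliable test in single-threaded code.)
    CowString& operator+=(const char* ascii) {
        std::shared_ptr<std::vector<Char>>& data = handle_->data;
        if (data.use_count() > 1)
            data = std::make_shared<std::vector<Char>>(*data);
        appendAscii(*data, ascii);
        return *this;
    }

    bool sharesBufferWith(const CowString& other) const {
        return handle_->data == other.handle_->data;
    }

    std::string toAscii() const {
        std::string out;
        for (Char c : *handle_->data) out.push_back(static_cast<char>(c));
        return out;
    }

private:
    struct Handle {
        std::shared_ptr<std::vector<Char>> data;
    };

    explicit CowString(std::shared_ptr<std::vector<Char>> data)
        : handle_(std::make_shared<Handle>()) {
        handle_->data = std::move(data);
    }

    static void appendAscii(std::vector<Char>& buf, const char* ascii) {
        for (; *ascii != '\0'; ++ascii)
            buf.push_back(static_cast<Char>(static_cast<unsigned char>(*ascii)));
    }

    std::shared_ptr<Handle> handle_;
};

// Usage: an alias sees later edits, a clone does not.
void cloneDemo() {
    CowString original("name");
    CowString alias = original;         // same Handle
    CowString copy = original.clone();  // new Handle, same buffer
    assert(copy.sharesBufferWith(original));  // no characters copied yet

    original += "-changed";  // buffer shared with the clone: detach, then append
    assert(original.toAscii() == "name-changed");
    assert(alias.toAscii() == "name-changed");  // alias sees the edit
    assert(copy.toAscii() == "name");           // clone keeps the old value
}
```

In this sketch the two-level structure is what lets plain assignment keep reference semantics while `clone()` stays cheap; a single reference-counted buffer with copy-on-write would make every copy detach on modification, losing the aliasing behaviour.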



Dave Bertoni wrote

> The meta issue here is that I think this is a very confusing implementation
> for C++ programmers -- most would think that a DOMString is more like the
> typical C++ string class (std::string, for example), rather than a Java
> StringBuffer.  I predict this will be a frequent source of
> very-hard-to-find bugs.  Imagine getting a value from a node and
> inadvertently modifying that string:
>
>     DOMString  theNodeName = node.getNodeName();
>
>     theNodeName += "foo";
>
> I've now changed the name of the node.

No.  node.getNodeName() returns a clone of the string. In general,
a node's properties will not be affected by any changes to strings
returned from any getter function.
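
The getter behaviour can be illustrated with a second hypothetical sketch. `RefString` stands in for a mutable string with reference semantics (copying it aliases the buffer, much like a Java `StringBuffer` reference), and `Node` clones both when it stores a value of unknown origin and when it hands one back. Neither type is part of Xerces-C; `std::u16string` stands in for a 16-bit character buffer, and `clone()` here copies eagerly rather than deferring the copy.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a mutable, reference-semantics string:
// copying a RefString copies the pointer, so both copies alias one buffer.
// Not a Xerces-C type.
class RefString {
public:
    explicit RefString(const std::u16string& s = u"")
        : buf_(std::make_shared<std::u16string>(s)) {}

    // Independent copy with its own buffer. (A real implementation could
    // defer this copy until one side is modified.)
    RefString clone() const { return RefString(*buf_); }

    RefString& operator+=(const std::u16string& s) {
        *buf_ += s;  // in-place: visible through every alias of this buffer
        return *this;
    }

    const std::u16string& str() const { return *buf_; }

private:
    std::shared_ptr<std::u16string> buf_;
};

// Hypothetical node: it never stores or returns a handle the caller owns.
class Node {
public:
    // Clone on store: the caller may keep modifying the string it passed in.
    explicit Node(const RefString& name) : name_(name.clone()) {}

    // Clone on return: edits to the result cannot reach the node.
    RefString getNodeName() const { return name_.clone(); }

private:
    RefString name_;
};

// Usage, following the example in the quoted message.
void getterDemo() {
    RefString source(u"item");
    Node node(source);

    RefString theNodeName = node.getNodeName();
    theNodeName += u"foo";  // changes only the returned clone
    source += u"bar";       // changes only the caller's original

    assert(theNodeName.str() == u"itemfoo");
    assert(node.getNodeName().str() == u"item");  // node is untouched
}
```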

In the time that this code has been out as XML4C there have
indeed been questions about how strings and string cloning work.
But there really haven't been all that many of them, and there
have been no hard-to-track-down problems from incorrect string
usage that have made it back to me.  (I've been lucky - sending
this note is almost guaranteed to land one in my lap tomorrow
morning.)


I really would have preferred to use some preexisting string
package rather than invent yet another new one.  If we can
find something standard that does the job, I wouldn't mind
losing DOMString, although there would no doubt be a squawk
from the existing user base.

Here were my requirements at the time we went with DOMString:

    No std library dependencies.  There are still important platforms
      (some from IBM, unfortunately) that do not support it.

    16-bit chars, not wchar_t, which is sometimes 16, sometimes
    32, depending on the platform.  (DOM memory usage is
    already a big sore point)

    Reference counting memory management.  This goes along with
    the rest of the DOM.  An argument can certainly be made
    for a non-reference counted C++ DOM, but whatever is chosen,
    strings need to match everything else.

    Efficient const return of stored values.  This means either
    Java-style immutable strings, or very efficient cloning
    of mutable strings.

    Easy porting of existing Java string code to C++.

And so, here we are with DOMString.

  -- Andy