Curt Arnold wrote
>I've noticed in some places string assignments first clone the right hand
>side and in other places they don't. I don't know if this is intentional,
>an artifact of the derivation from XML4J, or just an unnecessary bit of
>overhead. If someone could enlighten me, I'd appreciate it.
> For example: [from the constructors and clone operator for nodes]
Here is what is happening:
DOMStrings can be modified.
If a string field of a node is being assigned with a string of
unknown origin, cloning the string first prevents subsequent
changes in the original string from being seen in the cloned
copy.
If some field of a node is being set to a string that is known to
never change, it can simply be assigned. Sources that are a
constant field of another node are safe to use uncloned.
Cloning is fairly efficient; only a new string handle object
needs to be created. The data itself will be shared (until such time
as one or the other of the strings is modified.) And string handles
have a custom operator new(), in the interest of speed.
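To make the difference between assigning and cloning concrete, here is a minimal sketch. It is not the actual DOMString source: SketchString, Handle, append() and sharesDataWith() are names made up for illustration, and it leans on std::shared_ptr and std::string for brevity, which the real class deliberately avoids. The point is only the shape: plain assignment shares the handle, clone() makes a new handle over the same data, and the data is copied the first time one side is modified.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a DOMString-like class (NOT the real code).
// Two levels: SketchString -> Handle -> character data.
class SketchString {
    struct Handle {
        std::shared_ptr<std::string> data;  // character data, possibly shared
    };
    std::shared_ptr<Handle> handle_;        // reference-counted handle

public:
    explicit SketchString(const char* s)
        : handle_(std::make_shared<Handle>(
              Handle{std::make_shared<std::string>(s)})) {}

    // Default copy/assignment copies only the handle pointer, so both
    // objects refer to the same handle and see each other's changes.

    // clone(): a new handle, but the character data is still shared.
    SketchString clone() const {
        SketchString c(*this);
        c.handle_ = std::make_shared<Handle>(*handle_);
        return c;
    }

    // Modify in place; unshare the data first if another handle uses it.
    void append(const char* s) {
        if (handle_->data.use_count() > 1)
            handle_->data = std::make_shared<std::string>(*handle_->data);
        handle_->data->append(s);
    }

    const std::string& str() const { return *handle_->data; }

    bool sharesDataWith(const SketchString& other) const {
        return handle_->data == other.handle_->data;
    }
};

// Usage: returns true if assignment aliases and clone() isolates.
bool demoAssignVersusClone() {
    SketchString original("name");
    SketchString alias = original;         // same handle as original
    SketchString copy  = original.clone(); // new handle, shared data

    bool sharedBefore = copy.sharesDataWith(original);
    alias.append("X");                     // also changes original

    return sharedBefore
        && original.str() == "nameX"       // change seen through the alias
        && copy.str() == "name"            // the clone is unaffected
        && !copy.sharesDataWith(original); // data was copied on write
}
```

In this sketch a clone costs one small handle allocation, which is why cloning on every assignment of unknown origin is tolerable.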
Dave Bertoni wrote
> The meta issue here is that I think this is a very confusing
> implementation
> for C++ programmers -- most would think that a DOMString is more like the
> typical C++ string class (std::string, for example), rather than a Java
> StringBuffer. I predict this will be a frequent source of
> very-hard-to-find bugs. Imagine getting a value from a node and
> inadvertently modifying that string:
>
> DOMString theNodeName = node.getNodeName();
>
> theNodeName += "foo";
>
> I've now changed the name of the node.
No. The node.getNodeName() returns a clone of the string. In general,
a node's properties will not be affected by any changes to strings
returned from any getter function.
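Roughly, the getter behaves like the sketch below. Again, this is an illustration under assumptions, not the real implementation: MutableString and SketchNode are hypothetical names, clone() here does a full copy instead of the cheap shared-data clone, and getNodeNameUnsafe() exists only to show what would happen if the getter handed back the node's own string.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical mutable, handle-based string (stand-in for DOMString).
// Copying a MutableString copies the handle, so copies alias each other.
class MutableString {
    std::shared_ptr<std::string> data_;
public:
    explicit MutableString(const std::string& s)
        : data_(std::make_shared<std::string>(s)) {}

    // Independent copy (a deep copy here, for simplicity).
    MutableString clone() const { return MutableString(*data_); }

    MutableString& operator+=(const char* s) {
        data_->append(s);
        return *this;
    }

    const std::string& str() const { return *data_; }
};

// Hypothetical node with a single string property.
class SketchNode {
    MutableString name_;
public:
    // The incoming string is of unknown origin, so clone it.
    explicit SketchNode(const MutableString& name) : name_(name.clone()) {}

    // Getter returns a clone: callers cannot reach the stored string.
    MutableString getNodeName() const { return name_.clone(); }

    // For contrast only: returns the node's own handle.
    MutableString getNodeNameUnsafe() const { return name_; }
};

// The scenario from the quoted message, using the cloning getter.
std::string nameAfterCallerEdit() {
    SketchNode node(MutableString("title"));
    MutableString theNodeName = node.getNodeName();
    theNodeName += "foo";                 // modifies the caller's copy only
    return node.getNodeName().str();
}

// The same scenario if the getter did not clone.
std::string nameAfterUnsafeEdit() {
    SketchNode node(MutableString("title"));
    MutableString theNodeName = node.getNodeNameUnsafe();
    theNodeName += "foo";                 // modifies the node's string
    return node.getNodeName().str();
}
```

With the cloning getter the node keeps its name; only the non-cloning variant shows the bug being worried about.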
In the time that this code has been out as XML4C there have
indeed been questions about how strings and string cloning work.
But there really haven't been all that many of them, and there
have been no hard-to-track-down problems from incorrect string
usage that have made it back to me. (I've been lucky - sending
this note is almost guaranteed to land one in my lap tomorrow
morning.)
I really would have preferred to use some preexisting string
package rather than invent yet another new one. If we can
find something standard that does the job, I wouldn't mind
losing DOMString, although there would no doubt be a squawk
from the existing user base.
Here were my requirements at the time we went with DOMString:
  - No std library dependencies. There are still important platforms
    (some from IBM, unfortunately) that do not support it.
  - 16 bit chars, not wchar_t, which is sometimes 16, sometimes
    32, depending on the platform. (DOM memory usage is
    already a big sore point)
  - Reference counting memory management. This goes along with
    the rest of the DOM. An argument can certainly be made
    for a non-reference counted C++ DOM, but whatever is chosen,
    strings need to match everything else.
  - Efficient const return of stored values. This means either
    Java style immutable strings, or very efficient cloning
    of mutable strings.
  - Easy porting of existing Java string code to C++.
And so, here we are with DOMString.
-- Andy