RE: PROPOSAL: DOMString

roddey 21 Dec 1999 00:55:30 -0000

Well, I'm not going to argue with you about whether DOMString is a good
implementation or not; perhaps it's not. I didn't write it and I don't know.
The fact that you don't get null-terminated strings out of DOMString is an
implementation issue with DOMString, not necessarily a good argument for
using something else that might fit the bill even less.

I would imagine that Andy, who did our DOM, could tick off about 10 to 100
reasons why using the standard C++ string class would actually make things
no better, if not worse. He is out for the Xmas holidays now, so we'll have
to wait for his reply. But the DOM is a pretty ridiculously thick subject,
and unless you've implemented it yourself, you shouldn't assume that
anything is as simple as it seems. By the time you glommed enough stuff onto
the standard string class, you might end up with an even worse performing
and worse looking solution. I'm not saying don't explore the option, but
don't assume that you can just plop something else in there and it's going
to meet the thorny issues that DOM raises. Fixing the performance and
architecture of DOMString could still end up being far less onerous a task.

----------------------------------------
Dean Roddey
Software Weenie
IBM Center for Java Technology - Silicon Valley
[EMAIL PROTECTED]



[EMAIL PROTECTED] on 12/20/99 03:43:27 PM

Please respond to [EMAIL PROTECTED]

To:   [EMAIL PROTECTED]
cc:
Subject:  RE: PROPOSAL: DOMString




I disagree.  By having separate interfaces for strings for DOM and SAX, we
run into all sorts of performance hits.  For example, in Xalan, we have a
DOM that we want to serialize.  We have classes like FormatterToHTML and
FormatterToXML, all derived from SAX's DocumentHandler.  Methods in the
DocumentHandler interface, like startElement and endElement are declared to
take null-terminated XMLCh*.  But if I try to access the DOM's rawBuffer(),
I'm not guaranteed a null-terminated string.  To get one, I must make a
temporary copy of the string.  Sure, you may have saved some time doing
substring someplace, but at what cost?  I'm going to need to create a lot
of temporary strings, leading to increased memory fragmentation, etc.  The
same thing happens if I want to use the util/TextOutputStream, since it
expects null-terminated XMLCh*'s.
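
To make the cost concrete, here is a minimal sketch of the temporary copy
described above. It is not Xerces code: XMLCh is assumed to be a 16-bit code
unit, and lengthOfTerminated() and makeTerminatedCopy() are hypothetical
names standing in for a null-terminated consumer and for the copy a caller
has to make.

```cpp
#include <cstddef>
#include <vector>

// Assumption: stand-in for Xerces' XMLCh, taken here to be a 16-bit unit.
typedef char16_t XMLCh;

// Hypothetical consumer that, like the SAX handlers mentioned above,
// requires a null-terminated buffer: it scans until it reaches 0.
std::size_t lengthOfTerminated(const XMLCh* s) {
    std::size_t n = 0;
    while (s[n] != 0) ++n;
    return n;
}

// Hypothetical helper: given a (pointer, length) pair with no terminator,
// allocate a temporary buffer, copy the characters, and append the 0.
// This allocation and copy is the overhead paid on every such call.
std::vector<XMLCh> makeTerminatedCopy(const XMLCh* raw, std::size_t len) {
    std::vector<XMLCh> tmp(raw, raw + len);
    tmp.push_back(0);
    return tmp;
}
```

Every call that crosses from a length-delimited buffer to a null-terminated
API pays for one such allocation and one copy.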

Another DOMString "optimization" is demonstrated by this code:

DOMString x(100);

for (int i=0; i<100; i++)
    x.appendData(... calc some character ...);

Since DOMString doesn't have an appendData() overload for XMLCh, this will
create a temporary DOMString for each character we're appending.  This is
bad enough.  But then I see that appendData(DOMString) allocates only as
much memory as it needs, so looping like this would cause numerous
reallocations unless we pre-allocate the size of the destination DOMString,
as shown here.

But what does this code really do?  The first time through the loop,
appendData() sees that we're appending to an empty DOMString, so it just
swaps the internal implementations, throws away the pre-allocated string,
and then continues to grow the DOMString by one character every time
through the loop.  I'm sure this saved someone some time in some
benchmark, but not in my code.
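
The behaviour just described can be modelled with a toy class. This is only
a sketch of the description in this message, not the real DOMString:
ToyDOMString, fromChar() and countReallocations() are hypothetical names,
and capacity is tracked as a plain counter so the effect is easy to see.

```cpp
#include <cstddef>
#include <vector>

typedef char16_t XMLCh;  // assumption: 16-bit stand-in for Xerces' XMLCh

// Toy model, NOT the real DOMString. It reproduces only two behaviours
// described above: appending to an empty string adopts the other string's
// representation, and growth allocates exactly what is needed.
class ToyDOMString {
public:
    explicit ToyDOMString(std::size_t reserved = 0)
        : capacity_(reserved), reallocations_(0) {}

    // Builds the one-character temporary that each append has to create.
    static ToyDOMString fromChar(XMLCh c) {
        ToyDOMString s(1);
        s.data_.push_back(c);
        return s;
    }

    void appendData(const ToyDOMString& other) {
        if (data_.empty()) {
            // The "optimization": take over the other representation.
            // Any capacity reserved in the constructor is lost here.
            data_ = other.data_;
            capacity_ = other.capacity_;
            return;
        }
        const std::size_t count = other.data_.size();
        const std::size_t needed = data_.size() + count;
        if (needed > capacity_) {
            capacity_ = needed;  // grow to fit, no slack for the next call
            ++reallocations_;
        }
        for (std::size_t i = 0; i < count; ++i)
            data_.push_back(other.data_[i]);
    }

    std::size_t length() const { return data_.size(); }
    std::size_t reallocations() const { return reallocations_; }

private:
    std::vector<XMLCh> data_;
    std::size_t capacity_;       // modelled capacity, not the vector's own
    std::size_t reallocations_;  // number of grow-to-fit steps so far
};

// Runs the loop from the message and reports the modelled reallocations.
std::size_t countReallocations(std::size_t reserved, std::size_t appends) {
    ToyDOMString x(reserved);
    for (std::size_t i = 0; i < appends; ++i)
        x.appendData(ToyDOMString::fromChar(u'a'));
    return x.reallocations();
}
```

In this model, reserving 100 characters and appending 100 times still counts
99 reallocations: the first append discards the reservation, and each of the
remaining 99 appends grows the string by exactly one character.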

The advantage of a standard string class is that it has been validated
across a wide variety of uses and has reasonable performance everywhere.
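
For comparison, a small sketch of the same loop against a standard string.
It assumes a C++11 or later library and uses std::u16string as a stand-in
for a string of XMLCh; appendsWithoutReallocating() is a hypothetical name.
The standard guarantees that after reserve(n) no reallocation happens while
the size stays within the capacity, and that c_str() is null-terminated.

```cpp
#include <cstddef>
#include <string>

// Returns true if appending `count` characters after reserve(count)
// never moved the buffer and left a null-terminated string behind.
bool appendsWithoutReallocating(std::size_t count) {
    std::u16string s;
    s.reserve(count);
    const char16_t* before = s.data();       // buffer address after reserve
    const std::size_t capacity = s.capacity();

    for (std::size_t i = 0; i < count; ++i)
        s.push_back(u'a');                   // single-character append

    return s.data() == before                // buffer was never reallocated
        && s.capacity() == capacity
        && s.size() == count
        && s.c_str()[count] == 0;            // terminator is always present
}
```

This says nothing about substringing or reference counting, which are the
other requirements raised in this thread.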

As it is now, we have some APIs taking DOMString, some taking
null-terminated XMLCh*, and some taking non-terminated XMLCh* with an
additional length param.  This can't be good.  What if I want to input
from SAX, build a DOM tree and then output via TextOutputStream?  Watch the
data:  I'll start with XMLCh*, copy it into DOMStrings, and then make
another copy to get null-terminated XMLCh* strings for output.  Even
without transcoding, this is still inefficient.  That Xalan/C++ spends
about 40% of its non-I/O time in the DOMString ctor is further evidence.


-Rob

Dean Roddey wrote:

>The DOM cannot use a raw string. The DOM has many requirements for
>substringing and reference counting, which wouldn't be very practical with
>a raw string, and I'm not sure that the standard library strings (even if
>there were not other practical encumbrances to using them) would
>sufficiently meet those needs. The DOM string will give you a pointer to
>the raw XMLCh buffer, which lets everyone get to it in the most
>fundamental form so that everyone can get it to the desired representation
>with as little overhead as possible (on aggregate.)