-----Original Message-----
From: Saraogi, Vikas [mailto:[EMAIL PROTECTED]
Sent: Tuesday, February 17, 2004 10:45 AM
To: '[EMAIL PROTECTED]'
Subject: RE: Size of a DOM document?

Is it possible to find out how much memory a node consumes? I don't want to
serialize each node before adding it to a document. I don't need the exact size
of the document, just an approximation.

What I intend to do is find the size of one node in memory and then serialize
it, so I know how much it consumes in memory and how much on disk. Then I find
the memory size of each successive node. If its memory size is less than the
maximum seen so far, I just assume it takes the maximum disk size. If its memory
size is more than the maximum, I serialize it and make it the new maximum. The
following pseudocode should help you understand what I intend to do:

    max_memory_size = get_node_memory_size(node_1);
    max_disk_size = get_node_disk_size(node_1);
    ...
    do remaining nodes
        memory_size = get_node_memory_size(node_x);
        if memory_size < max_memory_size
            disk_size = max_disk_size;
        else
            max_memory_size = memory_size;
            max_disk_size = get_node_disk_size(node_x);
            disk_size = max_disk_size;
        ...
    loop

Now, how do I implement get_node_memory_size?

Thank you,
Vikas

-----Original Message-----
From: Jesse Pelton [mailto:[EMAIL PROTECTED]
Sent: Wednesday, February 11, 2004 11:56 AM
To: '[EMAIL PROTECTED]'
Subject: RE: Size of a DOM document?

Presumably the limit is on the size of the serialized document. I doubt there's
a reliable way to determine that without performing the serialization. Among the
vagaries that you'd have to deal with if you were to track the size of the
document as you add to it are character entity representation and text encoding.
For instance, a commonly used HTML entity is the non-breaking space, 0xA0. This
would be represented as the character entity &nbsp; in the serialized output, at
least with 8-bit encodings. So a single character in the DOM takes 6 bytes to
serialize. I think it's possible that it would take 2 bytes in a 16-bit
encoding.

Maybe you could serialize each node into a throw-away buffer as you add it to
the DOM, note how many bytes were written, and compare that plus the bytes
written so far to your limit. You'd still have to serialize with each addition,
but at least you wouldn't have to serialize the whole document to find out if
you just went over the limit. Of course, you'd have to make sure that you use
the same encoding for the incremental serializations as you do when you actually
serialize the final document.

-----Original Message-----
From: Saraogi, Vikas [mailto:[EMAIL PROTECTED]
Sent: Wednesday, February 11, 2004 11:24 AM
To: [EMAIL PROTECTED]
Subject: Size of a DOM document?

Hi y'all,
I am creating a DOM document and adding various elements to it. The DOM document can't exceed a set memory size. Before adding a new element to the document, I need to know the size of the document. If the size exceeds the limit, I will finish that document, serialize it, and then start a new DOM document. How can I find out the size of the DOM document? I am using Xerces C++ version 1.6.0.
Thanks for your replies.
Vikas
<copyright info snipped>
************************************************************************** The information transmitted herewith is sensitive information intended only for use by the individual or entity to which it is addressed. If the reader of this message is not the intended recipient, you are hereby notified that any review, retransmission, dissemination, distribution, copying or other use of, or taking of any action in reliance upon this information is strictly prohibited. If you have received this communication in error, please contact the sender and delete the material from your computer.
Speaking of copyright info, it looks to me like the email policy included
at the bottom of your posts precludes list readers from "reviewing" your
messages. Anyone at all could participate in the list; I think it would be hard
to argue that you chose each individual subscriber as recipient. At the very
least, such language is inconsistent with the spirit and intent of lists such as
this, which are part of cooperative open source development efforts.
(Encryption, not words, is the way to protect sensitive information from
the wrong recipient. CapitalOne may believe it's protecting itself with this
language, but concrete actions like using encryption are far more useful, and
more credible if you wind up in court.) You might ask whether it can be omitted
from posts to lists like this, being careful to point out that you may find
yourself without support if company policy prevents you from giving back to the
projects whose work you are using.
Anyway, to the subject at hand. Your approach is interesting, if risky.
It assumes that some useful correlation exists between the size of the node in
memory and its size when serialized. As I've pointed out, due to the vagaries of
encoding, there's no guarantee that this is in fact the case. Namespaces will
only make the situation worse. If you're determined to proceed, the obvious
thing to do would be to scan each element's list of attributes, sum the length
of the name and value of each, add in the element's name, then go through the
same exercise for each child of the element. For text nodes, just use the length
of the text. Depending on your needs, you may need to handle comments, CDATA
sections, and so on. I'm not quite sure how to factor in namespaces if you're
using them.
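
A rough sketch of the tally described above, not a definitive implementation.
It assumes a hypothetical Node struct standing in for real Xerces DOM nodes;
the names Node and estimateSize are invented for illustration. It counts
characters only, so markup punctuation, entity expansion, encoding, and
namespaces are all ignored, which is exactly why the result is only loosely
related to the serialized size.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a DOM node. A real program would walk
// Xerces DOM nodes instead of this struct.
struct Node {
    enum class Kind { Element, Text };
    Kind kind = Kind::Element;
    std::string name;  // element name (unused for text nodes)
    std::string text;  // character data (unused for elements)
    std::vector<std::pair<std::string, std::string>> attributes;
    std::vector<Node> children;
};

// Rough estimate in characters: element names, attribute names and
// values, and text lengths, summed recursively over the subtree.
// Assumption: comments, CDATA sections, and namespaces are not handled.
std::size_t estimateSize(const Node& node) {
    if (node.kind == Node::Kind::Text) {
        return node.text.size();
    }
    std::size_t total = node.name.size();
    for (const auto& attr : node.attributes) {
        total += attr.first.size() + attr.second.size();
    }
    for (const Node& child : node.children) {
        total += estimateSize(child);
    }
    return total;
}
```

Calling estimateSize on the element about to be added, and keeping a running
sum, gives a figure to compare against the limit. Treat it as a lower bound
at best, since tags, quotes, and escaped characters all add bytes on output.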
If you really are constrained by the size of the serialized document, some sort
of incremental serialization would be ideal. Sort of the opposite of a
stream-based parser, which digests as much data as is available, then waits for
more. Perhaps you could adapt the DOMPrint sample to obtain the next node to
write from a callback that you supply. You wouldn't necessarily construct a DOM
in advance of calling the serializer. Instead, your callback would track what
has been serialized and construct and return the next node. If the serializer
decides the document is as big as permissible, it would return. You'd do
whatever needed to be done with the serialized document, then call the
serializer again.
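
To make the callback idea concrete, here is a minimal sketch under stated
assumptions. It does not use Xerces or DOMPrint: Item, NextItem, escapeText,
serializeItem, and ChunkedSerializer are all hypothetical names, each "node" is
just a name plus text, and the byte count assumes the strings are already in
the output encoding (for example UTF-8).

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical minimal "node": an element name plus its text content.
struct Item {
    std::string name;
    std::string text;
};

// Caller-supplied callback: fills `out` and returns true, or returns
// false when there is nothing left to write.
using NextItem = std::function<bool(Item&)>;

// Escape the characters that must not appear raw in element content.
std::string escapeText(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (c == '&') out += "&amp;";
        else if (c == '<') out += "&lt;";
        else if (c == '>') out += "&gt;";
        else out += c;
    }
    return out;
}

std::string serializeItem(const Item& item) {
    return "<" + item.name + ">" + escapeText(item.text) +
           "</" + item.name + ">";
}

class ChunkedSerializer {
public:
    ChunkedSerializer(NextItem next, std::size_t limit)
        : next_(std::move(next)), limit_(limit) {}

    // Builds one document of at most `limit` bytes and returns it.
    // Returns an empty string once the callback is exhausted.
    // A single item larger than the limit is still emitted on its own.
    std::string nextDocument() {
        const std::string open = "<doc>";
        const std::string close = "</doc>";
        std::string body;
        for (;;) {
            if (!hasPending_) {
                Item item;
                if (!next_(item)) break;
                pending_ = serializeItem(item);
                hasPending_ = true;
            }
            std::size_t projected =
                open.size() + body.size() + pending_.size() + close.size();
            // Item does not fit: keep it for the next document.
            if (!body.empty() && projected > limit_) break;
            body += pending_;
            hasPending_ = false;
        }
        return body.empty() ? std::string() : open + body + close;
    }

private:
    NextItem next_;
    std::size_t limit_;
    bool hasPending_ = false;
    std::string pending_;
};
```

The caller would loop on nextDocument() until it returns an empty string,
handling each returned document before asking for the next. Because every item
is measured in its serialized form, escaping is accounted for, at the cost of
serializing each item once.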
I'm just making this up, of course. Maybe someone else will have a better idea.
(But maybe not. XML was not designed to be space-efficient, and developers who
use XML are not used to thinking in terms of size constraints. Usually
developers look elsewhere if they need a compact representation of their data.)
