RE: Size of a DOM document?

Jesse Pelton Tue, 17 Feb 2004 09:21:53 -0800

Speaking of copyright info, it looks to me like the email policy included at the bottom of your posts precludes list readers from "reviewing" your messages. Anyone at all could participate in the list; I think it would be hard to argue that you chose each individual subscriber as recipient. At the very least, such language is inconsistent with the spirit and intent of lists such as this, which are part of cooperative open source development efforts. (Encryption, not words, is the way to protect sensitive information from the wrong recipient. CapitalOne may believe it's protecting itself with this language, but concrete actions like using encryption are far more useful, and more credible if you wind up in court.) You might ask whether it can be omitted from posts to lists like this, being careful to point out that you may find yourself without support if company policy prevents you from giving back to the projects whose work you are using.

Anyway, to the subject at hand. Your approach is interesting, if risky. It assumes that some useful correlation exists between the size of the node in memory and its size when serialized. As I've pointed out, due to the vagaries of encoding, there's no guarantee that this is in fact the case. Namespaces will only make the situation worse. If you're determined to proceed, the obvious thing to do would be to scan each element's list of attributes, sum the length of the name and value of each, add in the element's name, then go through the same exercise for each child of the element. For text nodes, just use the length of the text. Depending on your needs, you may need to handle comments, CDATA sections, and so on. I'm not quite sure how to factor in namespaces if you're using them.
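The scan described above could be sketched roughly as follows. The `Node` struct is a hypothetical stand-in, not a Xerces-C++ type; a real tree would be walked through the library's own node and attribute interfaces. The result counts only names, attribute values, and text, so it ignores markup punctuation, escaping, encoding, and namespaces, and is at best a loose approximation of the serialized size.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a DOM node (assumption, not a Xerces-C++ class).
struct Node {
    enum Kind { Element, Text };
    Kind kind = Element;
    std::string name;   // element name (unused for text nodes)
    std::string text;   // character data (unused for elements)
    std::vector<std::pair<std::string, std::string>> attributes;
    std::vector<Node> children;
};

// Rough character count: element names, attribute names and values, and
// text content. Markup characters, escaping, and encoding are not counted.
std::size_t estimateSize(const Node& node) {
    if (node.kind == Node::Text) {
        return node.text.size();
    }
    std::size_t total = node.name.size();
    for (const auto& attr : node.attributes) {
        total += attr.first.size() + attr.second.size();
    }
    for (const Node& child : node.children) {
        total += estimateSize(child);  // same exercise for each child
    }
    return total;
}
```

Comments, CDATA sections, and other node kinds would each need their own case if they appear in the document.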

If you really are constrained by the size of the serialized document, some sort of incremental serialization would be ideal. Sort of the opposite of a stream-based parser, which digests as much data as is available, then waits for more. Perhaps you could adapt the DOMPrint sample to obtain the next node to write from a callback that you supply. You wouldn't necessarily construct a DOM in advance of calling the serializer. Instead, your callback would track what has been serialized and construct and return the next node. If the serializer decides the document is as big as permissible, it would return. You'd do whatever needed to be done with the serialized document, then call the serializer again.
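A minimal sketch of that callback-driven idea, under several assumptions: `ChunkedSerializer` and `NextNode` are hypothetical names, the callback hands back each node already rendered as markup, and the wrapper that every output document would need (XML declaration, root element) is left out. It is not based on the DOMPrint sample's actual code.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical callback type: stores the next node's markup in the argument
// and returns true, or returns false once there is nothing left to write.
using NextNode = std::function<bool(std::string&)>;

class ChunkedSerializer {
public:
    ChunkedSerializer(NextNode next, std::size_t limit)
        : next_(std::move(next)), limit_(limit), hasPending_(false) {}

    // Builds one document body of at most `limit` bytes. A single node that
    // is larger than the limit is still emitted on its own. Returns an empty
    // string once the callback has no more nodes.
    std::string nextDocument() {
        std::string out;
        for (;;) {
            if (!hasPending_) {
                hasPending_ = next_(pending_);
            }
            if (!hasPending_) {
                break;  // source exhausted
            }
            if (!out.empty() && out.size() + pending_.size() > limit_) {
                break;  // keep the pending node for the next document
            }
            out += pending_;
            hasPending_ = false;
        }
        return out;
    }

private:
    NextNode next_;
    std::size_t limit_;
    bool hasPending_;      // a node was fetched but not yet written
    std::string pending_;
};
```

The caller would loop on `nextDocument()`, doing whatever needs to be done with each result, until it returns an empty string.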

I'm just making this up, of course. Maybe someone else will have a better idea. (But maybe not. XML was not designed to be space-efficient, and developers who use XML are not used to thinking in terms of size constraints. Usually developers look elsewhere if they need a compact representation of their data.)

-----Original Message-----
From: Saraogi, Vikas [mailto:[EMAIL PROTECTED]
Sent: Tuesday, February 17, 2004 10:45 AM
To: '[EMAIL PROTECTED]'
Subject: RE: Size of a DOM document?

Is it possible to find out how much memory a node is consuming? I don't want to serialize each node before adding it to a document. I don't need the exact size of the document, just an approximation.

What I intend to do is find the in-memory size of a node and then serialize it, so that I know how much it consumes in memory and how much on disk. Then I find the memory size of each successive node. If its memory size is less than the maximum seen so far, I just assume it takes the maximum disk size. If its memory size is more than the maximum, I serialize it and make it the new maximum. The following pseudocode should help you understand what I intend to do:

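    max_memory_size = get_node_memory_size(node_1);
    max_disk_size = get_node_disk_size(node_1);
    ...
    do remaining nodes
        memory_size = get_node_memory_size(node_x);
        if memory_size < max_memory_size
            // smaller than the largest node seen so far: assume the worst case
            disk_size = max_disk_size;
        else
            // new largest node: serialize it and raise both maximums
            max_memory_size = memory_size;
            max_disk_size = get_node_disk_size(node_x);
            disk_size = max_disk_size;
        ...
    loop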

Now, how do I implement get_node_memory_size?

Thank you,

Vikas

-----Original Message-----
From: Jesse Pelton [mailto:[EMAIL PROTECTED]
Sent: Wednesday, February 11, 2004 11:56 AM
To: '[EMAIL PROTECTED]'
Subject: RE: Size of a DOM document?

Presumably the limit is on the size of the serialized document. I doubt there's a reliable way to determine that without performing the serialization. Among the vagaries that you'd have to deal with if you were to track the size of the document as you add to it are character entity representation and text encoding. For instance, a commonly used HTML entity is the non-breaking space, 0xA0. This would be represented as the character entity &nbsp; in the serialized output, at least with 8-bit encodings. So a single character in the DOM takes 6 bytes to serialize. I think it's possible that it would take 2 bytes in a 16-bit encoding.
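To make the arithmetic concrete, here is a small sketch of how the byte cost of one character can vary with the output encoding. `serializedBytes` and the `Encoding` enum are hypothetical, and the sketch assumes the serializer falls back to a hexadecimal character reference whenever the target encoding cannot represent the character directly; an actual serializer may choose a different form.

```cpp
#include <cstddef>

enum class Encoding { UsAscii, Utf8, Utf16 };

// Bytes needed to write one code point as character data. Markup characters
// such as '<' and '&', which need escaping in any encoding, are not handled.
std::size_t serializedBytes(char32_t cp, Encoding enc) {
    switch (enc) {
    case Encoding::Utf8:
        if (cp < 0x80) return 1;
        if (cp < 0x800) return 2;
        if (cp < 0x10000) return 3;
        return 4;
    case Encoding::Utf16:
        return cp < 0x10000 ? 2 : 4;  // surrogate pair above the BMP
    case Encoding::UsAscii: {
        if (cp < 0x80) return 1;
        // Character reference: "&#x" + hex digits + ";"
        std::size_t digits = 0;
        for (char32_t v = cp; v != 0; v >>= 4) {
            ++digits;
        }
        return 3 + digits + 1;
    }
    }
    return 0;  // not reached for valid enum values
}
```

Under these assumptions, the non-breaking space (0xA0) costs 6 bytes in US-ASCII output and 2 bytes in either UTF-8 or UTF-16.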

Maybe you could serialize each node into a throw-away buffer as you add it to the DOM, note how many bytes were written, and compare that plus the bytes written so far to your limit. You'd still have to serialize with each addition, but at least you wouldn't have to serialize the whole document to find out if you just went over the limit. Of course, you'd have to make sure that you use the same encoding for the incremental serializations as you do when you actually serialize the final document.
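The bookkeeping for that approach might look something like this. Both `serializeToBuffer` and `SizeBudget` are hypothetical; in particular, `serializeToBuffer` is only a placeholder for whatever the real serializer does, and it performs no escaping or encoding conversion.

```cpp
#include <cstddef>
#include <string>

// Placeholder for the real serializer (assumption): renders <name>text</name>
// into a temporary buffer. The real one must use the same encoding and
// escaping as the final serialization of the whole document.
std::string serializeToBuffer(const std::string& name, const std::string& text) {
    return "<" + name + ">" + text + "</" + name + ">";
}

// Tracks the bytes written so far against a fixed limit.
class SizeBudget {
public:
    explicit SizeBudget(std::size_t limit) : limit_(limit), used_(0) {}

    // Records the node's bytes and returns true if it still fits;
    // returns false and leaves the total unchanged otherwise.
    bool tryAdd(const std::string& serializedNode) {
        if (used_ + serializedNode.size() > limit_) {
            return false;
        }
        used_ += serializedNode.size();
        return true;
    }

    std::size_t used() const { return used_; }

    // Call after finishing one document and starting the next.
    void reset() { used_ = 0; }

private:
    std::size_t limit_;
    std::size_t used_;
};
```

When `tryAdd` returns false, the caller would finish and serialize the current document, call `reset()`, and add the node to a fresh document instead. The limit would also need to leave room for the document's fixed overhead, such as the XML declaration and root element.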

-----Original Message-----
From: Saraogi, Vikas [mailto:[EMAIL PROTECTED]
Sent: Wednesday, February 11, 2004 11:24 AM
To: [EMAIL PROTECTED]
Subject: Size of a DOM document?

Hi y'all,

I am creating a DOM document and adding various elements to it. The DOM document can't exceed a set memory size. Before adding a new element to the document, I need to know the size of the document. If the size exceeds the limit, I will finish that document, serialize it, and then start a new DOM document. How can I find out the size of the DOM document? I am using Xerces C++ version 1.6.0.

Thanks for your replies.

Vikas


<copyright info snipped>

************************************************************************** The information transmitted herewith is sensitive information intended only for use by the individual or entity to which it is addressed. If the reader of this message is not the intended recipient, you are hereby notified that any review, retransmission, dissemination, distribution, copying or other use of, or taking of any action in reliance upon this information is strictly prohibited. If you have received this communication in error, please contact the sender and delete the material from your computer.
