Go with IDOM; speed is key, but rename it.  Fix the memory retention problem (which it looks like Lenny is already doing).
 
I do like Joseph Kesselman's suggestion to have a wrapper around the IDOM so it behaves like the old DOM interface.
 
Just my two cents.
 
David Schulze
Software Engineer
DeLorme Mapping
[EMAIL PROTECTED]
 
-----Original Message-----
From: Tinny Ng [mailto:[EMAIL PROTECTED]]
Sent: Monday, April 29, 2002 11:08 AM
To: [EMAIL PROTECTED]
Subject: Call for Vote: which one to be the Xerces-C++ public supported W3C DOM interface

Hi everyone,
 
I've reviewed Andy's design objectives for IDOM, Lenny's view of the old DOM and his redesign proposal, and some user feedback.  Here is a "quick" summary, and I would like to call for a VOTE on the fate of these two interfaces.
 
1.0 Objective
==========
1.  Define the strategy for the Xerces-C++ public DOM interface.  Decide which one to keep: the old DOM interface or the new IDOM interface.
 
 
2.0 Motivation
===========
1. As a long-term strategy, Xerces-C++ shouldn't define two W3C DOM interfaces, which simply confuses users.
    => We've already received many user questions about what the difference is, which one to use, etc.
2. With limited resources, we should focus our development on ONE stream, with no more duplicate effort.
    => New DOM Level 3 development should be done on one interface, not both.
    => No more dual maintenance: two sets of samples (e.g. DOMPrint vs IDOMPrint), two parsers (DOMParser vs IDOMParser)
3. To better position Apache Xerces-C++ in the market, we should have our Apache Recommended DOM C++ Binding listed at http://www.w3.org/DOM/Bindings
    => To encourage more users to develop DOM applications AND implementations based on this binding.
    => Such a binding should just define a set of abstract base classes (similar to Java interfaces) where no implementation model is assumed.
 
 
3.0 History
=========
'DOM' was the initial "W3C DOM interface" developed by Xerces-C++.  However, the performance of its implementation is not quite satisfactory.

Last year, Andy Heninger came up with a new, faster design, and that implementation came with a new set of interfaces => 'IDOM'.
 
Currently both 'DOM' and 'IDOM' are shipped with Xerces-C++.  'IDOM' is labeled experimental (like a prototype) and is subject to change.

More information can be found in :
http://xml.apache.org/xerces-c/program.html
 
 
 
4.0 IDOM
=========
4.1 Interface
==========
 
4.1.1 Features of IDOM Interface
--------------------------------------------------
e.g. virtual IDOM_Element* IDOM_Document::createElement(const XMLCh* tagName) = 0;
 
1. Defined as abstract base classes.
2. Uses normal C++ pointers.
    => So that abstract base classes are possible.
    => Makes it more C++-like and less Java-like.
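
To make the two features above concrete, here is a minimal sketch of an abstract-base-class interface that hands out plain pointers. It is not the Xerces-C++ code: the names SketchElement, SketchDocument, SimpleElement and SimpleDocument are hypothetical, and char16_t is only an assumed stand-in for XMLCh.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Assumption: char16_t stands in for XMLCh, which Xerces-C++ defines itself.
using Ch = char16_t;

// Abstract interface: pure virtual functions only, no implementation assumed.
class SketchElement {
public:
    virtual ~SketchElement() = default;
    virtual const Ch* getTagName() const = 0;
};

class SketchDocument {
public:
    virtual ~SketchDocument() = default;
    // Returns a plain pointer; in this sketch the document keeps ownership.
    virtual SketchElement* createElement(const Ch* tagName) = 0;
};

// One possible implementation behind the interface (hypothetical).
class SimpleElement : public SketchElement {
public:
    explicit SimpleElement(const Ch* name) : name_(name) {}
    const Ch* getTagName() const override { return name_.c_str(); }
private:
    std::u16string name_;
};

class SimpleDocument : public SketchDocument {
public:
    SketchElement* createElement(const Ch* tagName) override {
        assert(tagName != nullptr);
        nodes_.push_back(std::make_unique<SimpleElement>(tagName));
        return nodes_.back().get();
    }
    std::size_t nodeCount() const { return nodes_.size(); }
private:
    // Nodes live as long as the document that created them.
    std::vector<std::unique_ptr<SimpleElement>> nodes_;
};
```

Callers that hold only a SketchDocument* never see SimpleDocument, which is what would let a different implementation sit behind the same binding.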
 
 
4.1.2 Pros and Cons of IDOM Interface
----------------------------------------------------------
Pros:
1. Abstract base classes that correspond to the W3C DOM interfaces
    => Can be recommended as Apache DOM C++ Binding
    => More standard-like: no implementation is assumed, as they are just abstract interfaces using pure virtual functions
2. (Depends on users' preference)
    - some users prefer a C++-like style
 
Cons:
1. IDOM_XXX - weird prefix 'I'
    Solution:
        - Proposal: rename to DOMXXXX, which also matches the DOM Level 3 naming convention
2. (Depends on users' preference)
    - some users do not like pointers and want a Java-like interface for ease of use, ease of learning, and ease of porting (from Java).
3. As the old DOM interface has been around for a long time, the majority of current Xerces-C++ users still use it, so the migration impact is significant
    Solution:
        - Announce the deprecation of the old DOM interface for a couple of releases before removal
   
4.2 Implementation
===============
4.2.1 Features of IDOM Implementation
-----------------------------------------------------------
1. Use an independent storage allocator per document. The advantage here is that allocation would require no synchronization
    => Fast, good scalability, reduced memory footprint
2. Use plain, null-terminated (XMLCh *) UTF-16 strings.
    => No DOMString class overhead, which is another reason IDOM is faster
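
As an illustration of the per-document allocator idea in item 1, here is a rough sketch of a bump ("arena") allocator that a single document could own. This is not the actual IDOM allocator; DocumentArena and its members are hypothetical names, and the block size is an arbitrary assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical per-document arena: each document owns one, so allocation
// touches no shared state and needs no locking.
class DocumentArena {
public:
    explicit DocumentArena(std::size_t blockSize = 4096) : blockSize_(blockSize) {
        assert(blockSize_ > 0);
    }

    // Hands out memory by bumping an offset inside the current block.
    void* allocate(std::size_t bytes) {
        const std::size_t align = alignof(std::max_align_t);
        bytes = (bytes + align - 1) / align * align;  // round up to alignment
        if (blocks_.empty() || used_ + bytes > currentSize_) {
            // Start a new block, large enough for oversized requests.
            currentSize_ = bytes > blockSize_ ? bytes : blockSize_;
            blocks_.push_back(std::make_unique<unsigned char[]>(currentSize_));
            used_ = 0;
        }
        void* p = blocks_.back().get() + used_;
        used_ += bytes;
        return p;
    }

    std::size_t blockCount() const { return blocks_.size(); }

    // Deliberately no per-object free: every block is returned at once
    // when the arena (and so the document) is destroyed.
private:
    std::size_t blockSize_;
    std::size_t currentSize_ = 0;
    std::size_t used_ = 0;
    std::vector<std::unique_ptr<unsigned char[]>> blocks_;
};
```

The missing per-object free is what makes allocation cheap in a design like this, and it is also why memory from replaced values would stay in use until the whole document goes away.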
 
 
4.2.2 Downside of IDOM Implementation
-------------------------------------------------------------
1. Manual memory management
    - If the document comes from the parser, then the parser owns the document.  If the document comes from DOMImplementation, then users are responsible for deleting it.
    Solution:
        - Provide a means of disassociating a document from the parser
        - Add a function "Node::release()", similar to the idea of "Range::detach", which allows users to indicate the release of the Node. 
            - From the C++ Binding abstract interface perspective, it's up to the implementation how to handle this "release()" function.
            - With the Xerces-C++ IDOM implementation, the release() function will delete the 'this' pointer if the node is a document; otherwise it is a no-op.
2. Memory retained until the document is deleted.
    - If you change the value of an attribute or call removeNode many times, the memory of the old value is not deallocated for reuse, and the document grows and grows
    Solution:
        - This in fact is a tradeoff for the fast performance offered by independent storage allocator. 
        - There is no good immediate solution in place
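
The proposed release() behaviour (delete 'this' for a document, no-op for any other node) could look roughly like the sketch below. The class names are hypothetical and the live-document counter exists only to make the effect observable; this is not the Xerces-C++ implementation.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical abstract node with the proposed release() function.
class SketchNode {
public:
    virtual ~SketchNode() = default;
    virtual void release() = 0;
};

class SketchElementNode : public SketchNode {
public:
    // No-op: in this sketch the owning document frees the element.
    void release() override {}
};

class SketchDocumentNode : public SketchNode {
public:
    // Documents are always heap-allocated so that "delete this" is valid.
    static SketchDocumentNode* create(int& liveDocuments) {
        return new SketchDocumentNode(liveDocuments);
    }

    SketchElementNode* createElement() {
        elements_.push_back(std::make_unique<SketchElementNode>());
        return elements_.back().get();
    }

    // Releasing the document destroys it together with all of its nodes.
    void release() override { delete this; }

    ~SketchDocumentNode() override { --live_; }

private:
    explicit SketchDocumentNode(int& liveDocuments) : live_(liveDocuments) { ++live_; }

    int& live_;  // counts live documents, for demonstration only
    std::vector<std::unique_ptr<SketchElementNode>> elements_;
};
```

After doc->release() the document pointer and every node pointer obtained from it are dangling, so callers must not touch them again.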
 
 
5.0 old DOM
==========
5.1 Interface
========== 
 
5.1.1 Features of old DOM Interface
-----------------------------------------------------
e.g. DOM_Element DOM_Document::createElement(const DOMString tagName);
 
1. Use smart pointers - Java-like
 
 
5.1.2 Pros and Cons of old DOM Interface
--------------------------------------------------------------
Pros:
1. DOM_XXX - reasonable name
2. (Depends on users' preference)
    - some users want a Java-like interface for ease of use, ease of learning, and ease of porting (from Java).
3. Not that many users have migrated to IDOM yet, so migration impact is minimal.
 
Cons:
1. Not an abstract base class
    - Cannot be recommended as Apache DOM C++ Binding
    - Implementation (smart pointer indirection) is assumed
    Solution:
        - This in fact is a tradeoff for the ease of use of smart pointer design
        - No solution.
2. (Depends on users' preference)
    - some users want a C++-like style, since this is a C++ interface
 
   
5.2 Implementation
===============
5.2.1 Features of old DOM Implementation
----------------------------------------------------------------
1. Automatic memory management
    - Memory is released when there are no more handles pointing to it
    - Uses reference counting to keep track of handles
2. Use thread-safe DOMString class
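
For readers unfamiliar with the handle style described in item 1, here is a minimal sketch of a reference-counted handle. It is not the old DOM code: ElementHandle and SketchElementImpl are hypothetical names, and the counter is deliberately not thread-safe to keep the example short.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical implementation object carrying its own reference count.
struct SketchElementImpl {
    explicit SketchElementImpl(std::string t) : tag(std::move(t)) {}
    int refCount = 0;
    std::string tag;
};

// Value-type handle: copying it shares the implementation object, and the
// last handle to go away deletes that object.
class ElementHandle {
public:
    ElementHandle() = default;
    explicit ElementHandle(SketchElementImpl* impl) : impl_(impl) { addRef(); }
    ElementHandle(const ElementHandle& other) : impl_(other.impl_) { addRef(); }

    ElementHandle& operator=(const ElementHandle& other) {
        if (impl_ != other.impl_) {
            removeRef();
            impl_ = other.impl_;
            addRef();
        }
        return *this;
    }

    ~ElementHandle() { removeRef(); }

    bool isNull() const { return impl_ == nullptr; }
    const std::string& tagName() const { assert(impl_ != nullptr); return impl_->tag; }
    int useCount() const { return impl_ ? impl_->refCount : 0; }

private:
    void addRef() { if (impl_) ++impl_->refCount; }

    void removeRef() {
        if (impl_ && --impl_->refCount == 0) {
            delete impl_;  // last handle gone: free the implementation object
        }
        impl_ = nullptr;
    }

    SketchElementImpl* impl_ = nullptr;
};
```

Each handle copy costs a counter update, and every node is its own heap allocation, which gives a feel for where the overhead of this style can come from.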
 
 
5.2.2 Downside of old DOM Implementation
--------------------------------------------------------------------
1. Performance is slow
    - Memory management is the biggest time consumer, and the memory footprint is large.
    - There are a whole lot of blocks allocated when creating a document and then freed when finished with it. Each and every node requires at least one and sometimes several separately allocated blocks. A DOMString takes three. It adds up.
    Solution:
        - Lenny suggests using the IDOM interface internally in the DOM implementation (patch in Bugzilla 5967)
        - The performance benefits of IDOM are then gained, but the retained-memory problem in the IDOM implementation still remains to be addressed.
        - Internally, we would still have a dual-interface maintenance model, since the IDOM interface is then used by DOM internally.
 
 
Vote Question:
============
I would like to call for a vote:
 
    ==>  Which INTERFACE should be the Xerces-C++ public supported W3C DOM Interface, DOM or IDOM? <===
 
Note: 
1. The question asks which "interface" is to be officially supported.  Once the interface is chosen, we can discuss how to solve the downsides of its implementation as the next topic.
2. The winning interface will become the ONLY Xerces-C++ supported public W3C DOM Interface, and is where DOM Level 3 will be implemented.
3. The API of the other interface will be deprecated, and its samples and associated parser will eventually be removed from the distribution.
 
