Hi Markus,
To be clear, the fix I created for the IDOM recycles memory once a node or
string is no longer needed. To know when a node is no longer needed, I used
the original DOM interface classes as handles that wrap the IDOM as the
implementation. IDOM performance is maintained, but ease of use is greatly
improved. Without the DOM handles to track whether an IDOM node is in use,
application code would have to state explicitly when a node is no longer
needed and can be recycled, which is yet another thing to document and for
application developers to get wrong and suffer the consequences of.
If you love and use the IDOM for its performance, and you want the memory
problem really fixed rather than worked around in a way that only works if
your application does everything right, then you will love what I have done
by combining DOM classes as handles with IDOM classes as bodies.
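
The handle/body arrangement described above can be sketched roughly as follows. This is only an illustration, not the actual patch: NodeBody, NodePool and NodeHandle are hypothetical names, and the sketch assumes a node counts as unused as soon as no handle refers to it (a real DOM would also have to treat nodes still attached to the tree as in use).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical body: stands in for an IDOM node owned by its document.
struct NodeBody {
    int refCount = 0;  // number of live handles referring to this body
};

// Hypothetical per-document pool that recycles bodies instead of freeing them.
class NodePool {
public:
    ~NodePool() { for (NodeBody* b : all_) delete b; }

    // Reuse a recycled body when one is available, otherwise allocate.
    NodeBody* acquire() {
        if (!free_.empty()) {
            NodeBody* b = free_.back();
            free_.pop_back();
            return b;
        }
        NodeBody* b = new NodeBody();
        all_.push_back(b);
        return b;
    }
    void recycle(NodeBody* b) { free_.push_back(b); }

    std::size_t allocated() const { return all_.size(); }
    std::size_t freeCount() const { return free_.size(); }

private:
    std::vector<NodeBody*> all_;   // every body ever allocated
    std::vector<NodeBody*> free_;  // bodies available for reuse
};

// Hypothetical handle: copying it only adjusts the reference count.
// The pool must outlive every handle created from it.
class NodeHandle {
public:
    explicit NodeHandle(NodePool& pool) : pool_(&pool), body_(pool.acquire()) {
        ++body_->refCount;
    }
    NodeHandle(const NodeHandle& o) : pool_(o.pool_), body_(o.body_) {
        ++body_->refCount;
    }
    NodeHandle& operator=(const NodeHandle& o) {
        if (body_ != o.body_) {
            drop();
            pool_ = o.pool_;
            body_ = o.body_;
            ++body_->refCount;
        }
        return *this;
    }
    ~NodeHandle() { drop(); }

private:
    // When the last handle goes away, the body returns to the pool.
    void drop() {
        assert(body_->refCount > 0);
        if (--body_->refCount == 0) pool_->recycle(body_);
    }
    NodePool* pool_;
    NodeBody* body_;
};
```

The application never says "this node can be recycled"; the handle destructors do it, which is the ease-of-use point made above.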
If what you love is working with pointers instead of with objects, please let
me know why.
One thing I have found harder with objects vs. pointers is downcasting from a
node to a derived object such as an element. The syntax is a bit cleaner with
pointers; e.g.:
    DOM_Node node = ...
    DOM_Element elem = (const DOM_Element&)node;
vs:
    IDOM_Node* node = ...
    IDOM_Element* elem = (IDOM_Element*)node;
It is easy to forget to add the const in the first case, and it is somewhat
non-intuitive because slicing can happen, though that is not a problem in this
case.
To solve this problem I have thought of adding overloaded constructors and
assignment operators that take a DOM_Node to the classes derived from
DOM_Node, such as DOM_Element. Thus the first example becomes:
    DOM_Node node = ...
    DOM_Element elem = node;
Not only is this code more succinct, but it is also safer, as the overloaded
constructor and assignment operator can check for node compatibility via the
getNodeType call.
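
A minimal sketch of that idea, using hypothetical stand-in classes (Node, Element, NodeImpl) rather than the real DOM_Node and DOM_Element. What should happen on a type mismatch is not settled above; throwing std::invalid_argument here is purely an assumption for illustration.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical, heavily simplified stand-ins for the DOM handle classes.
enum NodeType { ELEMENT_NODE = 1, TEXT_NODE = 3 };

struct NodeImpl {  // stands in for the shared node body
    NodeType type;
};

class Node {
public:
    Node() : impl_(nullptr) {}
    explicit Node(NodeImpl* impl) : impl_(impl) {}
    bool isNull() const { return impl_ == nullptr; }
    NodeType getNodeType() const {
        assert(impl_ != nullptr);
        return impl_->type;
    }
private:
    NodeImpl* impl_;
};

class Element : public Node {
public:
    Element() {}
    // Converting constructor: accepts any Node, but checks its type first.
    Element(const Node& n) : Node(requireElement(n)) {}
    // Converting assignment: same check, so a bad assignment leaves *this intact.
    Element& operator=(const Node& n) {
        Node::operator=(requireElement(n));
        return *this;
    }
private:
    // Null handles pass through; anything else must be an element.
    // Throwing is an assumption of this sketch, not part of the proposal.
    static const Node& requireElement(const Node& n) {
        if (!n.isNull() && n.getNodeType() != ELEMENT_NODE)
            throw std::invalid_argument("node is not an element");
        return n;
    }
};
```

With these overloads, `Element elem = node;` compiles without a cast, and a text node assigned by mistake is caught at run time instead of being silently reinterpreted.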
Again, please let me know what other aspects of pointers make things easier
for you.
> Hope your fix has no effects on thread-safe-ness!
No effect whatsoever.
Lenny
Hi Lenny,
I hope your fix of the IDOM memory problem goes into the next official
release. But I use and love the IDOM interface. It's really easier for an old
C++ programmer like me! And I use IDOM because of its thread-safe properties.
Hope your fix has no effects on thread-safe-ness!
Markus
Hi Markus,
The memory management problem is solved by recycling nodes and strings that
are no longer used. The only clean way I know to tell when nodes and strings
are in use is the handle/body pattern, which is what the original DOM uses.
What I have done is use the original DOM handles with the IDOM implementation,
while fixing the IDOM memory problem.
Lenny
If the memory management problem is solved, I prefer IDOM!!!
Hi everyone,
I've reviewed Andy's design objectives for IDOM, Lenny's view of the old DOM
and his redesign proposal, and some users' feedback. Here is a "quick"
summary, and I would like to call for a VOTE on the fate of these two
interfaces.
1.0 Objective
==========
1. Define the strategy for the Xerces-C++ public DOM interface: decide which
   one to keep, the old DOM interface or the new IDOM interface.
2.0 Motivation
===========
1. As a long-term strategy, Xerces-C++ shouldn't define two W3C DOM
   interfaces, which simply confuses users.
   => We've already received many user questions about what the difference
      is, which one to use, etc.
2. With limited resources, we should focus our development on ONE stream,
   with no more duplicated effort.
   => New DOM Level 3 development should be done on one interface, not both.
   => No more dual maintenance: two sets of samples (e.g. DOMPrint vs
      IDOMPrint), two parsers (DOMParser vs IDOMParser).
   => To encourage more users to develop DOM applications AND
      implementations based on this binding.
   => Such a binding should just define a set of abstract base classes
      (similar to Java interfaces) where no implementation model is assumed.
3.0 History
=========
'DOM' was the initial "W3C DOM interface" developed by Xerces-C++. However,
the performance of its implementation is not quite satisfactory. Last year,
Andy Heninger came up with a new design with faster performance, and that
implementation came with a new set of interfaces => 'IDOM'.
Currently both 'DOM' and 'IDOM' are shipped with Xerces-C++. 'IDOM' is
described as experimental (like a prototype) and is subject to change.
More information can be found at: http://xml.apache.org/xerces-c/program.html
4.0 IDOM
=========
4.1 Interface
==========
4.1.1 Features of IDOM Interface
--------------------------------------------------
e.g. virtual IDOM_Element* IDOM_Document::createElement(const XMLCh* tagName) = 0;
1. Defined as abstract base classes.
2. Uses normal C++ pointers.
   => So that abstract base classes are possible.
   => Makes it more C++-like, less Java-like.
4.1.2 Pros and Cons of IDOM Interface
----------------------------------------------------------
Pros:
1. Abstract base classes that correspond to the W3C DOM interfaces.
   => Can be recommended as the Apache DOM C++ Binding.
   => More standard-like: no implementation is assumed, as they are just
      abstract interfaces using pure virtual functions.
2. (Depends on users' preference)
   - Some users prefer a C++-like style.
Cons:
1. IDOM_XXX - weird prefix 'I'.
   Solution:
   - Proposed to rename to DOMXXXX, which also matches the DOM Level 3
     naming convention.
2. (Depends on users' preference)
   - Some users do not like pointers and want a Java-like interface for
     ease of use, ease of learning, and ease of porting (from Java).
3. As the old DOM interface has been around for a long time, the majority
   of current Xerces-C++ users still use the old DOM interface, so the
   migration impact is significant.
   Solution:
   - Announce the deprecation of the old DOM interface a couple of
     releases before removal.
4.2 Implementation
===============
4.2.1 Features of IDOM Implementation
-----------------------------------------------------------
1. Uses an independent storage allocator per document. The advantage here
   is that allocation requires no synchronization.
   => Fast, good scalability, reduced memory footprint.
2. Uses plain, null-terminated (XMLCh *) UTF-16 strings.
   => No DOMString class overhead, which is another contributor to IDOM's
      faster performance.
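
To illustrate why a per-document allocator needs no synchronization, here is a rough sketch of a bump-pointer arena. It is not the actual IDOM allocator: DocumentArena and its 4096-byte default block size are hypothetical, and the sketch assumes each document is used by one thread at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical per-document arena. Each document would own one, so no lock
// is needed as long as a document is touched by one thread at a time.
class DocumentArena {
public:
    explicit DocumentArena(std::size_t blockSize = 4096)
        : blockSize_(blockSize), used_(0) {}

    // Hand out the next free bytes of the current block (bump allocation).
    void* allocate(std::size_t bytes) {
        // Round up so every returned pointer is suitably aligned.
        const std::size_t align = alignof(std::max_align_t);
        bytes = (bytes + align - 1) / align * align;
        assert(bytes <= blockSize_);  // oversized requests are out of scope here

        if (blocks_.empty() || used_ + bytes > blockSize_) {
            blocks_.push_back(
                std::unique_ptr<unsigned char[]>(new unsigned char[blockSize_]));
            used_ = 0;
        }
        void* p = blocks_.back().get() + used_;
        used_ += bytes;
        return p;
    }

    std::size_t blockCount() const { return blocks_.size(); }

    // There is deliberately no per-object free(): all blocks are released
    // together when the arena (i.e. the document) is destroyed.

private:
    std::size_t blockSize_;
    std::size_t used_;  // bytes consumed in the newest block
    std::vector<std::unique_ptr<unsigned char[]>> blocks_;
};
```

The missing free() is what makes allocation cheap, and it is also why memory stays retained until the whole document goes away.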
4.2.2 Downside of IDOM Implementation
-------------------------------------------------------------
1. Manual memory management
   - If the document comes from the parser, then the parser owns the
     document. If the document comes from DOMImplementation, then users
     are responsible for deleting it.
   Solution:
   - Provide a means of disassociating a document from the parser.
   - Add a function "Node::release()", similar to the idea of
     "Range::detach", which allows users to indicate the release of the
     Node.
     - From the C++ Binding abstract interface perspective, it is up to
       the implementation how to handle this "release()" function.
     - With the Xerces-C++ IDOM implementation, the release() function
       will delete the 'this' pointer if it is a document; otherwise it is
       a no-op.
2. Memory is retained until the document is deleted.
   - If you change the value of an attribute or call removeNode many
     times, the memory of the old value is not deallocated for reuse and
     the document grows and grows.
   Solution:
   - This is in fact a tradeoff for the fast performance offered by the
     independent storage allocator.
   - There is no immediate good solution in place.
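
The release() proposal under downside 1 could look roughly like this. The class names (Node, ElementImpl, DocumentImpl) are hypothetical, and the live-document counter exists only so the example can observe the deletion; it is not part of the proposal.

```cpp
#include <cassert>

// Hypothetical abstract binding: it only says that release() exists and
// leaves the behaviour to the implementation.
class Node {
public:
    virtual ~Node() {}
    virtual void release() = 0;
};

// Hypothetical implementation of an ordinary node: its storage belongs to
// the document, so releasing it is a no-op.
class ElementImpl : public Node {
public:
    void release() override {}
};

// Hypothetical implementation of a document: release() deletes the object,
// so it must have been created with new and must not be used afterwards.
class DocumentImpl : public Node {
public:
    explicit DocumentImpl(int& liveDocuments) : live_(liveDocuments) { ++live_; }
    ~DocumentImpl() override {
        assert(live_ > 0);
        --live_;
    }
    void release() override { delete this; }
private:
    int& live_;  // instrumentation for the example only
};
```

Application code can then call release() uniformly on any node it is finished with, without knowing which implementation sits behind the interface.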
5.0 old DOM
==========
5.1 Interface
==========
5.1.1 Features of old DOM Interface
-----------------------------------------------------
e.g. DOM_Element DOM_Document::createElement(const DOMString tagName);
1. Uses smart pointers - Java-like.
5.1.2 Pros and Cons of old DOM Interface
--------------------------------------------------------------
Pros:
1. DOM_XXX - reasonable name.
2. (Depends on users' preference)
   - Some users want a Java-like interface for ease of use, ease of
     learning, and ease of porting (from Java).
3. Not that many users have migrated to IDOM yet, so the migration impact
   is minimal.
Cons:
1. Not abstract base classes.
   - Cannot be recommended as the Apache DOM C++ Binding.
   - An implementation (smart pointer indirection) is assumed.
   Solution:
   - This is in fact a tradeoff for the ease of use of the smart pointer
     design.
   - No solution.
2. (Depends on users' preference)
   - Some users want a C++-like style, as this is a C++ interface.
5.2 Implementation
===============
5.2.1 Features of old DOM Implementation
----------------------------------------------------------------
1. Automatic memory management
   - Memory is released when no more handles point to it.
   - Uses reference counts to keep track of handles.
2. Uses a thread-safe DOMString class.
5.2.2 Downside of old DOM Implementation
--------------------------------------------------------------------
1. Performance is slow
   - Memory management is the biggest time consumer, and the memory
     footprint is large.
   - A whole lot of blocks are allocated when creating a document and then
     freed when finished with it. Each and every node requires at least
     one and sometimes several separately allocated blocks. A DOMString
     takes three. It adds up.
   Solution:
   - Lenny suggests using the IDOM interface internally in the DOM
     implementation; see the patch in Bugzilla 5967.
   - Then the performance benefits of IDOM are gained, but the
     memory-retention problem in the IDOM implementation still remains to
     be addressed.
   - And internally, we will have a dual-interface maintenance model, as
     the IDOM interface is then used by DOM internally.
Vote Question:
============
I would like to call for a vote:
==> Which INTERFACE should be the Xerces-C++ publicly supported W3C DOM
    Interface, DOM or IDOM? <===
Note:
1. The question is asking which "interface" is to be officially supported.
   Once the interface is chosen, we can discuss how to solve the downsides
   of its implementation as the next topic.
2. The one voted for will become the ONLY Xerces-C++ supported public W3C
   DOM Interface, and is where DOM Level 3 will be implemented.
3. The API of the other interface will be deprecated, and its samples and
   associated parser will eventually be removed from the distribution.