|
Hi Markus,
Thank you very much for the insight. Note that simply accessing the IDOM implementation via handles does not affect its thread safety, so your application is safe.
> if (pm_Element)
>     pm_Element->getAttribute(...);
> How can I do this with references?
You do it with the current handles like this:
    if (!pm_Element.isNull())
        pm_Element.getAttribute(...);
Adding an int operator to DOM_Node would allow an even friendlier syntax, e.g.:
    if (pm_Element)
        pm_Element.getAttribute(...);
This could be easily added.
In fact, a -> operator could be added to the DOM_Node classes to get this:
    if (pm_Element)
        pm_Element->getAttribute(...);
This is now exactly what you started out with, and thus is completely backward compatible with your current use of the IDOM.
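
To make the syntax discussion concrete, here is a minimal sketch of a handle that supports both `if (handle)` and `handle->method()`. It is not the Xerces-C++ code: `ElementBody`, `ElementHandle` and `readAttribute` are hypothetical names, and it uses a C++11 `explicit operator bool` as a modern substitute for the int operator mentioned above.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an IDOM element body; not the Xerces-C++ class.
struct ElementBody {
    std::string value;
    const std::string& getAttribute(const std::string& /*name*/) const { return value; }
};

// Hypothetical handle in the spirit of DOM_Element: wraps a body pointer.
class ElementHandle {
public:
    ElementHandle() : body_(nullptr) {}
    explicit ElementHandle(ElementBody* body) : body_(body) {}

    bool isNull() const { return body_ == nullptr; }

    // Allows `if (handle)`; `explicit` avoids accidental integer conversions.
    explicit operator bool() const { return body_ != nullptr; }

    // Allows `handle->getAttribute(...)` by forwarding to the body.
    ElementBody* operator->() const { return body_; }

private:
    ElementBody* body_;
};

// Returns the attribute if the handle is assigned, otherwise the fallback.
std::string readAttribute(const ElementHandle& pm_Element, const std::string& fallback) {
    if (pm_Element)
        return pm_Element->getAttribute("id");
    return fallback;
}
```

With these two operators, calling code written against raw pointers compiles unchanged against the handle.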
> XMLCh* are easier to handle than DOMString objects in ATL:
>     CComBSTR cBstr = pm_Element->getAttribute(...);
Good point, the current DOMString class does not have an XMLCh* operator, which if it did would solve your problem. I pretty much gutted the original DOMString class to make it a simple wrapper around an XMLCh* returned from IDOM implementations, in lieu of suffering the costs of the cross-document string management of the original DOM. As far as I can tell, the only reason the original DOMString did not have an XMLCh* operator was that there was no guarantee that its internal XMLCh* was null terminated; that guarantee now exists, so the operator can be added. I will do that.
So your example remains:
    CComBSTR cBstr = pm_Element->getAttribute(...);
Note that string classes are a convenient way to perform various operations on a string without using the static (read: functional) methods provided by XMLString. I even implemented COW (copy-on-write) behavior in the new DOMString class, so that you can feel free to modify a string returned from a node without having to manually make a copy.
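
The two ideas above (a conversion operator to a raw pointer, and copy-on-write) can be sketched as follows. This is an illustration under assumptions, not the actual DOMString: `CowString` and `rawLength` are hypothetical, `XMLCh` is modelled as `char16_t`, and the sharing check is not thread safe.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Stand-in for XMLCh (a UTF-16 code unit in Xerces-C++); an assumption for this sketch.
using XMLCh = char16_t;

// Hypothetical copy-on-write string wrapper; not the real DOMString.
class CowString {
public:
    explicit CowString(const XMLCh* text)
        : data_(std::make_shared<std::u16string>(text)) {}

    // Implicit conversion so the wrapper can be passed where a const XMLCh* is
    // expected. std::u16string::c_str() is always null terminated.
    operator const XMLCh*() const { return data_->c_str(); }

    // Mutating call: clone the buffer first if another CowString shares it.
    void append(const XMLCh* more) {
        if (data_.use_count() > 1)
            data_ = std::make_shared<std::u16string>(*data_);
        data_->append(more);
    }

    bool sharesBufferWith(const CowString& other) const { return data_ == other.data_; }

private:
    std::shared_ptr<std::u16string> data_;
};

// Hypothetical consumer that only understands raw, null-terminated pointers.
std::size_t rawLength(const XMLCh* s) {
    return std::char_traits<XMLCh>::length(s);
}
```

Copies share one buffer until one of them is modified, so reading a string returned from a node costs no allocation.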
If folks don't find the DOMString wrapper to be that important, that frees me up to simplify the handle classes and address one of Tinny's concerns. Tinny pointed out that while the new design hides the dual interfaces (DOM and IDOM) from users, it does not hide them from DOM developers; as DOM 3 support is added, each interface change would have to be made to both the DOM and IDOM classes. The only reason I went with complete interface replication instead of simple smart pointers for the handle classes was to be able to translate XMLCh pointers returned from IDOM nodes into DOMStrings. If I am allowed to get rid of DOMString altogether, I can make the handle classes simple smart pointers that do not replicate IDOM interfaces, and thus the duplication of effort is eliminated.
Lenny
-----Original Message-----
From: Markus Fellner [mailto:[EMAIL PROTECTED]]
Sent: Monday, April 29, 2002 6:17 PM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: AW: Call for Vote: which one to be the Xerces-C++ public supported W3C DOM interface
O.k., the main reason for my IDOM flirtation is...
I've chosen IDOM because of its thread safety. And now I have several thousand lines of code using the IDOM interface.
Some other reasons are...
I have many IDOM_Element* members (pm_Elem) in my classes. After parsing, they are assigned once and then checked many times to see whether they are really assigned, and used for reading and writing attributes.
    if (pm_Element)
        pm_Element->getAttribute(...);
How can I do this with references?
XMLCh* are easier to handle than DOMString objects in ATL:
    CComBSTR cBstr = pm_Element->getAttribute(...);
...
Sorry for my short answer. I go on holiday tomorrow and I have to pack up!
I'm back in 2 weeks and looking forward to seeing the results of this voting.
It's a pity to leave during a hot discussion on this list.
Markus
Hi Markus,
To be clear, the fix I created for the IDOM was to recycle memory once a node or string is no longer needed. To know when a node is no longer needed I used the original DOM interface classes, but have them wrap the IDOM as the implementation. IDOM performance is maintained, but ease of use is greatly improved. Without the DOM handles to track whether an IDOM node is in use, application code would have to state explicitly when a node is no longer needed and can be recycled, which is yet another thing to be documented, and for application developers to get wrong and suffer the consequences of.
If you love and use the IDOM for its performance, and you want the memory problem really fixed rather than worked around in a way that only works if your application does everything right, then you will love what I have done by combining DOM classes as handles with IDOM classes as bodies.
If what you love is working with pointers instead of with objects, please let me know why.
One thing I have found harder with objects vs. pointers is downcasting from node to derived objects like element. The syntax is a bit cleaner with pointers; e.g.:
    DOM_Node node = ...
    DOM_Element elem = (const DOM_Element&)node;
vs:
    IDOM_Node* node = ...
    IDOM_Element* elem = (IDOM_Element*)node;
It is easy to forget to add the const in the first case, and it is somewhat non-intuitive because slicing can happen, though it is not a problem in this case.
To solve this problem I have thought of adding overloaded constructors and assignment operators that take a DOM_Node to DOM_Node-derived classes like DOM_Element. Thus the first example becomes:
    DOM_Node node = ...
    DOM_Element elem = node;
Not only is this code more succinct, but it is safer, as the overloaded constructor and assignment operator can check for node compatibility via the getNodeType call.
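
A rough sketch of such a checked conversion is shown here. The class names (`NodeHandle`, `ElementHandle`, `NodeBody`) and the `NodeType` enum are hypothetical, and leaving the handle null on a type mismatch is an assumption; the message above does not say what an incompatible node should produce.

```cpp
#include <cassert>

// Hypothetical node kinds and body; the real getNodeType() returns a DOM node-type code.
enum class NodeType { Element, Text };

struct NodeBody {
    NodeType type;
};

// Hypothetical base handle in the spirit of DOM_Node.
class NodeHandle {
public:
    NodeHandle() : body_(nullptr) {}
    explicit NodeHandle(NodeBody* body) : body_(body) {}
    bool isNull() const { return body_ == nullptr; }
    NodeType getNodeType() const { return body_->type; }  // precondition: !isNull()
    NodeBody* body() const { return body_; }
protected:
    NodeBody* body_;
};

// Hypothetical derived handle in the spirit of DOM_Element.
class ElementHandle : public NodeHandle {
public:
    ElementHandle() {}

    // Converting constructor: adopt the node only if it really is an element.
    ElementHandle(const NodeHandle& node) { assignChecked(node); }

    // Matching assignment operator with the same check.
    ElementHandle& operator=(const NodeHandle& node) {
        assignChecked(node);
        return *this;
    }

private:
    // Assumption: an incompatible node yields a null handle rather than an error.
    void assignChecked(const NodeHandle& node) {
        bool ok = !node.isNull() && node.getNodeType() == NodeType::Element;
        body_ = ok ? node.body() : nullptr;
    }
};
```

The caller then writes `ElementHandle elem = node;` with no cast, and can test `elem.isNull()` to detect a mismatch.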
Again, please let me know what other aspects of pointers make things easier for you.
> Hope your fix has no effects on thread-safe-ness!
No effect whatsoever.
Lenny
Hi Lenny,
I hope your fix of the IDOM memory problem goes into the next official release. But I use and love the IDOM interface.
It's really easier for an old C++ programmer like me! And I use IDOM because of its thread-safe properties. Hope your fix has no effects on thread-safe-ness!
Markus
Hi Markus,
The memory management problem is solved by recycling nodes and strings that are no longer used. The only clean way I know to tell when nodes and strings are being used is the handle/body pattern, which is what the original DOM uses. What I have done is use the original DOM handles and the IDOM implementation, but fixed the IDOM memory problem.
Lenny
If the memory management problem is solved, I prefer IDOM!!!
Hi everyone,
I've reviewed Andy's design objective for IDOM, Lenny's view of the old DOM and his redesign proposal, and some user feedback. Here is a "quick" summary, and I would like to call for a VOTE on the fate of these two interfaces.
1.0 Objective
==========
1. Define the strategy for the Xerces-C++ public DOM interface. Decide which one to keep: the old DOM interface or the new IDOM interface.
2.0 Motivation
===========
1. As a long-term strategy, Xerces-C++ shouldn't define two W3C DOM interfaces, which simply confuses users.
   => We've already received many user questions about what the difference is, which one to use, etc.
2. With limited resources, we should focus our development on ONE stream, with no more duplicated effort.
   => New DOM Level 3 development should be done on one interface, not both.
   => No more dual maintenance: two sets of samples (e.g. DOMPrint vs IDOMPrint), two parsers (DOMParser vs IDOMParser).
   => To encourage more users to develop DOM applications AND implementations based on this binding.
   => Such a binding should just define a set of abstract base classes (similar to Java interfaces) where no implementation model is assumed.
3.0 History
=========
'DOM' was the initial "W3C DOM interface" developed by Xerces-C++. However, the performance of its implementation is not quite satisfactory.
Last year, Andy Heninger came up with a new design with faster performance, and that implementation came with a new set of interfaces => 'IDOM'.
Currently both 'DOM' and 'IDOM' are shipped with Xerces-C++. 'IDOM' is described as experimental (like a prototype) and is subject to change.
More information can be found at: http://xml.apache.org/xerces-c/program.html
4.0 IDOM
=========
4.1 Interface
==========
4.1.1 Features of IDOM Interface
--------------------------------------------------
e.g. virtual IDOM_Element* IDOM_Document::createElement(const XMLCh* tagName) = 0;
1. Defined as abstract base classes
2. Uses normal C++ pointers.
   => So that an abstract base class is possible.
   => Makes it more C++-like, less Java-like.
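
As an illustration of what "abstract base classes with normal C++ pointers" means in practice, here is a small sketch. The interface and class names (`Document`, `Element`, `SimpleDocument`, `SimpleElement`) are hypothetical and much smaller than the real IDOM classes, and `XMLCh` is modelled as `char16_t`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

using XMLCh = char16_t;  // stand-in for the Xerces-C++ XMLCh typedef (assumption)

// Abstract interfaces: pure virtual functions only, no implementation assumed.
class Element {
public:
    virtual ~Element() {}
    virtual const XMLCh* getTagName() const = 0;
};

class Document {
public:
    virtual ~Document() {}
    virtual Element* createElement(const XMLCh* tagName) = 0;
};

// One possible implementation behind the interfaces.
class SimpleElement : public Element {
public:
    explicit SimpleElement(const XMLCh* tagName) : tagName_(tagName) {}
    const XMLCh* getTagName() const override { return tagName_.c_str(); }
private:
    std::u16string tagName_;
};

class SimpleDocument : public Document {
public:
    // The document owns the elements it creates and frees them on destruction.
    Element* createElement(const XMLCh* tagName) override {
        elements_.push_back(std::unique_ptr<SimpleElement>(new SimpleElement(tagName)));
        return elements_.back().get();
    }
private:
    std::vector<std::unique_ptr<SimpleElement>> elements_;
};
```

Application code only sees `Document*` and `Element*`, so a different implementation can be substituted without changing callers.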
4.1.2 Pros and Cons of IDOM Interface
----------------------------------------------------------
Pros:
1. Abstract base classes that correspond to the W3C DOM interfaces
   => Can be recommended as the Apache DOM C++ Binding
   => More standard-like; no implementation is assumed, as they are just abstract interfaces using pure virtual functions
2. (Depends on users' preference)
   - some prefer a C++-like style
Cons:
1. IDOM_XXX - weird prefix 'I'
   Solution:
   - Proposed to rename to DOMXXXX, which also matches the DOM Level 3 naming convention
2. (Depends on users' preference)
   - some do not like pointers, and want a Java-like interface for ease of use, ease of learning and ease of porting (from Java).
3. As the old DOM interface has been around for a long time, the majority of current Xerces-C++ users still use the old DOM interface, so the migration impact is significant
   Solution:
   - Announce the deprecation of the old DOM interface for a couple of releases before removal
4.2 Implementation
===============
4.2.1 Features of IDOM Implementation
-----------------------------------------------------------
1. Uses an independent storage allocator per document. The advantage here is that allocation requires no synchronization
   => Fast, good scalability, reduced memory footprint
2. Uses plain, null-terminated (XMLCh*) UTF-16 strings.
   => No DOMString class overhead, which is another contributor to IDOM's faster performance
4.2.2 Downside of IDOM Implementation
-------------------------------------------------------------
1. Manual memory management
   - If the document comes from a parser, then the parser owns the document. If the document comes from DOMImplementation, then users are responsible for deleting it.
   Solution:
   - Provide a means of disassociating a document from the parser
   - Add a function "Node::release()", similar to the idea of "Range::detach", which allows users to indicate the release of the Node.
     - From the C++ Binding abstract interface perspective, it's up to the implementation how to handle this "release()" function.
     - With the Xerces-C++ IDOM implementation, the release() function will delete the 'this' pointer if it is a document, else it is a no-op.
2. Memory is retained until the document is deleted.
   - If you change the value of an attribute or call removeNode many times, the memory of the old value is not deallocated for reuse, and the document grows and grows
   Solution:
   - This is in fact a tradeoff for the fast performance offered by the independent storage allocator.
   - There is no immediate good solution in place
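
The tradeoff described in 4.2.1 and 4.2.2 can be illustrated with a toy per-document bump allocator. This is a sketch under assumptions, not the Xerces-C++ allocator: `DocumentArena` and `ArenaDocument` are hypothetical, values are plain `char` strings rather than XMLCh, and `release()` models only the "document" case of the proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical per-document bump allocator: no locking, no per-piece free.
class DocumentArena {
public:
    // Hands out `size` bytes from the current block; requests must fit in one block.
    char* allocate(std::size_t size) {
        assert(size <= kBlockSize);
        if (blocks_.empty() || used_ + size > kBlockSize) {
            blocks_.push_back(std::unique_ptr<char[]>(new char[kBlockSize]));
            used_ = 0;
        }
        char* p = blocks_.back().get() + used_;
        used_ += size;
        bytesHandedOut_ += size;
        return p;
    }
    std::size_t bytesHandedOut() const { return bytesHandedOut_; }
private:
    static constexpr std::size_t kBlockSize = 4096;
    std::vector<std::unique_ptr<char[]>> blocks_;
    std::size_t used_ = 0;
    std::size_t bytesHandedOut_ = 0;
};

// Hypothetical document holding a single attribute value in its arena.
class ArenaDocument {
public:
    // Stores a new value; the bytes of the old value stay in the arena.
    void setAttribute(const char* value) {
        std::size_t n = std::strlen(value) + 1;
        char* copy = arena_.allocate(n);
        std::memcpy(copy, value, n);
        value_ = copy;
    }
    const char* getAttribute() const { return value_; }
    std::size_t footprint() const { return arena_.bytesHandedOut(); }

    // Sketch of release() for a document: free everything at once.
    // The document must have been created with `new`.
    void release() { delete this; }
private:
    DocumentArena arena_;
    const char* value_ = "";
};
```

Each `setAttribute` call grows the footprint even though only the latest value is reachable, which is the "document grows and grows" behaviour; all of it is reclaimed in one step when the document is released.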
5.0 old DOM
==========
5.1 Interface
==========
5.1.1 Features of old DOM Interface
-----------------------------------------------------
e.g. DOM_Element DOM_Document::createElement(const DOMString tagName);
1. Uses smart pointers - Java-like
5.1.2 Pros and Cons of old DOM Interface
--------------------------------------------------------------
Pros:
1. DOM_XXX - reasonable name
2. (Depends on users' preference)
   - some want a Java-like interface for ease of use, ease of learning and ease of porting (from Java).
3. Not that many users have migrated to IDOM yet, so migration impact is minimal.
Cons:
1. Not an abstract base class
   - Cannot be recommended as the Apache DOM C++ Binding
   - An implementation (smart pointer indirection) is assumed
   Solution:
   - This is in fact a tradeoff for the ease of use of the smart pointer design
   - No solution.
2. (Depends on users' preference)
   - some want a C++-like interface, as this is a C++ interface
5.2 Implementation
===============
5.2.1 Features of old DOM Implementation
----------------------------------------------------------------
1. Automatic memory management
   - Memory is released when there are no more handles pointing to it
   - Uses a reference count to keep track of handles
2. Uses the thread-safe DOMString class
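
The automatic memory management described above follows the handle/body pattern with a reference count. A minimal sketch, assuming hypothetical `NodeHandle` and `NodeBody` types and a plain (non-atomic, so not thread-safe) counter; `liveBodies` exists only so the behaviour can be observed:

```cpp
#include <cassert>

// Counts bodies currently alive; for observation in this sketch only.
static int liveBodies = 0;

// Hypothetical reference-counted body.
struct NodeBody {
    int refCount = 0;
    NodeBody() { ++liveBodies; }
    ~NodeBody() { --liveBodies; }
};

// Hypothetical handle: copying it shares the body and bumps the count.
class NodeHandle {
public:
    explicit NodeHandle(NodeBody* body) : body_(body) { acquire(); }
    NodeHandle(const NodeHandle& other) : body_(other.body_) { acquire(); }
    NodeHandle& operator=(const NodeHandle& other) {
        if (body_ != other.body_) {
            release();
            body_ = other.body_;
            acquire();
        }
        return *this;
    }
    ~NodeHandle() { release(); }
private:
    void acquire() { if (body_) ++body_->refCount; }
    // Drops this handle's reference; the last handle deletes the body.
    void release() {
        if (body_ && --body_->refCount == 0) delete body_;
        body_ = nullptr;
    }
    NodeBody* body_;
};
```

The body is freed when the last handle goes out of scope, with no explicit call from application code.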
5.2.2 Downside of old DOM Implementation
--------------------------------------------------------------------
1. Performance is slow
   - Memory management is the biggest time consumer, and the memory footprint is large.
   - There are a whole lot of blocks allocated when creating a document and then freed when finished with it. Each and every node requires at least one and sometimes several separately allocated blocks. A DOMString takes three. It adds up.
   Solution:
   - Lenny suggests using the IDOM interface internally in the DOM implementation; see the patch in Bugzilla 5967
   - The performance benefits of IDOM are then gained, but the retained-memory problem in the IDOM implementation still remains to be addressed.
   - And internally, we will have a dual-interface maintenance model, as the IDOM interface is then used by DOM internally.
Vote Question:
============
I would like to call for a vote:
==> Which INTERFACE should be the Xerces-C++ public supported W3C DOM Interface, DOM or IDOM? <===
Note:
1. The question asks which "interface" is to be officially supported. Once the interface is chosen, we can discuss how to solve the downsides of its implementation as the next topic.
2. The one voted for will become the ONLY Xerces-C++ supported public W3C DOM Interface, and is where DOM Level 3 will be implemented.
3. The API of the other interface will be deprecated, and its samples and associated parser will eventually be removed from the distribution.
|