If the desire is to maintain only one interface, then I would be of the opinion that we should nix the DOMString class and use a 'smart pointer' class to wrap the internal interfaces. In many cases, people will likely have their own preferred string class, and will immediately convert the value extracted from the DOM before passing it into any other layer of their code.
If we keep DOMString around, I would recommend against having a (const XMLCh *) operator, as this can result in some incredibly hard-to-track errors. Most C++ style guides recommend against implicit conversion operators. Note the lack of such an operator in the C++ standard library string, i.e. std::basic_string<T>. Having something like rawBuffer() or XMLCh() would be clearer and lets one control lifetimes in a much clearer way (IMHO).
Also, I would recommend against adding an int operator on the smart pointer class. It is not that much work to call isNull on the object; doing so is much clearer from a readability perspective, and it helps catch silly errors at compile time. If we must have such an operator, then it may be better to add a bool operator instead of int, as this will likely reduce the number of places where the implicit conversion operator will be called.
My two bits...
Samar Lotia
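A minimal C++ sketch of what Samar recommends, assuming a C++11 or later compiler: the handle offers isNull(), an explicit boolean test and a named rawBuffer() accessor, but no implicit conversion to int or to const XMLCh*. NodeHandle and NodeBody are hypothetical names, and XMLCh is assumed to be char16_t. Note that explicit operator bool did not exist in 2002; a plain implicit operator bool() of that era would still promote to int in arithmetic, which is what the static_asserts below rule out.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

using XMLCh = char16_t;  // assumption: a 16-bit UTF-16 code unit

// Hypothetical body standing in for an IDOM node's string data.
struct NodeBody {
    std::u16string value;
};

// Hypothetical handle with no implicit conversions to int or const XMLCh*.
class NodeHandle {
public:
    NodeHandle() = default;
    explicit NodeHandle(NodeBody* body) : body_(body) {}

    bool isNull() const { return body_ == nullptr; }

    // Works in if (h) and !h, but not in arithmetic or integer comparisons.
    explicit operator bool() const { return !isNull(); }

    // Named accessor: the point where a raw pointer escapes is visible in the
    // calling code, and the pointer is only valid while the body is alive.
    const XMLCh* rawBuffer() const {
        assert(body_ != nullptr && "rawBuffer() called on a null handle");
        return body_->value.c_str();
    }

private:
    NodeBody* body_ = nullptr;
};

// Misuse is rejected at compile time rather than discovered at run time.
static_assert(!std::is_convertible<NodeHandle, int>::value,
              "no implicit conversion to int");
static_assert(!std::is_convertible<NodeHandle, const XMLCh*>::value,
              "no implicit conversion to const XMLCh*");
static_assert(std::is_constructible<bool, NodeHandle>::value,
              "an explicit boolean test is still available");
```

With an operator int() instead, expressions such as h + 1 or h == 42 would compile silently; here they are compile errors, while if (h) and !h still work.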
Hi Markus,
Thank you very much for the insight.
Note that simply accessing the IDOM implementation via handles does not affect its thread safety, so your application is safe.
if (pm_Element)
pm_Element->getAttribute(...);
How can I do this with references?
You do it with the current handles like this:
if (!pm_Element.isNull())
pm_Element.getAttribute(...);
Adding an int operator to DOM_Node would allow even friendlier syntax, e.g.:
if (pm_Element)
pm_Element.getAttribute(...);
This could easily be added.
In fact, an -> operator could be added to the DOM_Node classes to get this:
if (pm_Element)
pm_Element->getAttribute(...);
This is now exactly what you started out with, and is thus completely backward compatible with your current use of the IDOM.
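As a sketch of the -> idea (not the actual DOM_Node code), the handle below forwards member access to an IDOM-style body through operator->, so code written against the pointer compiles unchanged and the handle does not need to re-declare the body's methods. Element, Handle and readId are hypothetical, the handle is non-owning, and std::u16string stands in for XMLCh* to keep the example self-contained.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical IDOM-style body class (std::u16string used instead of XMLCh*).
class Element {
public:
    void setAttribute(const std::u16string& name, const std::u16string& value) {
        attrs_[name] = value;
    }
    std::u16string getAttribute(const std::u16string& name) const {
        auto it = attrs_.find(name);
        return it == attrs_.end() ? std::u16string() : it->second;
    }

private:
    std::map<std::u16string, std::u16string> attrs_;
};

// Hypothetical handle: forwards member access to the body via operator->,
// so the body's interface does not have to be duplicated in the handle.
template <typename Body>
class Handle {
public:
    Handle() = default;
    explicit Handle(Body* body) : body_(body) {}

    bool isNull() const { return body_ == nullptr; }
    explicit operator bool() const { return body_ != nullptr; }

    Body* operator->() const {
        assert(body_ != nullptr && "dereferencing a null handle");
        return body_;
    }

private:
    Body* body_ = nullptr;  // non-owning in this sketch
};

// The pointer-style code from the thread compiles unchanged against a handle.
std::u16string readId(const Handle<Element>& pm_Element) {
    if (pm_Element)
        return pm_Element->getAttribute(u"id");
    return std::u16string();
}
```

Because Handle is a template that only forwards, a method added to the body class (for example for DOM Level 3) is reachable through the handle without touching the handle itself.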
XMLCh* are easier to handle than DOMString objects in ATL:
CComBSTR cBstr = pm_Element->getAttribute(...);
Good point. The current DOMString class does not have an XMLCh* operator; if it did, that would solve your problem. I pretty much gutted the original DOMString class to make it a simple wrapper around an XMLCh* returned from IDOM implementations, rather than suffer the costs of the cross-document string management of the original DOM. As far as I can tell, the only reason the original DOMString did not have an XMLCh* operator was that there was no guarantee that its internal XMLCh* was null-terminated; that guarantee now exists, so the operator can be added, and I will do that. So your example remains:
CComBSTR cBstr = pm_Element->getAttribute(...);
Note that string classes are a convenient way to perform various operations on a string without using the static (read: functional) methods provided by XMLString. I even implemented COW (copy-on-write) behavior in the new DOMString class, so that you can freely modify a string returned from a node without having to make a copy manually.
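Lenny's DOMString code is not shown here, so the following is only a sketch of the general copy-on-write technique, assuming a std::shared_ptr-held UTF-16 buffer: copies share one buffer, and a mutating call clones it first if another copy still refers to it. CowString is a hypothetical name. The use_count() test is only reliable when no other thread is copying or releasing the same string concurrently.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical copy-on-write UTF-16 string.
class CowString {
public:
    explicit CowString(const std::u16string& s)
        : buf_(std::make_shared<std::u16string>(s)) {}

    // Read-only access never copies.
    const char16_t* c_str() const { return buf_->c_str(); }
    bool sharesBufferWith(const CowString& other) const { return buf_ == other.buf_; }

    // Any mutation first makes sure this object owns its buffer exclusively.
    void append(const std::u16string& s) {
        detach();
        assert(buf_.use_count() == 1);
        buf_->append(s);
    }

private:
    void detach() {
        if (buf_.use_count() > 1)
            buf_ = std::make_shared<std::u16string>(*buf_);  // clone on first write
    }

    std::shared_ptr<std::u16string> buf_;
};
```

Copying a CowString is just a pointer copy plus a count increment; the cost of a real copy is paid only by the copy that is actually modified.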
If folks don't find the DOMString wrapper to be that important, that frees me up to simplify the handle classes and address one of Tinny's concerns. Tinny pointed out that while the new design hides the dual interfaces (DOM and IDOM) from users, it does not hide them from DOM developers: as DOM Level 3 support is added, each interface change would have to be made to both the DOM and the IDOM classes. The only reason I went with complete interface replication instead of simple smart pointers for the handle classes was to be able to translate XMLCh pointers returned from IDOM nodes into DOMStrings. If I am allowed to get rid of DOMString altogether, I can make the handle classes simple smart pointers that do not replicate the IDOM interfaces, and the duplication of effort is eliminated.
Lenny
-----Original Message-----
From: Markus Fellner [mailto:[EMAIL PROTECTED]]
Sent: Monday, April 29, 2002 6:17 PM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: AW: Call for Vote: which one to be the Xerces-C++ public supported W3C DOM interface
OK, the main reason for my IDOM flirtation is...
I've chosen IDOM because of its thread safety. And now I have several thousand lines of code using the IDOM interface.
Some other reasons are...
I have many IDOM_Element* members (pm_Elem) in my classes. After parsing, they are assigned once, and then they are checked many times to see whether they are really assigned, and used for reading and writing attributes.
if (pm_Element)
pm_Element->getAttribute(...);
How can I do this with references?
XMLCh* are easier to handle than DOMString objects in ATL:
CComBSTR cBstr = pm_Element->getAttribute(...);
...
Sorry for my short answer. I go on holiday tomorrow and I have to pack!
I'm back in 2 weeks and looking forward to seeing the results of this vote.
It's a pity to leave during a hot discussion on this list.
Markus
Hi Markus,
To be clear, the fix I created for the IDOM was to recycle memory once a node or string is no longer needed. To know when a node is no longer needed, I used the original DOM interface, but with its classes wrapping the IDOM as the implementation. IDOM performance is maintained, but ease of use is greatly improved. Without using the DOM handles to know when an IDOM node is in use or not, application code would be drawn into explicitly stating when a node is no longer needed and can be recycled, which is yet another thing to be documented, and for application developers to get wrong and suffer the consequences of.
If you love and use the IDOM for its performance, and you want the memory problem really fixed rather than worked around in a way that only works if your application does everything right, then you will love what I have done by combining DOM classes as handles with IDOM classes as bodies.
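The handle/body recycling idea can be sketched as follows, under simplifying assumptions: each body carries a count of the handles pointing at it, and when the last handle goes away the body is returned to its document's free list for reuse rather than freed or leaked. Document, NodeBody and NodeHandle are hypothetical classes, not the actual patch. The sketch ignores that a node still attached to the tree must not be recycled just because no handle refers to it, and it assumes the document outlives every handle.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical node body, owned by its document; carries a handle count.
struct NodeBody {
    std::size_t handleCount = 0;
    int payload = 0;
};

// Hypothetical document: owns all bodies and keeps a free list for reuse.
class Document {
public:
    NodeBody* allocateNode() {
        if (!freeList_.empty()) {  // reuse a recycled body first
            NodeBody* b = freeList_.back();
            freeList_.pop_back();
            *b = NodeBody{};
            return b;
        }
        storage_.push_back(std::make_unique<NodeBody>());
        return storage_.back().get();
    }
    void recycle(NodeBody* b) { freeList_.push_back(b); }
    std::size_t allocatedCount() const { return storage_.size(); }
    std::size_t freeCount() const { return freeList_.size(); }

private:
    std::vector<std::unique_ptr<NodeBody>> storage_;
    std::vector<NodeBody*> freeList_;
};

// Hypothetical handle: the last handle to a body hands it back to the document.
class NodeHandle {
public:
    NodeHandle(Document& doc, NodeBody* body) : doc_(&doc), body_(body) { acquire(); }
    NodeHandle(const NodeHandle& o) : doc_(o.doc_), body_(o.body_) { acquire(); }
    NodeHandle& operator=(const NodeHandle& o) {
        NodeHandle tmp(o);  // copy-and-swap: tmp releases our old body
        std::swap(doc_, tmp.doc_);
        std::swap(body_, tmp.body_);
        return *this;
    }
    ~NodeHandle() { release(); }
    NodeBody* operator->() const { return body_; }

private:
    void acquire() { if (body_) ++body_->handleCount; }
    void release() {
        if (body_) {
            assert(body_->handleCount > 0);
            if (--body_->handleCount == 0) doc_->recycle(body_);
        }
        body_ = nullptr;
    }

    Document* doc_;
    NodeBody* body_;
};
```

The application never says "I am done with this node"; destroying or reassigning the last handle is the signal, which is the ease-of-use argument made above.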
If what you love is working with pointers instead of with objects, please let me know why.
One thing I have found harder with objects vs. pointers is downcasting from a node to a derived object like an element. The syntax is a bit cleaner with pointers, e.g.:
DOM_Node node = ...
DOM_Element elem = (const DOM_Element&)node;
vs:
IDOM_Node* node = ...
IDOM_Element* elem = (IDOM_Element*)node;
It is easy to forget to add the const in the first case, and it is somewhat non-intuitive because slicing can happen, though it is not a problem in this case.
To solve this problem, I have thought of adding overloaded constructors and assignment operators that take a DOM_Node to DOM_Node-derived classes like DOM_Element. Thus the first example becomes:
DOM_Node node = ...
DOM_Element elem = node;
Not only is this code more succinct, but it is safer, as the overloaded constructor and assignment operator can check for node compatibility via the getNodeType call.
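A sketch of the proposed converting constructor and assignment operator, using heavily simplified stand-ins for DOM_Node, DOM_Element and the node implementation (not the real Xerces-C++ classes). The thread does not say what should happen when the node is not an element; this sketch assumes the result is a null handle, though throwing an exception would be another reasonable choice.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-ins; not the real Xerces-C++ classes.
struct NodeImpl {
    enum NodeType { ELEMENT_NODE = 1, TEXT_NODE = 3 };
    explicit NodeImpl(NodeType t) : type(t) {}
    NodeType type;
};

class DOM_Element;

class DOM_Node {
public:
    DOM_Node() = default;
    explicit DOM_Node(NodeImpl* impl) : impl_(impl) {}
    bool isNull() const { return impl_ == nullptr; }
    short getNodeType() const { return impl_ ? static_cast<short>(impl_->type) : 0; }

private:
    friend class DOM_Element;  // lets the derived handle read another node's body
    NodeImpl* impl_ = nullptr;
};

class DOM_Element : public DOM_Node {
public:
    DOM_Element() = default;

    // Converting constructor and assignment from any DOM_Node: the body is
    // kept only if getNodeType() says the node really is an element.
    DOM_Element(const DOM_Node& node) { assignFrom(node); }
    DOM_Element& operator=(const DOM_Node& node) {
        assignFrom(node);
        return *this;
    }

private:
    void assignFrom(const DOM_Node& node) {
        // Assumption: an incompatible node yields a null handle (no exception).
        impl_ = (node.getNodeType() == NodeImpl::ELEMENT_NODE) ? node.impl_ : nullptr;
    }
};
```

Unlike the (const DOM_Element&) cast, a wrong conversion here cannot silently produce an element handle pointing at a text node; the caller can test the result with isNull().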
Again, please let me know what other aspects of pointers make things easier for you.
> Hope your fix has no effect on thread safety!
No effect whatsoever.
Lenny
Hi Lenny,
I hope your fix of the IDOM memory problem goes into the next official release. But I use and love the IDOM interface.
It's really easier for an old C++ programmer like me! And I use IDOM because of its thread-safe properties. I hope your fix has no effect on thread safety!
Markus
Hi Markus,
The memory management problem is solved by recycling nodes and strings that are no longer used. The only clean way I know of to tell when nodes and strings are in use is the handle/body pattern, which is what the original DOM uses. What I have done is use the original DOM handles with the IDOM implementation, and fix the IDOM memory problem.
Lenny
If the memory management problem is solved, I prefer IDOM!!!
Hi everyone,
I've reviewed Andy's design objectives for IDOM, Lenny's view of the old DOM and his redesign proposal, and some users' feedback. Here is a "quick" summary, and I would like to call for a VOTE on the fate of these two interfaces.
1.0 Objective
==========
1. Define the strategy for the Xerces-C++ public DOM interface. Decide which one to keep: the old DOM interface or the new IDOM interface.
2.0 Motivation
===========
1. As a long-term strategy, Xerces-C++ shouldn't define two W3C DOM interfaces, which simply confuses users.
=> We've already had many users' questions about what the difference is, which one to use, etc.
2. With limited resources, we should focus our development on ONE stream, with no more duplicate effort.
=> New DOM Level 3 development should be done on one interface, not both.
=> No more dual maintenance: two sets of samples (e.g. DOMPrint vs IDOMPrint), two parsers (DOMParser vs IDOMParser).
=> To encourage more users to develop DOM applications AND implementations based on this binding.
=> Such a binding should just define a set of abstract base classes (similar to Java interfaces) where no implementation model is assumed.
3.0 History
=========
'DOM' was the initial "W3C DOM interface" developed by Xerces-C++. However, the performance of its implementation is not quite satisfactory.
Last year, Andy Heninger came up with a new design with faster performance, and this implementation came with a new set of interfaces => 'IDOM'.
Currently both 'DOM' and 'IDOM' are shipped with Xerces-C++. 'IDOM' is labeled experimental (like a prototype) and is subject to change.
More information can be found at: http://xml.apache.org/xerces-c/program.html
4.0 IDOM
=========
4.1 Interface
==========
4.1.1 Features of IDOM Interface
--------------------------------------------------
e.g. virtual IDOM_Element* IDOM_Document::createElement(const XMLCh* tagName) = 0;
1. Defined as abstract base classes.
2. Use normal C++ pointers.
=> So that abstract base classes are possible.
=> Makes it more C++-like, less Java-like.
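To make the "abstract base classes plus plain pointers" idea concrete, here is a cut-down sketch: the interfaces contain only pure virtual functions, and one possible implementation (the hypothetical SimpleDocument and SimpleElement) sits behind them. These declarations are simplified for illustration and are not the real Xerces-C++ headers; XMLCh is assumed to be char16_t. Returning pointers is what allows createElement to hand back an abstract type at all.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

using XMLCh = char16_t;  // assumption: a 16-bit UTF-16 code unit

// Abstract interfaces: pure virtual functions only, no implementation assumed.
class IDOM_Element {
public:
    virtual ~IDOM_Element() = default;
    virtual const XMLCh* getTagName() const = 0;
};

class IDOM_Document {
public:
    virtual ~IDOM_Document() = default;
    virtual IDOM_Element* createElement(const XMLCh* tagName) = 0;
};

// One possible implementation, hidden behind the interfaces.
class SimpleElement : public IDOM_Element {
public:
    explicit SimpleElement(const XMLCh* name) : name_(name) {}
    const XMLCh* getTagName() const override { return name_.c_str(); }

private:
    std::u16string name_;
};

class SimpleDocument : public IDOM_Document {
public:
    IDOM_Element* createElement(const XMLCh* tagName) override {
        assert(tagName != nullptr);
        nodes_.push_back(std::make_unique<SimpleElement>(tagName));
        return nodes_.back().get();  // the document keeps ownership
    }

private:
    std::vector<std::unique_ptr<SimpleElement>> nodes_;
};
```

Client code can be written purely against IDOM_Document and IDOM_Element, so a different implementation could be substituted without recompiling that code against new class layouts.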
4.1.2 Pros and Cons of IDOM Interface
----------------------------------------------------------
Pros:
1. Abstract base classes that correspond to the W3C DOM interfaces
=> Can be recommended as the Apache DOM C++ Binding
=> More standard-like: no implementation is assumed, as they are just abstract interfaces using pure virtual functions
2. (Depends on users' preference)
- some prefer a C++-like style
Cons:
1. IDOM_XXX - weird prefix 'I'
Solution:
- Proposed renaming to DOMXXX, which also matches the DOM Level 3 naming convention
2. (Depends on users' preference)
- some do not like pointers, and want a Java-like interface for ease of use, ease of learning and ease of porting (from Java).
3. As the old DOM interface has been around for a long time, the majority of current Xerces-C++ users still use it, so the migration impact is significant.
Solution:
- Announce the deprecation of the old DOM interface a couple of releases before removing it.
4.2 Implementation
===============
4.2.1 Features of IDOM Implementation
-----------------------------------------------------------
1. Use an independent storage allocator per document. The advantage here is that allocation requires no synchronization.
=> Fast, good scalability, reduced memory footprint
2. Use plain, null-terminated (XMLCh *) UTF-16 strings.
=> No DOMString class overhead, which is another contributor to IDOM's faster performance
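The per-document allocator in item 1 can be illustrated with a simple bump (arena) allocator. This is a generic sketch of the technique, not the IDOM allocator itself; the class name and the 4096-byte default chunk size are assumptions, and it assumes new[] returns storage aligned for std::max_align_t. Because there is no per-block free, it also shows where the "memory retained until the document is deleted" downside comes from.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical per-document bump ("arena") allocator. One instance per
// document, so allocate() needs no lock as long as each document is only
// touched by one thread at a time.
class DocumentArena {
public:
    explicit DocumentArena(std::size_t chunkSize = 4096) : chunkSize_(chunkSize) {}

    void* allocate(std::size_t n) {
        const std::size_t align = alignof(std::max_align_t);
        n = (n + align - 1) / align * align;  // keep every block aligned
        if (chunks_.empty() || used_ + n > currentSize_) {
            // Start a new chunk; oversized requests get a chunk of their own,
            // and the unused tail of the previous chunk is simply abandoned.
            currentSize_ = n > chunkSize_ ? n : chunkSize_;
            chunks_.push_back(std::make_unique<unsigned char[]>(currentSize_));
            used_ = 0;
        }
        void* p = chunks_.back().get() + used_;
        used_ += n;
        assert(used_ <= currentSize_);
        return p;
    }

    // Deliberately no deallocate(): every chunk lives until the arena (the
    // document) is destroyed.
    std::size_t chunkCount() const { return chunks_.size(); }

private:
    std::size_t chunkSize_;
    std::size_t currentSize_ = 0;
    std::size_t used_ = 0;
    std::vector<std::unique_ptr<unsigned char[]>> chunks_;
};
```

Allocation is a pointer bump in the common case, which is where the speed comes from; the price is that replacing an attribute value many times keeps consuming arena space.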
4.2.2 Downside of IDOM Implementation
-------------------------------------------------------------
1. Manual memory management
- If a document comes from the parser, then the parser owns the document. If a document comes from DOMImplementation, then users are responsible for deleting it.
Solution:
- Provide a means of disassociating a document from the parser
- Add a function "Node::release()", similar to the idea of "Range::detach", which allows users to indicate the release of the Node.
- From the C++ Binding abstract interface perspective, it's up to the implementation how to handle this "release()" function.
- With the Xerces-C++ IDOM implementation, the release() function will delete the 'this' pointer if it is a document, else it is a no-op.
2. Memory retained until the document is deleted.
- If you change the value of an attribute or call removeNode many times, the memory of the old value is not deallocated for reuse, and the document grows and grows.
Solution:
- This in fact is a tradeoff for the fast performance offered by the independent storage allocator.
- There is no good immediate solution in place.
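The proposed Node::release() can be sketched as a pure virtual function in the abstract interface, with the implementation deciding what it does: here a document deletes itself and every other node ignores the call, as described above for the Xerces-C++ IDOM implementation. The class names are simplified stand-ins, and the liveDocuments counter is instrumentation added only for this example.

```cpp
#include <cassert>

// Hypothetical, simplified node hierarchy: the abstract interface only
// declares release(); what it does is left to the implementation.
class Node {
public:
    virtual ~Node() = default;
    virtual void release() = 0;
};

// A document deletes itself when released.
class Document : public Node {
public:
    inline static int liveDocuments = 0;  // instrumentation for this example only
    Document() { ++liveDocuments; }
    ~Document() override { --liveDocuments; }
    void release() override { delete this; }  // valid only for heap-allocated documents
};

// Any other node treats release() as a no-op, because its memory belongs to
// the owning document.
class Element : public Node {
public:
    void release() override {}
};
```

Because release() is virtual, application code can call it uniformly on any Node* without knowing whether it holds a document.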
5.0 old DOM
==========
5.1 Interface
==========
5.1.1 Features of old DOM Interface
-----------------------------------------------------
e.g. DOM_Element DOM_Document::createElement(const DOMString tagName);
1. Use smart pointers - Java-like
5.1.2 Pros and Cons of old DOM Interface
--------------------------------------------------------------
Pros:
1. DOM_XXX - reasonable name
2. (Depends on users' preference)
- some want a Java-like interface for ease of use, ease of learning and ease of porting (from Java).
3. Not that many users have migrated to IDOM yet, so the migration impact is minimal.
Cons:
1. Not an abstract base class
- Cannot be recommended as the Apache DOM C++ Binding
- Implementation (smart pointer indirection) is assumed
Solution:
- This in fact is a tradeoff for the ease of use of the smart pointer design
- No solution.
2. (Depends on users' preference)
- some want a C++-like style, as this is a C++ interface
5.2 Implementation
===============
5.2.1 Features of old DOM Implementation
----------------------------------------------------------------
1. Automatic memory management
- Memory is released when there are no more handles pointing to it
- Use reference counts to keep track of handles
2. Use a thread-safe DOMString class
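A sketch of reference-counted handles in the spirit of the old DOM's automatic memory management, assuming std::atomic counters stand in for whatever thread-safe counting the real implementation uses: the body is deleted when the last handle pointing at it is destroyed. StringHandle and StringBody are hypothetical names, and the live counter is instrumentation for the example.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <utility>

// Hypothetical reference-counted body.
struct StringBody {
    explicit StringBody(std::u16string s) : text(std::move(s)) { ++live; }
    ~StringBody() { --live; }
    std::atomic<int> refs{1};                // the creating handle holds the first reference
    std::u16string text;
    inline static std::atomic<int> live{0};  // bodies currently alive (example only)
};

// Hypothetical handle: copying shares the body; the last handle deletes it.
class StringHandle {
public:
    explicit StringHandle(std::u16string s) : body_(new StringBody(std::move(s))) {}
    StringHandle(const StringHandle& o) : body_(o.body_) {
        body_->refs.fetch_add(1, std::memory_order_relaxed);
    }
    StringHandle& operator=(StringHandle o) {  // copy-and-swap
        std::swap(body_, o.body_);
        return *this;                          // o's destructor drops the old body
    }
    ~StringHandle() {
        if (body_->refs.fetch_sub(1, std::memory_order_acq_rel) == 1)
            delete body_;                      // last handle: free the body
    }
    const std::u16string& str() const { return body_->text; }
    int useCount() const { return body_->refs.load(); }

private:
    StringBody* body_;
};
```

The increment can be relaxed, but the final decrement needs acquire/release ordering so that the deleting thread sees all writes made through other handles; std::shared_ptr implementations typically follow the same pattern. Every handle copy and destruction still costs an atomic operation, which is part of the allocation and bookkeeping overhead discussed next.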
5.2.2 Downside of old DOM Implementation
--------------------------------------------------------------------
1. Performance is slow
- Memory management is the biggest time consumer, and the memory footprint is large.
- A whole lot of blocks are allocated when creating a document and then freed when finished with it. Each and every node requires at least one, and sometimes several, separately allocated blocks. A DOMString takes three. It adds up.
Solution:
- Lenny suggests using the IDOM interface internally in the DOM implementation (patch in Bugzilla 5967)
- The performance benefits of IDOM are then gained, but the retained-memory problem in the IDOM implementation still remains to be addressed.
- And internally, we will have a dual-interface maintenance model, as the IDOM interface is then used by DOM internally.
Vote Question:
============
I would like to call for a vote:
==> Which INTERFACE should be the Xerces-C++ public supported W3C DOM Interface, DOM or IDOM? <===
Note:
1. The question is asking which "interface" is to be officially supported. Once the interface is chosen, we can discuss how to solve the downsides of its implementation as the next topic.
2. The one voted for will become the ONLY Xerces-C++ supported public W3C DOM Interface, and is where DOM Level 3 will be implemented.
3. The API of the other interface will be deprecated, and its samples and associated parser will eventually be removed from the distribution.