I just had a new thought: if a DOMString class is desired, for functionality and/or DOM compliance, then the smart pointer approach can still be used by updating the IDOM classes to return DOMString instances instead of XMLCh*. By using smart pointers we would still have only one set of interfaces to maintain, and performance would be only negligibly affected since, as I pointed out earlier, I modified DOMString to simply wrap an alias to the node-owned XMLCh* data and to make a copy only if it is modified.
Lenny
Hi Samar,
You make good points.
I would agree that it is reasonable to nix the DOMString, but does anyone object to that, given that DOMString is explicitly specified in the W3C DOM specification? Judging from the early responses to the vote so far, no: folks voting for the IDOM interface are also voting to nix the DOMString class.
Tinny, do you expect the W3C to complain if the C++ binding does not have a DOMString? In other words, will we be able to call ourselves DOMx compliant without it?
One more consequence of using the smart pointer approach is that backwards compatibility with the original DOM interfaces is sacrificed for backwards compatibility with the IDOM interfaces. I thought that, with the original DOM interfaces being officially supported and around longer, backwards compatibility with them would be more important, but so far no one using the original DOM interface has spoken up. For my use cases it simply doesn't matter; what matters most to me is functional behavior and ease of use.
Just to make it easier to review, here is the earlier example following your suggestion to avoid using an int operator on the node for null comparison:
if (!pm_Element.isNull())
pm_Element->getAttribute(...);
Lenny
If the desire is to maintain only one interface, then I would be of the opinion that we should nix the DOMString class and use a 'smart pointer' class to wrap the internal interfaces. In many cases, people will likely have their own preferred string class, and will immediately convert the value extracted from the DOM before passing it into any other layer of their code.
If we keep DOMString around, I would recommend against having a (const XMLCh *) operator, as this can result in some incredibly hard-to-track errors. Most C++ style guides recommend against implicit conversion operators. Note the lack of such an operator in the C++ standard library string, i.e. std::basic_string<T>, which offers an explicit c_str() instead. Having something like rawBuffer() or XMLCh() would be clearer and lets one control lifetimes in a much clearer way (IMHO).
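To make the lifetime hazard concrete, here is a small sketch in modern C++. It is not Xerces code: DOMString, rawBuffer() and getValue() below are illustrative stand-ins, and XMLCh is assumed to be a 16-bit char16_t. Without an implicit (const XMLCh*) operator, binding a pointer to a temporary's buffer no longer compiles silently; the caller has to keep the owning string alive and ask for the buffer by name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

using XMLCh = char16_t;  // assumption: stand-in for Xerces' 16-bit character type

// Illustrative string wrapper with no implicit (const XMLCh*) operator.
class DOMString {
public:
    DOMString() = default;
    explicit DOMString(const XMLCh* s) : buf_(s ? s : u"") {}

    // Explicit access to the null-terminated buffer; the pointer is valid
    // only while this DOMString is alive and unmodified.
    const XMLCh* rawBuffer() const { return buf_.c_str(); }
    std::size_t length() const { return buf_.size(); }

private:
    std::u16string buf_;
};

// The implicit conversion that can produce dangling pointers is simply absent.
static_assert(!std::is_convertible<DOMString, const XMLCh*>::value,
              "DOMString must not convert implicitly to const XMLCh*");

// Hypothetical getter returning a string by value, as a DOM accessor might.
DOMString getValue() { return DOMString(u"hello"); }

void example() {
    // const XMLCh* p = getValue();  // would compile with an implicit operator,
    //                               // leaving p dangling once the temporary dies
    DOMString value = getValue();        // keep the owner alive...
    const XMLCh* p = value.rawBuffer();  // ...then borrow its buffer explicitly
    assert(p[0] == u'h');
}
```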
Also, I would recommend against adding an int operator to the smart pointer class. It is not that much work to call isNull() on the object; it is much clearer from a readability perspective, and it helps catch silly errors at compile time. If we must have such an operator, then it may be better to add a bool operator instead of an int one, as this will likely reduce the number of places where the implicit conversion operator gets called.
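A sketch of the null-test options, assuming a hypothetical NodeHandle class (not the Xerces API). Note that `explicit operator bool` is a C++11 feature that postdates this discussion; at the time, the usual workaround was the "safe bool" idiom. It gives the `if (handle)` syntax while still rejecting accidental conversions such as `int n = handle;`, which is the compile-time safety argued for above.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical implementation object the handle points to.
struct NodeImpl {};

// Hypothetical handle class, not the Xerces API.
class NodeHandle {
public:
    NodeHandle() = default;
    explicit NodeHandle(NodeImpl* body) : body_(body) {}

    // Named query: the most readable form.
    bool isNull() const { return body_ == nullptr; }

    // Usable in if/while/!/&&, but never converted to int or bool implicitly.
    explicit operator bool() const { return body_ != nullptr; }

private:
    NodeImpl* body_ = nullptr;
};

// With an 'operator int()' instead, both of these checks would fail.
static_assert(!std::is_convertible<NodeHandle, int>::value, "no implicit int");
static_assert(!std::is_convertible<NodeHandle, bool>::value, "bool only in conditions");
```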
My two bits...
Samar Lotia
Hi Markus,
Thank you very much for the insight.
Note that simply accessing the IDOM implementation via handles does not affect its thread safety, so your application remains safe.
> if (pm_Element)
> pm_Element->getAttribute(...);
> How can I do this with references?
You do it with the current handles like this:
if (!pm_Element.isNull())
pm_Element.getAttribute(...);
Adding an int operator to DOM_Node would allow an even friendlier syntax, e.g.:
if (pm_Element)
pm_Element.getAttribute(...);
This could easily be added.
In fact, an -> operator could be added to the DOM_Node classes to get this:
if (pm_Element)
pm_Element->getAttribute(...);
This is now exactly what you started out with, and so is completely backward compatible with your current use of the IDOM.
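Under the smart-pointer approach, the -> operator is the only piece that has to forward to the IDOM body. Here is a minimal sketch of such a handle in modern C++ (C++11, for `explicit operator bool`). ElementBody, Handle and SimpleElement are hypothetical names, std::u16string stands in for attribute values, and the handle is deliberately non-owning to keep it short; a real handle would also manage the body's lifetime.

```cpp
#include <cassert>
#include <string>

// Hypothetical abstract body, standing in for IDOM_Element.
class ElementBody {
public:
    virtual ~ElementBody() = default;
    virtual std::u16string getAttribute(const std::u16string& name) const = 0;
};

// Non-owning handle: forwards calls through operator-> instead of
// replicating the body's interface method by method.
template <class Body>
class Handle {
public:
    Handle() = default;
    explicit Handle(Body* body) : body_(body) {}

    bool isNull() const { return body_ == nullptr; }
    explicit operator bool() const { return body_ != nullptr; }

    Body* operator->() const {
        assert(body_ != nullptr && "dereferencing a null handle");
        return body_;
    }

private:
    Body* body_ = nullptr;
};

// Toy body used only for the example.
class SimpleElement : public ElementBody {
public:
    std::u16string getAttribute(const std::u16string& name) const override {
        if (name == u"id") return u"e1";
        return u"";
    }
};
```

With pm_Element declared as a Handle<ElementBody> instead of a raw pointer, the `if (pm_Element) pm_Element->getAttribute(...)` pattern reads exactly as before.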
> XMLCh* are easier to handle than DOMString objects in ATL:
> CComBSTR cBstr = pm_Element->getAttribute(...);
Good point: the current DOMString class does not have an XMLCh* operator, which, if it did, would solve your problem. I pretty much gutted the original DOMString class to make it a simple wrapper around an XMLCh* returned from IDOM implementations, instead of incurring the costs of the cross-document string management of the original DOM. As far as I can tell, the only reason the original DOMString did not have an XMLCh* operator was that there was no guarantee that its internal XMLCh* was null terminated; that guarantee now exists, so the operator can be added, and I will do that. So your example remains:
CComBSTR cBstr = pm_Element->getAttribute(...);
Note that string classes are a convenient way to perform various operations on a string without using the static (read: functional) methods provided by XMLString. I even implemented COW (copy-on-write) behavior in the new DOMString class, so that you can freely modify a string returned from a node without having to make a copy manually.
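Here is a rough sketch of the aliasing plus copy-on-write behavior described above. CowString, rawBuffer(), setCharAt() and append() are illustrative names, not the actual DOMString API, and XMLCh is assumed to be char16_t. The string borrows the node's null-terminated buffer and makes its own copy only on the first modification, so the node's data is never changed through it. As with any alias, the borrowed pointer is only valid while the node is alive and its value unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

using XMLCh = char16_t;  // assumption: stand-in for Xerces' 16-bit character type

// Copy-on-write wrapper: aliases a node-owned, null-terminated buffer and
// makes a private copy only when first modified.
class CowString {
public:
    explicit CowString(const XMLCh* nodeOwned) : alias_(nodeOwned ? nodeOwned : u"") {}

    // Null terminated in both states: the node's buffer or our own copy.
    const XMLCh* rawBuffer() const { return copied_ ? own_.c_str() : alias_; }
    bool isCopy() const { return copied_; }

    void setCharAt(std::size_t i, XMLCh c) {
        detach();
        own_.at(i) = c;
    }
    void append(const XMLCh* s) {
        assert(s != nullptr);
        detach();
        own_ += s;
    }

private:
    // First write: copy the aliased data so the node itself is never modified.
    void detach() {
        if (!copied_) {
            own_.assign(alias_);
            copied_ = true;
        }
    }

    const XMLCh* alias_;  // borrowed; valid only while the node's value is unchanged
    std::u16string own_;  // private copy after the first write
    bool copied_ = false;
};
```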
If folks don't find the DOMString wrapper to be that important, that frees me up to simplify the handle classes and address one of Tinny's concerns. Tinny pointed out that while the new design hides the dual interfaces (DOM and IDOM) from users, it does not hide them from DOM developers: as DOM 3 support is added, each interface change would have to be made to both the DOM and IDOM classes. The only reason I went with complete interface replication instead of simple smart pointers for the handle classes was to be able to translate XMLCh pointers returned from IDOM nodes into DOMStrings. If I am allowed to get rid of DOMString altogether, I can make the handle classes simple smart pointers that do not replicate the IDOM interfaces, and the duplication of effort is eliminated.
Lenny
-----Original Message-----
From: Markus Fellner [mailto:[EMAIL PROTECTED]]
Sent: Monday, April 29, 2002 6:17 PM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: AW: Call for Vote: which one to be the Xerces-C++ public supported W3C DOM interface
OK, the main reason for my IDOM flirtation is...
I chose IDOM because of its thread safety, and now I have several thousand lines of code using the IDOM interface.
Some other reasons are...
I have many IDOM_Element* members (pm_Elem) in my classes. After parsing, they are assigned once and then checked many times to see whether they are really assigned, and used for reading and writing attributes.
if (pm_Element)
pm_Element->getAttribute(...);
How can I do this with references?
XMLCh* are easier to handle than DOMString objects in ATL:
CComBSTR cBstr = pm_Element->getAttribute(...);
...
Sorry for my short answer. I'm going on holiday tomorrow and I have to pack!
I'll be back in 2 weeks and look forward to seeing the results of this vote.
It's a pity to leave in the middle of a hot discussion on this list.
Markus
Hi Markus,
To be clear, the fix I created for the IDOM was to recycle memory once a node or string is no longer needed. To know when a node is no longer needed, I used the original DOM interface classes as handles wrapping the IDOM implementation. IDOM performance is maintained, but ease of use is greatly improved. Without the DOM handles to track whether an IDOM node is in use, application code will be drawn into explicitly stating when a node is no longer needed and can be recycled, which is yet another thing to document, and another thing for application developers to get wrong and suffer the consequences of.
If you love and use the IDOM for its performance, and want the memory problem really fixed rather than worked around in a way that only works if your application does everything right, then you will love what I have done in combining DOM classes as handles with IDOM classes as bodies.
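For readers less familiar with the handle/body pattern, here is a minimal sketch of the general recycling idea. It is not the actual patch; NodeBody, NodePool and NodeHandle are hypothetical. Handles reference-count a body, and when the last handle goes away the body goes back to a per-document free list for reuse instead of lingering until the document is deleted. The count is a plain int, so this assumes a document is not shared between threads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical body, standing in for an IDOM node implementation.
struct NodeBody {
    int refCount = 0;
    int value = 0;
};

// Per-document pool: bodies are recycled, not returned to the heap,
// until the pool (i.e. the document) is destroyed.
class NodePool {
public:
    NodePool() = default;
    NodePool(const NodePool&) = delete;
    NodePool& operator=(const NodePool&) = delete;
    ~NodePool() { for (NodeBody* b : all_) delete b; }

    NodeBody* acquire() {
        if (!free_.empty()) {
            NodeBody* b = free_.back();
            free_.pop_back();
            *b = NodeBody{};  // reset recycled state
            return b;
        }
        all_.push_back(new NodeBody);
        return all_.back();
    }
    void recycle(NodeBody* b) { free_.push_back(b); }
    std::size_t allocated() const { return all_.size(); }

private:
    std::vector<NodeBody*> all_;   // every body ever allocated
    std::vector<NodeBody*> free_;  // bodies available for reuse
};

// Reference-counted handle: the last handle to go away recycles the body.
class NodeHandle {
public:
    NodeHandle(NodePool& pool, NodeBody* body) : pool_(&pool), body_(body) {
        assert(body_ != nullptr);
        ++body_->refCount;
    }
    NodeHandle(const NodeHandle& o) : pool_(o.pool_), body_(o.body_) { ++body_->refCount; }
    NodeHandle& operator=(const NodeHandle& o) {
        ++o.body_->refCount;  // increment first so self-assignment is safe
        release();
        pool_ = o.pool_;
        body_ = o.body_;
        return *this;
    }
    ~NodeHandle() { release(); }
    NodeBody* operator->() const { return body_; }

private:
    void release() {
        if (--body_->refCount == 0) pool_->recycle(body_);
    }
    NodePool* pool_;
    NodeBody* body_;
};
```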
If what you love is working with pointers instead of objects, please let me know why.
One thing I have found harder with objects vs. pointers is downcasting from a node to derived objects like an element. The syntax is a bit cleaner with pointers, e.g.:
DOM_Node node = ...;
DOM_Element elem = (const DOM_Element&)node;
vs:
IDOM_Node* node = ...;
IDOM_Element* elem = (IDOM_Element*)node;
It is easy to forget to add the const in the first case, and it is somewhat non-intuitive because slicing can happen, though it is not a problem in this case.
To solve this problem, I have thought of adding to DOM_Node-derived classes, such as DOM_Element, overloaded constructors and assignment operators that take a DOM_Node. Thus the first example becomes:
DOM_Node node = ...;
DOM_Element elem = node;
Not only is this code more succinct, but it is also safer, as the overloaded constructor and assignment operator can check for node compatibility via the getNodeType call.
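A sketch of the checked conversion, with hypothetical, heavily simplified DOM_Node and DOM_Element classes (the real ones wrap reference-counted implementation objects; NodeImpl and this NodeType enum are illustrative). Whether a type mismatch should throw or produce a null handle is a design choice; this sketch throws std::invalid_argument and lets a null node convert to a null element.

```cpp
#include <cassert>
#include <stdexcept>

// Subset of the W3C node type codes (values follow the DOM spec).
enum class NodeType { ELEMENT_NODE = 1, TEXT_NODE = 3 };

// Hypothetical, heavily simplified implementation object.
struct NodeImpl {
    explicit NodeImpl(NodeType t) : type(t) {}
    NodeType type;
};

class DOM_Node {
public:
    DOM_Node() = default;
    explicit DOM_Node(NodeImpl* impl) : impl_(impl) {}
    bool isNull() const { return impl_ == nullptr; }
    NodeType getNodeType() const {
        assert(impl_ != nullptr);
        return impl_->type;
    }

protected:
    NodeImpl* impl_ = nullptr;
};

class DOM_Element : public DOM_Node {
public:
    DOM_Element() = default;
    // Non-explicit on purpose, so that 'DOM_Element elem = node;' compiles.
    DOM_Element(const DOM_Node& node) { assignChecked(node); }
    DOM_Element& operator=(const DOM_Node& node) {
        assignChecked(node);
        return *this;
    }

private:
    // A null node converts to a null element; any other non-element is rejected.
    void assignChecked(const DOM_Node& node) {
        if (!node.isNull() && node.getNodeType() != NodeType::ELEMENT_NODE)
            throw std::invalid_argument("DOM_Node is not an element");
        DOM_Node::operator=(node);
    }
};
```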
Again, please let me know what other aspects or points make things easier for you.
> Hope your fix has no effects on thread-safe-ness!
No effect whatsoever.
Lenny
Hi Lenny,
I hope your fix for the IDOM memory problem goes into the next official release. But I use and love the IDOM interface.
It's really easier for an old C++ programmer like me! And I use IDOM because of its thread-safe properties. Hope your fix has no effects on thread-safe-ness!
Markus
Hi Markus,
The memory management problem is solved by recycling nodes and strings that are no longer used. The only clean way I know of to tell when nodes and strings are in use is the handle/body pattern, which is what the original DOM uses. What I have done is use the original DOM handles with the IDOM implementation, and in doing so fixed the IDOM memory problem.
Lenny
If the memory management problem is solved, I prefer IDOM!!!
Hi everyone,
I've reviewed Andy's design objectives for IDOM, Lenny's view of the old DOM and his redesign proposal, and some users' feedback. Here is a "quick" summary, and I would like to call for a VOTE on the fate of these two interfaces.
1.0 Objective
==========
1. Define the strategy for the Xerces-C++ public DOM interface: decide which one to keep, the old DOM interface or the new IDOM interface.
2.0 Motivation
===========
1. As a long-term strategy, Xerces-C++ shouldn't define two W3C DOM interfaces, which simply confuses users.
=> We've already had many questions from users about what the difference is, which one to use, etc.
2. With limited resources, we should focus our development on ONE stream, with no more duplicated effort.
=> New DOM Level 3 development should be done on one interface, not both.
=> No more dual maintenance: two sets of samples (e.g. DOMPrint vs IDOMPrint), two parsers (DOMParser vs IDOMParser)
=> To encourage more users to develop DOM applications AND implementations based on this binding.
=> Such a binding should just define a set of abstract base classes (similar to Java interfaces) in which no implementation model is assumed.
3.0 History
=========
'DOM' was the initial "W3C DOM interface" developed by Xerces-C++. However, the performance of its implementation is not quite satisfactory.
Last year, Andy Heninger came up with a new design with faster performance, and that implementation came with a new set of interfaces => 'IDOM'.
Currently both 'DOM' and 'IDOM' are shipped with Xerces-C++. 'IDOM' is labelled experimental (like a prototype) and is subject to change.
More information can be found at: http://xml.apache.org/xerces-c/program.html
4.0 IDOM
=========
4.1 Interface
==========
4.1.1 Features of IDOM Interface
--------------------------------------------------
e.g. virtual IDOM_Element* IDOM_Document::createElement(const XMLCh* tagName) = 0;
1. Define as abstract base classes
2. Use normal C++ pointers
=> So that abstract base classes are possible
=> Makes it more C++-like, less Java-like
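To illustrate the "abstract base classes plus plain pointers" style, here is a small sketch using the proposed DOMXXX naming. DOMDocument, DOMElement and the Simple* implementation classes are illustrative only, not the shipped IDOM headers, and XMLCh is assumed to be char16_t. Application code sees only the pure virtual interfaces, so any implementation model could sit behind them; in this particular one the document owns every node it creates.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

using XMLCh = char16_t;  // assumption: stand-in for Xerces' 16-bit character type

// Interfaces only: pure virtual functions, no implementation model assumed.
class DOMElement {
public:
    virtual ~DOMElement() = default;
    virtual const XMLCh* getTagName() const = 0;
};

class DOMDocument {
public:
    virtual ~DOMDocument() = default;
    virtual DOMElement* createElement(const XMLCh* tagName) = 0;
};

// One possible implementation; the document owns every node it creates,
// so callers receive plain, non-owning pointers.
class SimpleElement : public DOMElement {
public:
    explicit SimpleElement(const XMLCh* tag) : tag_(tag) {}
    const XMLCh* getTagName() const override { return tag_.c_str(); }

private:
    std::u16string tag_;
};

class SimpleDocument : public DOMDocument {
public:
    DOMElement* createElement(const XMLCh* tagName) override {
        assert(tagName != nullptr);
        nodes_.push_back(std::make_unique<SimpleElement>(tagName));
        return nodes_.back().get();
    }

private:
    std::vector<std::unique_ptr<SimpleElement>> nodes_;
};
```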
4.1.2 Pros and Cons of IDOM Interface
----------------------------------------------------------
Pros:
1. Abstract base classes that correspond to the W3C DOM interfaces
=> Can be recommended as the Apache DOM C++ Binding
=> More standard-like: no implementation is assumed, as they are just abstract interfaces using pure virtual functions
2. (Depends on users' preference)
- some prefer a C++-like style
Cons:
1. IDOM_XXX - weird prefix 'I'
Solution:
- Proposed renaming to DOMXXX, which also matches the DOM Level 3 naming convention
2. (Depends on users' preference)
- some do not like pointers and want a Java-like interface for ease of use, ease of learning and ease of porting (from Java).
3. As the old DOM interface has been around for a long time, the majority of current Xerces-C++ users still use it, so the migration impact is significant
Solution:
- Announce the deprecation of the old DOM interface a couple of releases before its removal
4.2 Implementation
===============
4.2.1 Features of IDOM Implementation
-----------------------------------------------------------
1. Use an independent storage allocator per document. The advantage here is that allocation requires no synchronization.
=> Fast, good scalability, reduced memory footprint
2. Use plain, null-terminated (XMLCh *) UTF-16 strings.
=> No DOMString class overhead, which is another contributor to IDOM's speed
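A minimal sketch of a per-document bump allocator (DocumentArena is a hypothetical name, not the IDOM allocator itself). Because each document owns its own arena, allocation is just a pointer increment with no locking. The flip side is that individual allocations cannot be handed back, so deallocate() does nothing and memory is released only when the arena, i.e. the document, goes away.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical per-document bump allocator.
class DocumentArena {
public:
    explicit DocumentArena(std::size_t blockSize = 4096) : blockSize_(blockSize) {}

    void* allocate(std::size_t n) {
        assert(n > 0);
        // Round up so every allocation stays suitably aligned.
        const std::size_t a = alignof(std::max_align_t);
        n = (n + a - 1) / a * a;
        if (blocks_.empty() || used_ + n > currentSize_) {
            currentSize_ = n > blockSize_ ? n : blockSize_;
            blocks_.push_back(std::make_unique<unsigned char[]>(currentSize_));
            reserved_ += currentSize_;
            used_ = 0;
        }
        void* p = blocks_.back().get() + used_;
        used_ += n;
        return p;
    }

    // No-op: a bump allocator cannot reuse an individual allocation.
    void deallocate(void*) {}

    std::size_t bytesReserved() const { return reserved_; }

private:
    std::size_t blockSize_;
    std::size_t currentSize_ = 0;
    std::size_t used_ = 0;
    std::size_t reserved_ = 0;
    std::vector<std::unique_ptr<unsigned char[]>> blocks_;
};
```

Replacing a 32-byte value 100 times in such an arena reserves at least 3200 bytes, even though only one value is live at any moment.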
4.2.2 Downside of IDOM Implementation
-------------------------------------------------------------
1. Manual memory management
- If a document comes from the parser, then the parser owns the document. If a document comes from DOMImplementation, then users are responsible for deleting it.
Solution:
- Provide a means of disassociating a document from the parser
- Add a function "Node::release()", similar to the idea of "Range::detach", which allows users to indicate that they are releasing the Node.
- From the C++ Binding abstract interface perspective, it's up to the implementation how to handle this "release()" function.
- With the Xerces-C++ IDOM implementation, the release() function will delete the 'this' pointer if it is a document; otherwise it is a no-op.
2. Memory retained until the document is deleted
- If you change the value of an attribute or call removeNode many times, the memory of the old value is not deallocated for reuse, and the document keeps growing.
Solution:
- This in fact is a tradeoff for the fast performance offered by the independent storage allocator.
- There is no immediate good solution in place.
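The proposed release() semantics could look roughly like this, assuming hypothetical Node, Document and Element classes and using a virtual override where a real implementation might instead test getNodeType(). Calling release() on a parser-owned document would be a double delete, which is why disassociating a document from the parser matters.

```cpp
#include <cassert>

// Counts live documents so the example can observe deletion (illustration only).
static int g_liveDocuments = 0;

class Node {
public:
    virtual ~Node() = default;
    // Default: releasing a non-document node is a no-op.
    virtual void release() {}
};

class Document : public Node {
public:
    Document() { ++g_liveDocuments; }
    ~Document() override {
        assert(g_liveDocuments > 0);
        --g_liveDocuments;
    }
    // A document deletes itself; only valid if it was created with new
    // and is owned by the caller (not by a parser).
    void release() override { delete this; }
};

class Element : public Node {};  // inherits the no-op release()
```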
5.0 old DOM
==========
5.1 Interface
==========
5.1.1 Features of old DOM Interface
-----------------------------------------------------
e.g. DOM_Element DOM_Document::createElement(const DOMString tagName);
1. Use smart pointers (Java-like)
5.1.2 Pros and Cons of old DOM Interface
--------------------------------------------------------------
Pros:
1. DOM_XXX - reasonable name
2. (Depends on users' preference)
- some want a Java-like interface for ease of use, ease of learning and ease of porting (from Java).
3. Not that many users have migrated to IDOM yet, so the migration impact is minimal.
Cons:
1. Not an abstract base class
- Cannot be recommended as the Apache DOM C++ Binding
- Implementation (smart pointer indirection) is assumed
Solution:
- This in fact is a tradeoff for the ease of use of the smart pointer design
- No solution.
2. (Depends on users' preference)
- some want a C++-like style, as this is a C++ interface
5.2 Implementation
===============
5.2.1 Features of old DOM Implementation
----------------------------------------------------------------
1. Automatic memory management
- Memory is released when there are no more handles pointing to it
- Use reference counts to keep track of handles
2. Use a thread-safe DOMString class
5.2.2 Downside of old DOM Implementation
--------------------------------------------------------------------
1. Performance is slow
- Memory management is the biggest time consumer, and it produces a large memory footprint.
- A whole lot of blocks are allocated when creating a document and then freed when finished with it. Each and every node requires at least one, and sometimes several, separately allocated blocks. DOMStrings take three. It adds up.
Solution:
- Lenny suggests using the IDOM interface internally in the DOM implementation (patch in Bugzilla 5967).
- The performance benefits of IDOM are then gained, but the memory retention problem in the IDOM implementation still remains to be addressed.
- And internally, we would have a dual-interface maintenance model, as the IDOM interface is then used by the DOM internally.
Vote Question:
============
I would like to call for a vote:
==> Which INTERFACE should be the Xerces-C++ public supported W3C DOM Interface, DOM or IDOM? <===
Note:
1. The question asks which "interface" is to be officially supported. Once the interface is chosen, we can discuss how to solve the downsides of the implementation as the next topic.
2. The one voted for will become the ONLY supported Xerces-C++ public W3C DOM interface, and is where DOM Level 3 will be implemented.
3. The API of the other interface will be deprecated, and its samples and associated parser will eventually be removed from the distribution.